The Byzantine Generals Problem

2020-03-01 73浏览

1.The Byzantine Generals Problem LESLIE LAMPORT, ROBERT SHOSTAK, and MARSHALL PEASE SRI International Reliable computer systems must handle malfunctioningcomponents that give conflicting information to different parts of the system. This situation can be expressed abstractly in terms of a group of generals of the Byzantine army camped with their troops around an enemy city. Communicatingonly by messenger, the generals must agree upon a common battle plan. However, one or more of them may be traitors who will try to confuse the others. The problem is to find an algorithm to ensure that the loyal generals will reach agreement. It is shown that, using only oral messages, this problem is solvable if and only if more than two-thirds of the generals are loyal; so a single traitor can confound two loyal generals. With unforgeable written messages, the problem is solvable for any number of generals and possible traitors. Applications of the solutions to reliable computer systems are then discussed. Categories and SubjectDescriptors:C.2.4. [Computer-Communication Networks]: Distributed Systems--network operating systems; D.4.4 [Operating Systems]: CommunicationsManagement-network communication; D.4.5 [Operating Systems]: Reliability--fault tolerance GeneralTerms:Algorithms, Reliability Additional Key Words andPhrases:Interactive consistency / 1. INTRODUCTION A r e l i a b l e c o m p u t e r s y s t e m m u s t b e a b l e to cope w i t h t h e f a i l u r e of o n e or m o r e of its c o m p o n e n t s . A failed c o m p o n e n t m a y e x h i b i t a t y p e of b e h a v i o r t h a t is o f t e n o v e r l o o k e d - - n a m e l y , s e n d i n g c o n f l i c t i n g i n f o r m a t i o n to d i f f e r e n t p a r t s of t h e s y s t e m . T h e p r o b l e m of c o p i n g w i t h t h i s t y p e of f a i l u r e is e x p r e s s e d a b s t r a c t l y as t h e B y z a n t i n e G e n e r a l s P r o b l e m . W e d e v o t e t h e m a j o r p a r t of t h e p a p e r to a d i s c u s s i o n of t h i s a b s t r a c t p r o b l e m a n d c o n c l u d e b y i n d i c a t i n g h o w o u r s o l u t i o n s can be used in i m p l e m e n t i n g a reliable c o m p u t e r system. W e i m a g i n e t h a t s e v e r a l d i v i s i o n s of t h e B y z a n t i n e a r m y a r e c a m p e d o u t s i d e a n e n e m y city, e a c h d i v i s i o n c o m m a n d e d b y its o w n g e n e r a l . T h e g e n e r a l s c a n communicate with one another only by messenger. After observing the enemy, t h e y m u s t d e c i d e u p o n a c o m m o n p l a n of a c t i o n . H o w e v e r , s o m e of t h e g e n e r a l s This research was supported in part by the National Aeronautics and Space Administration under contract NAS1-15428 Mod. 3, the Ballistic Missile Defense Systems Command under contract DASG60-78-C-0046, and the Army Research Office under contract DAAG29-79-C-0102. Authors'address:Computer Science Laboratory, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1982 ACM 0164-0925/82/0700-0382 $00.75 ACM Transactionson ProgrammingLanguagesand Systems,Vol.4, No. 3, July 1982,Pages 382-401.
2.The Byzantine Generals Problem 383 may be traitors, trying to prevent the loyal generals from reaching agreement. The generals must have an algorithm to guarantee that A. All loyal generals decide upon the same plan of action. The loyal generals will all do what the algorithm says they should, but the traitors may do anything they wish. The algorithm must guarantee condition A regardless of what the traitors do. The loyal generals should not only reach agreement, but should agree upon a reasonable plan. We therefore also want to insure that B. A small number of traitors cannot cause the loyal generals to adopt a bad plan. Condition B is hard to formalize, since it requires saying precisely what a bad plan is, and we do not attempt to do so. Instead, we consider how the generals reach a decision. Each general observes the enemy and communicates his observations to the others. Let v(i) be the information communicated by the ith general. Each general uses some method for combining the values v (1) . . . . . v (n) into a single plan of action, where n is the number of generals. Condition A is achieved by having all generals use the same method for combining the information, and Condition B is achieved by using a robust method. For example, if the only decision to be made is whether to attack or retreat, then v(i) con be General i's opinion of which option is best, and the final decision can be based upon a majority vote among them. A small number of traitors can affect the decision only if the loyal generals were almost equally divided between the two possibilities, in which case neither decision could be called bad. While this approach may not be the only way to satisfy conditions A and B, it is the only one we know of. It assumes a method by which the generals communicate their values v (i) to one another. The obvious method is for the ith general to send v (i) by messenger to each other general. However, this does not work, because satisfying condition A requires that every loyal general obtain the same values v(1) . . . . . v(n), and a traitorous general may send different values to different generals. For condition A to be satisfied, the following must betrue:1. Every loyal general must obtain the same information v (1) . . . . , v (n). Condition 1 implies that a general cannot necessarily use a value of v(i) obtained directly from the ith general, since a traitorous ith general may send different values to different generals. This means that unless we are careful, in meeting condition 1 we might introduce the possibility that the generals use a value of v (i) different from the one sent by the ith general--even though the ith general is loyal. We must not allow this to happen if condition B is to be met. For example, we cannot permit a few traitors to cause the loya~ generals to base their decision upon the values " r e t r e a t " , . . . , "retreat" if every loyal general sent the value "attack". We therefore have the following requirement for eachi:2. If the ith general is loyal, then the value that he sends must be used by every loyal general as the value of v (i). ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.
3.384 L. Lamport, R. Shostak, and M. Pease We can rewrite condition I as the condition that for every i (whether or not the ith general is loyal), 1'. Any two loyal generals use the same value of v(i). Conditions 1' and 2 are both conditions on the single value sent by the ith general. We can therefore restrict our consideration to the problem of how a single general sends his value to the others. We phrase this in terms of a commanding general sending an order to his lieutenants, obtaining the following problem. Byzantine Generals Problem. A commanding general must send an order to his n - 1 lieutenant generals such t hat IC1. All loyal lieutenants obey the same order. IC2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends. Conditions IC1 and IC2 are called the interactive consistency conditions. Note that if the commander is loyal, then IC1 follows from IC2. However, the commander need not be loyal. To solve our original problem, the ith general sends his value of v(i) by using a solution to the Byzantine Generals Problem to send the order "use v (i) as my value", with the other generals acting as the lieutenants. 2. IMPOSSIBILITY RESULTS T h e Byzantine Generals Problem seems deceptively simple. Its difficulty is indicated by the surprising fact that if the generals can send only oral messages, then no solution will work unless more than two-thirds of the generals are loyal. In particular, with only three generals, no solution can work in the presence of a single traitor. An oral message is one whose contents are completely under the control of the sender, so a traitorous sender can transmit any possible message. Such a message corresponds to the type of message that computers normally send to one another. In Section 4 we consider signed, written messages, for which this is not true. We now show that with oral messages no solution for three generals can handle a single traitor. For simplicity, we consider the case in which the only possible decisions are "attack" or "retreat". Let us first examine the scenario pictured in Figure 1 in which the commander is loyal and sends an "attack" order, but Lieutenant 2 is a traitor and reports to Lieutenant 1 that he received a "retreat" order. For IC2 to be satisfied, Lieutenant 1 must obey the order to attack. Now consider another scenario, shown in Figure 2, in which the commander is a traitor and sends an "attack" order to Lieutenant 1 and a "retreat" order to Lieutenant 2. Lieutenant 1 does not know who the traitor is, and he cannot tell what message the commander actually sent to Lieutenant 2. Hence, the scenarios in these two pictures appear exactly the same to Lieutenant 1. If the traitor lies consistently, then there is no way for Lieutenant 1 to distinguish between these two situations, so he must obey the "attack" order in both of them. Hence, whenever Lieutenant 1 receives an "attack" order from the commander, he must obey it. ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.
4.The Byzantine Generals Problem f ,, 385 t, "he said 'retreat'" Fig. 1. Lieutenant 2 a traitor. y/ "he said 'retreat'" Fig. 2. T h e c o m m a n d e r a traitor. However, a similar argument shows that if Lieutenant 2 receives a "retreat" order from the commander then he must obey it even if Lieutenant 1 tells him that the commander said "attack". Therefore, in the scenario of Figure 2, Lieutenant 2 must obey the "retreat" order while Lieutenant 1 obeys the "attack" order, thereby violating condition IC1. Hence, no solution exists for three generals that works in the presence of a single traitor. This argument may appear convincing, but we strongly advise the reader to be very suspicious of such nonrigorous reasoning. Although this result is indeed correct, we have seen equally plausible "proofs" of invalid results. We know of no area in computer science or mathematics in which informal reasoning is more likely to lead to errors than in the study of this type of algorithm. For a rigorous proof of the impossibility of a three-general solution that can handle a single traitor, we refer the reader to [3]. Using this result, we can show that no solution with fewer than 3m + 1 generals can cope with m traitorsJ T he proof is by contradiction--we assume such a ' More precisely, no such solution exists for three or more generals, since the problem is trivial for two generals. ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.
5.386 L. Lamport, R. Shostak, and M. Pease solution for a group of 3m or fewer and use it to construct a three-general solution to the Byzantine Generals Problem that works with one traitor, which we know to be impossible. To avoid confusion between the two algorithms, we call the generals of the assumed solution Albanian generals, and those of the constructed solution Byzantine generals. Thus, starting from an algorithm that allows 3m or fewer Albanian generals to cope with m traitors, we construct a solution that allows three Byzantine generals to handle a single traitor. The three-general solution is obtained by having each of the Byzantine generals simulate approximately one-third of the Albanian generals, so that each Byzantine general is simulating at most m Albanian generals. The Byzantine commander simulates the Albanian commander plus at most m - 1 Albanian lieutenants, and each of the two Byzantine lieutenants simulates at most m Albanian lieutenants. Since only one Byzantine general can be a traitor, and he simulates at most m Albanians, at most m of the Albanian generals are traitors. Hence, the assumed solution guarantees that IC1 and IC2 hold for the Albanian generals. By IC1, all the Albanian lieutenants being simulated by a loyal Byzantine lieutenant obey the same order, which is the order he is to obey. It is easy to check that conditions IC1 and IC2 of the Albanian generals solution imply the corresponding conditions for the Byzantine generals, so we have constructed the required impossible solution. One might think that the difficulty in solving the Byzantine Generals Problem stems from the requirement of reaching exact agreement. We now demonstrate that this is not the case by showing that reaching approximate agreement is just as hard as reaching exact agreement. Let us assume that instead of trying to agree on a precise battle plan, the generals must agree only upon an approximate time of attack. More precisely, we assume that the commander orders the time of the attack, and we require the following two conditions tohold:IC1 '. All loyal lieutenants attack within 10 minutes of one another. IC2'. If the commanding general is loyal, then every loyal lieutenant attacks within 10 minutes of the time given in the commander's order. (We assume that the orders are given and processed the day before the attack and that the time at which an order is received is irrelevant--only the attack time given in the order matters.} Like the Byzantine Generals Problem, this problem is unsolvable unless more than two-thirds of the generals are loyal. We prove this by first showing that if there were a solution for three generals that coped with one traitor, then we could construct a three-general solution to the Byzantine Generals Problem that also worked in the presence of one traitor. Suppose the commander wishes to send an "attack" or "retreat" order. He orders an attack by sending an attack time of 1:00 and orders a retreat by sending an attack time of 2:00, using the assumed algorithm. Each lieutenant uses the following procedure to obtain his order. (1) After receiving the attack time from the commander, a lieutenant does one of thefollowing:(a) If the time is 1:10 or earlier, then attack. (b) If the time is 1:50 or later, then retreat. (c) Otherwise, continue to step (2). ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.
6.The Byzantine Generals Problem • 387 (2) Ask the other lieutenant what decision he reached in step (1). (a) If the other lieutenant reached a decision, then make the same decision he did. (b) Otherwise, retreat. It follows from IC2' that if the commander is loyal, then a loyal lieutenant will obtain the correct order in step (1), so IC2 is satisfied. If the commander is loyal, then IC1 follows from IC2, so we need only prove IC1 under the assumption that the commander is a traitor. Since there is at most one traitor, this means that both lieutenants are loyal. It follows from I C I ' that if one lieutenant decides to attack in step (1), then the other cannot decide to retreat in step (1). Hence, either they will both come to the same decision in step (1) or at least one of them will defer his decision until step (2). In this case, it is easy to see that they both arrive at the same decision, so IC1 is satisfied. We have therefore constructed a three-general solution to the Byzantine Generals Problem that handles one traitor, which is impossible. Hence, we cannot have a three-general algorithm that maintains I C I ' and IC2' in the presence of a traitor. Th e method of having one general simulate m others can now be used to prove that no solution with fewer than 3rn + 1 generals can cope with m traitors. T he proof is similar to the one for the original Byzantine Generals Problem and is left to the reader. 3. A SOLUTION WITH ORAL MESSAGES We showed above that for a solution to the Byzantine Generals Problem using oral messages to cope with rn traitors, there must be at least 3m + 1 generals. We now give a solution that works for 3m + 1 or more generals. However, we first specify exactly what we mean by "oral messages". Each general is supposed to execute some algorithm that involves sending messages to the other generals, and we assume that a loyal general correctly executes his algorithm. The definition of an oral message is embodied in the following assumptions which we make for the generals' messagesystem:A1. Every message that is sent is delivered correctly. A2. Th e receiver of a message knows who sent it. A3. Th e absence of a message can be detected. Assumptions A1 and A2 prevent a traitor from interfering with the communication between two other generals, since by A1 he cannot interfere with the messages they do send, and by A2 he cannot confuse their intercourse by introducing spurious messages. Assumption A3 will foil a traitor who tries to prevent a decision by simply not sending messages. The practical implementation of these assumptions is discussed in Section 6. T h e algorithms in this section and in the following one require that each general be able to send messages directly to every other general. In Section 5, we describe algorithms which do not have this requirement. A traitorous commander may decide not to send any order. Since the lieutenants must obey some order, they need some default order to obey in this case. We let R E T R E A T be this default order. ACM Transactions on Progranuning Languages and Systems, Vol. 4, No. 3, July 1982.
7.388 L. Lamport, R. Shostak, and M. Pease We inductively define the Oral Message algorithms OM(m), for all nonnegative integers m, by which a c o m m a n d e r sends an order to n - 1 lieutenants. We show t h a t O M ( m ) solves the Byzantine Generals P r o b l e m for 3m + 1 or more generals in the presence of at most m traitors. We find it more convenient to describe this algorithm in terms of the lieutenants "obtaining a value" r a t h e r t h a n "obeying an order". T h e algorithm assumes a function m a j o r i t y with the p r o p e r t y t h a t if a majority of the values vi equal v, then m a j o r i t y (Vl,.. •, v,-D equals v. (Actually, it assumes a sequence of such f u n c t i o n s - - o n e for each n.) T h e r e are two natural choices for the value of m a j o r i t y ( v 1 , . . . , v,-1): 1. T h e majority value a m o n g the vi if it exists, otherwise the value R E T R E A T ; 2. T h e m e d i a n of the vi, assuming t h a t t h e y come from an ordered set. T h e following algorithm requires only the aforementioned p r o p e r t y of m a j o r i t y . Algorithm OM(0). (1) The commander sends his value to every lieutenant. (2) Each lieutenant uses the value he receives from the commander, or uses the value RETREAT if he receives no value. A l g o r i t h m O M ( m ) , m > O. (1) The commander sends his value to every lieutenant. (2) For each i, let vi be the value Lieutenant i receives from the commander, or else be RETREAT if he re :eives no value. Lieutenant i acts as the commander in Algorithm OM(m - 1) to send the value vi to each of the n - 2 other lieutenants. (3) For each i, and each j ~ i, let vj be the value Lieutenant i received from Lieutenant j in step (2) (using Algorithm OM(m - 1)), or else RETREAT if he received no such value. Lieutenant i uses the value majority (vl . . . . . v,-1 ). T o u n d e r s t a n d how this algorithm works, we consider the case m = 1, n = 4. Figure 3 illustrates the messages received by L i e u t e n a n t 2 when the c o m m a n d e r sends the value v and L i e u t e n a n t 3 is a traitor. In the first step of OM(1), the c o m m a n d e r sends v to all three lieutenants. I n the second step, L i e u t e n a n t 1 sends the value v to L i e u t e n a n t 2, using the trivial algorithm OM(0). Also in the second step, the traitorous L i e u t e n a n t 3 sends L i e u t e n a n t 2 some other value x. In step 3, L i e u t e n a n t 2 t h e n has v~ = v2 = v and v3 = x, so he obtains the correct value v = m a j o r i t y ( v , v, x ) . Next, we see w h a t h a p p e n s if the c o m m a n d e r is a traitor. Figure 4 shows the values received by the lieutenants if a traitorous c o m m a n d e r sends three arbitrary values x, y, and z to the three lieutenants. E a c h lieutenant obtains v~ = x, v2 = y, and v3 = z, so t h e y all obtain the same value m a j o r i t y ( x , y, z ) in step (3), regardless of w h e t h e r or not a n y of the three values x, y, and z are equal. T h e recursive algorithm O M ( m ) invokes n - 1 separate executions of the algorithm O M ( m - 1), each of which invokes n - 2 executions of OM(m - 2), etc. This m e a n s that, for m > 1, a lieutenant sends m a n y separate messages to each o t h e r lieutenant. T h e r e m u s t be some way to distinguish a m o n g these different messages. T h e reader can verify t h a t all ambiguity is r e m o v e d if each lieutenant i prefixes the n u m b e r i to the value vi t h a t he sends in step (2). As the recursion "unfolds," the algorithm O M ( m - k) will be called (n - 1) . . . (n - k) times to send a value prefixed by a sequence of k lieutenants' numbers. ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.
9.390 L. Lamport, R. Shostak, and M. Pease T o p r o v e the correctness of the algorithm OM{m) for a r b i t r a r y m, we first p r o v e the following l e m m a . LEMMA 1. F o r a n y m a n d k, A l g o r i t h m O M (m ) satisfies IC2 if there are more t h a n 2k + m g e n e r a l s a n d at m o s t k traitors. PROOF. T h e p r o o f is b y induction on m. IC2 only specifies w h a t m u s t h a p p e n if the c o m m a n d e r is loyal. Using A1, it is easy to see t h a t the trivial algorithm OM(0) works if the c o m m a n d e r is loyal, so the l e m m a is true for m = 0. We now a s s u m e it is true for m - 1, m > 0, a n d p r o v e it for m. In step (1), the loyal c o m m a n d e r sends a value v to all n - 1 lieutenants. In step (2), e a c h loyal l i e u t e n a n t applies O M ( m - 1) with n - 1 generals. Since b y h y p o t h e s i s n > 2k + m, we h a v e n - 1 > 2k + (m - 1), so we can a p p l y the induction h y p o t h e s i s to conclude t h a t e v e r y loyal l i e u t e n a n t gets vj = v for e a c h loyal L i e u t e n a n t j. Since t h e r e are at m o s t k traitors, and n - 1 > 2k + (m - 1) _> 2k, a m a j o r i t y of the n - 1 lieutenants are loyal. Hence, e a c h loyal lieutenant h a s vi = v for a m a j o r i t y of the n - 1 values i, so he obtains majority(v1 . . . . , v,-1) = v in step (3), proving IC2. [] T h e following t h e o r e m asserts t h a t Algorithm O M ( m ) solves the B y z a n t i n e Generals Problem. THEOREM 1. F o r a n y m, A l g o r i t h m O M (m ) satisfies conditions IC1 a n d IC2 i f there are m o r e t h a n 3m g e n e r a l s a n d at m o s t m traitors. PROOF. T h e p r o o f is b y induction on m. I f t h e r e are no traitors, t h e n it is easy to see t h a t OM(0) satisfies IC1 and IC2. W e therefore a s s u m e t h a t the t h e o r e m is true for O M ( m - 1) a n d p r o v e it for O M ( m ) , m > 0. W e first consider the case in which the c o m m a n d e r is loyal. B y taking k equal to m in L e m m a 1, we see t h a t O M ( m ) satisfies IC2. IC1 follows f r o m IC2 if the c o m m a n d e r is loyal, so we need only verify IC1 in the case t h a t the c o m m a n d e r is a traitor. T h e r e are at m o s t m traitors, and the c o m m a n d e r is one of t h e m , so at m o s t m - 1 of the lieutenants are traitors. Since t h e r e are m o r e t h a n 3m generals, t h e r e are m o r e t h a n 3m - 1 lieutenants, and 3m - 1 > 3(m - 1). W e m a y therefore a p p l y the induction h y p o t h e s i s to conclude t h a t O M ( m - 1) satisfies conditions IC1 and IC2. Hence, f o r e a c h j, a n y two loyal lieutenants get the s a m e value for vj in step (3). (This follows f r o m IC2 if one of the two lieutenants is L i e u t e n a n t j, a n d f r o m IC1 Otherwise.) Hence, a n y two loyal lieutenants get the s a m e vector of values vl . . . . . Vn-~, and therefore obtain the s a m e value majority(vl . . . . . Vn-1) in step (3), proving IC1. [] 4. A SOLUTION WITH SIGNED MESSAGES As we saw f r o m the scenario of Figures i and 2, it is the traitors' ability to lie t h a t m a k e s the B y z a n t i n e Generals P r o b l e m so difficult. T h e p r o b l e m b e c o m e s easier to solve if we can restrict t h a t ability. One w a y to do this is to allow the generals to send unforgeable signed messages. M o r e precisely, we add to A1-A3 the ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.
10.The Byzantine Generals Problem 391 followingassumption:A4 (a) A loyal general's signature cannot be forged, and any alteration of the contents of his signed messages can be detected. (b) Anyone can verify the authenticity of a general's signature. N o t e t h a t we make no assumptions about a traitorous general's signature. In particular, we allow his signature to be forged by a n o t h e r traitor, t h e r e b y permitting collusion among the traitors. Now t h a t we have introduced signed messages, our previous argument t h a t four generals are required to cope with one traitor no longer holds. In fact, a three-general solution does exist. We now give an algorithm t h a t copes with m traitors for any n u m b e r of generals. (The problem is vacuous if there are fewer t h a n m + 2 generals.) In our algorithm, the c o m m a n d e r sends a signed order to each of his lieutenants. E a c h lieutenant t h e n adds his signature to t h a t order and sends it to the other lieutenants, who add their signatures and send it to others, and so on. This means t h a t a lieutenant must effectively receive one signed message, make several copies of it, and sign and send those copies. It does not m a t t e r how these copies are obtained; a single message might be photocopied, or else each message might consist of a stack of identical messages which are signed and distributed as required. Our algorithm assumes a function choice which is applied to a set of orders to /obtain a single one. T h e only requirements we make for this function are / 1. If the set V consists of the single element v, t h e n c h o i c e ( V ) = v. 2. choice(Q) = R E T R E A T , where O is the e m p t y set. N o t e t h a t one possible definition is to let c h o i c e ( V ) be the median element of V--assuming t h a t there is an ordering of the elements. In the following algorithm, we let x : i denote the value x signed by General i. Thus, v :j:i denotes the value v signed by j, and t h e n t h a t value v : j signed by i. We let General 0 be the commander. In this algorithm, each lieutenant i maintains a set Vi, containng the set of properly signed orders he has received so far. (If the c o m m a n d e r is loyal, t h e n this set should never contain more t h a n a single element.) Do not confuse Vi, the set of o r d e r s he has received, with the set of messages t h a t he has received. T h e r e m a y be m a n y different messages with the same order. Algorithm S M (rn). Initially Vi = 0. (1) The commander signs and sends his value to every lieutenant. (2) For eachi:(A) If Lieutenant i receives a message of the formv:'>v: