Distributed Algorithms. Nancy A. Lynch (1993) (Distributed Algorithms. Nancy A. Lynch (1993).pdf), страница 12
Описание файла
PDF-файл из архива "Distributed Algorithms. Nancy A. Lynch (1993).pdf", который расположен в категории "". Всё это находится в предмете "распределенные алгоритмы" из 10 семестр (2 семестр магистратуры), которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст 12 страницы из PDF
The probability of this event is 11= = , since mlevel A is xed,and key is uniformly distributed in 0 1=].4.2 Faulty ProcessorsIn this section we continue our treatment of the consensus problem, but now we consideranother situation, where processors fail, rather than links.
We shall investigate two cases:stopping failures, where processors may experience \sudden death", and Byzantine failures,where a faulty process may exhibit absolutely unconstrained behavior the former aims atmodelling unpredictable crashes, and the latter may model buggy programs.In both cases, we assume that the number of failures is bounded in advance. Even thoughthis assumption is realistic in the sense that it may be highly unlikely for a larger number55of failures to occur, there is a serious drawback to these models: if the number of failures isalready quite high, then it is likely in practice that we will have more failures. More precisely,assuming a bound on the number of failures implicitly implies that the failures are negativelycorrelated. It is arguable that failures are independent, or even positively correlated.4.2.1 Stop FailuresWe rst consider the simple case where they fail by stopping only.
That is, at some pointduring the algorithm, a processor may stop taking steps altogether. In particular, a processorcan stop in the middle of the message sending step at some round, i.e., at the round in whichthe processor stops, only a subset of the messages it \should have" sent may actually besent. We assume that the links are perfectly reliable | all the messages that are sent aredelivered.
For simplicity, we assume that the communication graph is complete. As before,we assume that the n processors start with initial values.Now the problem is as follows. Suppose that at most f processors fail. We want all thenonfaulty processors to eventually decide, subject to:Agreement: all values that are decided upon agree.Validity: if all processors have initial value v, then the only possible decision value is v.We shall now describe a simple algorithm that solves the problem. In the algorithm, eachprocessor maintains a labeled tree that represents the \complete knowledge" of the processoras follows.
The tree has f + 2 levels, ranging from 0 (the root), to f + 1 (the leaves). Eachnode at level k, 0 k f , has n ; k children. The nodes are labeled by strings as follows.The root is labeled by the empty string , and each node with label i1 : : : ik has n ; k childrenwith labels i1 : : :ik j , where j ranges over all the elements of f1 : : : ng ; fi1 : : : ik g. SeeFigure 4.2 for an illustration.The processes ll the tree in the course of computation, where the meaning of a value vat a node labeled i1 : : : ik is informally \ ik knows that ik;1 knows ...
that i2 knows that thevalue of i1 is v". The algorithm now simply involves lling in all the entries in the tree, andafter f + 1 rounds, deciding according to some rule we shall describe later.More precisely, each process i lls in its own initial value for the root, value i(), in thestart state. Then in round 1, process i broadcasts this value to all the other processes forsimplicity of presentation, we also pretend that process i sends the value to itself (thoughin our model, this is simulated by local computation). At each round k, 1 k f + 1,process i broadcasts all the values from all the level k ; 1 nodes in its tree whose labels donot contain i, to all the processes. The recipients record this information in the appropriate56root112...123 .
. .12n21n21233...2n31 . . .Level 13nLevel 2...Level f+1Figure 4.2: the tree used for the stop-failures algorithmnode at level k of their tree: the value sent by process j associated with its node i1i2 : : : ik;1is associated with the node i1i2 : : :ik;1j . This is value i(i1i2 : : :ik;1j ). In this way the treesare lled up layer by layer.Thus, in general, a value v associated with node labeled i1i2 : : :ik at a process j meansthat ik told j that ik;1 told ik that : : : i1 told i2 that v was i1's initial value.Lastly, we specify a decision rule. Each process that has not yet failed denes W to bethe set of values that appear anywhere in its tree.
If W is a singleton set, then choose theunique element of W otherwise, choose a pre-specied default value.We shall now argue that the above algorithm satises the requirements.Claim 8 (Validity) If all the initial values are v, then the only possible decision value is v.Proof: Trivial if all start with v, then the only possible value at any node in anyone's treeis v, and hence, by the decision rule, only v can be decided upon.Claim 9 (Agreement) All the values decided upon are identical.Proof: Let Wi be the set of values in the tree of processor i. By the decision rule, it sucesto show that Wi = Wj for any two processors i and j that decide (i.e., that complete thealgorithm).
We prove that if v 2 Wi then v 2 Wj . Suppose v 2 Wi. Then v = value i(p) forsome label p that does not contain i. (Convince yourself of this.) If jpj f , then jpij f +1,so process i relays value v to process j at round jpij, and v = value j (pi), and thus, v 2 Wj .On the other hand, if jpj = f + 1, then there is some nonfaulty process l appearing in thelabel p, so that p = qlr, where q and r are strings. Then at round jqlj, processor l succeedsin sending to j , and therefore v is relayed to j (since that's the value relayed further by theprocesses in r).
Thus, v = value j (ql), and hence v 2 Wj .By the same argument, if v 2 Wj then v 2 Wi , and we conclude that Wi = Wj .One can easily see that the number of bits communicated in the execution of the protocolis prohibitively large (O(nf +1 )) and deem this protocol impractical. The important thing57we learned from the above algorithm, however, is that the consensus problem is solvable for(this simple kind of) processor failures.
This, of course, should be contrasted with Theorem1 which states that the problem is unsolvable in the case of communication failures. In thenext lecture we shall see how can the number of communicated bits be reduced dramatically.586.852J/18.437J Distributed AlgorithmsLecturer: Nancy LynchSeptember 24, 1992Lecture 55.1 Byzantine FailuresLast time, we considered the consensus problem for stopping failures. Now suppose that weallow a worse kind of processor failure (cf. Lamport et al., Dolev-Strong, Bar-Noy et al.):the processors might not only fail by stopping, but could exhibit arbitrary behavior.
Suchfailures are known as \Byzantine" failures, because of the bad reputation of the ByzantineGenerals. It must be understood that in this model, a failed process can send arbitrarymessages and do arbitrary state transitions. The only limitation on the behavior of a failedprocess is that it can only \mess up" the things that it controls locally, namely its ownoutgoing messages and its own state.In order for the consensus problem to make sense in this setting, its statement must bemodied slightly. As before, we want all the nonfaulty processes to decide.
But now theagreement and validity conditions are as follows.Agreement: All values that are decided upon by nonfaulty processes agree.Validity: If all nonfaulty processes begin with v, then the only possible decision value bya nonfaulty process is v.It is intuitively clear the now the situation is somewhat harder than for stopping faults.In fact, we shall show that there is a bound on the number of faults that can be tolerated.Specically, we will see that n > 3f processes are needed to tolerate f faults.
To gain someintuition as to the nature of the problem, let us consider the following example.What might go wrong with too few processes. Consider three processes, i j k, andsuppose they are to solve consensus tolerating one Byzantine fault. Let us see why it isimpossible for them to decide in two rounds (See Figure 5.1).Scenario 1: Processes i and k are nonfaulty and start with 0. Process j is faulty. In therst round, processes i and k report their values truthfully, and process j tells both i and kthat its value is 1.
Consider the viewpoint of process i. In the second round, process k tells59(a)(b)jj(c)1i0k0i1jik1k0Figure 5.1: possible congurations for three processors. The shaded circles represent faulty processes. Congurations (a), (b) and (c) depict Scenarios 1, 2 and 3 respectively.i (truthfully) that j said 1, and process j tells i (falsely) that k said 1. In this situation, theproblem constraints require that i and k decide 0.Scenario 2: Dual to Scenario 1. Processes j and k are nonfaulty and start with 1. Processi is faulty.