Distributed Algorithms. Nancy A. Lynch (1993) (811416), страница 34
Текст из файла (страница 34)
First, we recall what this means. We areusing the program above (Figures 14.2,14.1) as an object specication. It describes theinterface, and gives a well-dened set of well-formed behaviors (whatever they are). To getan implementation of this spec, we need to design another algorithm that also preserveswell-formedness, that has the required liveness properties, and whose well-formed behaviorsare a subset of those of this algorithm.The algorithm we use is very close to the unbounded algorithm above given by the spec.The only dierence is in the label domain | for the bounded case we shall use a nite domainof size 5n;1 .
The new domain has a binary relation dened for the purpose of determiningorder of labels, maximum, etc., only now it can't be a total order because elements have165Shared variablesA snapshot object, whose value is a vector of pairs, (label value ), with onecomponent for each \labeler" process i. The labels are positive real numbers, and the domainof value is arbitrary.scaniprocessLocal state:Actions:pc : program counter(snaplabel snapvalue ): to hold values returned by a snaporder : to hold an ordering of processes to be returnedbeginscanEect:pc snapisnapPrecondition:pc = snapEect:(snaplabel snapvalue ) (label value )pc dene-responseidene-responsePrecondition:pc = dene-responseEect:order the total order on indices where j appearsbefore k i (snaplabel j ) < (snaplabel k)pc endscanijkendscan (o v)Precondition:pc = endscano = orderv = snapvalueEect:pc beginscaniFigure 14.1: Specication algorithm for timestamp system, part I166labeliprocessLocal state:Actions:pc(snaplabel snapvalue ), to hold values returned by a snap .newlabel , to hold the newly-chosen labelbeginlabelEect:pc snapisnapPrecondition:pc = snapEect:(snaplabel snapvalue ) label valueif 8j , (snaplabel j ] j ) (snaplabel i] i) thennewlabel snaplabel i]else newlabel max (snaplabel ) + r, where ris any positive real (nondeterministic)pc updateiupdatePrecondition:pc = updateEect:label i] newlabelpc endlabeliendlabelPrecondition:pc = endlabelEect:pc beginscaniFigure 14.2: Specication algorithm for timestamp system, part II167to be reused.
The algorithm will ensure, however, that the labels that are in use at anyparticular time are totally ordered by the given relation.The only changes in the code are in the actions: dene-response of scan , in particular,where the order of indices is determined from the scanned labels, and snap of label , inparticular, in determining if the process executing the step has the maximum label, and indetermining the next label to choose.Atomic Labels and ScansTo motivate the value domain chosen, rst suppose that each label and scan operation isdone atomically.
Consider two processes. We use the label domain consisting of f1 2 3g,with relation given by the arrows: see Figure 14.3.213Figure 14.3: 3-element domain for two processes. a b indicates that a b.That is, order relation \" is f(1 2) (2 3) (3 1)g. Suppose that the two processes, pand q, initially both have label = 1 (ties are broken by process index). If p does a labeloperation, it sees that q has the maximum (assuming that p < q), so when p chooses a newlabel, it gets the \next label", which is 2. Now if q does a label , it sees that the maximumis 2, and therefore chooses the next label, which is now 3.
The two, if they take turns, justchase each other around the circle in a leapfrog fashion. It is important to see that the eectis just as if they were continuing to choose from an unbounded space.What about three processes? We can't simply extend this to a ring of a larger size:consider the following scenario. Suppose one of the processes, say p, retains label 1 and doesno new label operations, while the other two processes, say q and r, \leapfrog" around thering. This can yield quite dierent behavior from the unbounded algorithm, when eventually,q and r bump into p.
In fact, we don't even have a denition of how to compare nonadjacentlabels in this ring.A valid solution for the 3 processes case is given in the recursive label structure depictedin Figure 14.4. In this domain, we have three \main" level 1 components, where each ofthem is constructed from three level 2 components. In each level, the ordering is given by thearrows. A label now consists of a string of two \sub-labels", one for each level. The ordering168213221231133Figure 14.4: 9-element domain for three processes. Each element has a label that consists of twostrings of f1 2 3g, and the relation is taken to be lexicographical.is determined lexicographically.
To get a feeling why this construction works, consider againthe three-process scenario described above. If p keeps the initial label 1:1, q will choose 1:2next. But then r will not wrap around and choose 1:3, because the \1-component" of level1 is already \full" (i.e., it contains 2 processes). So instead, it goes to 2:1, starting in thenext component. Thus, the total order of processes is p q r, since the given relation relatesall three of the pairs of labels involved here (we have (1:1 1:2) (1:2 2:1), and (1:1 2:1) in therelation).
In the next step, q would choose 2:2. Then r could choose (2 3) and still maintainthe total order property (because the component is not considered full if the 2 processes itcontains include the one now choosing). This can go on forever, since now q and r alonewould just cycle within component 2 without violating the requirement.It is now quite easy to extend this scheme for the general case of n processes. We will havea domain of size 3n;1 , where each label is a length n ; 1 string of f1 2 3g, that correspondsto a depth n ; 1 nesting of 3-cycles.
We say that a level k component, 1 k n ; 1, is\full" if it contains at least n ; k processes. An invariant can be proved saying that for anycycle, at any level, at most two of the components are \occupied", which suces to yield atotal order (cf. homework problem).169The following rule is used to pick a new value. If the choosing process is the maximum(i.e., it dominates all the others in the given relation), it keeps its old value. If it is not themaximum, then it looks at the current maximum value. It nds the rst level k, 1 k n ; 1, such that the level k component of the maximum is full (i.e., contains at least n ; kprocesses' values, excluding the process currently choosing).
The chosen value is the rstlabel (according to the relation) in the next component at level k.Note that there must be some full level, since fullness at level n ; 1 just means that thereis a process (i.e., the maximum itself) in the lowest-level component (viz., the single nodecontaining the maximum value).It is perhaps helpful to trace this procedure for n = 4 processes, with 3 levels of nesting(see Figure 14.5).If the max's level 1 component is full, it means that it contains all 3 other processes, sowe can choose the next level 1 component, since we are guaranteed that no one has a valueoutside the max's level 1 component. If the max's level 1 component is not full, then thereare at most 2 processes in that component.
We then look in the level 2 component of themax. If it is full, it contains 2 processes' values, so that means there are none left for theother two level 2 components, and hence it's safe to choose there. Else, there are at most 1process in the level 2 component. We can repeat the argument for this case to see that is OKto choose the next element within the level 2 component, i.e., the next level 3 component.Nonatomic LabelsThe \recursive triangles" construction above doesn't quite give us what we want. Theproblems arise when we consider concurrent execution of the label operations. Consider the32 structure with 3 processes. Suppose that we get to the point where p, q and r have labels1:1, 1:2 and 2:1, respectively (see Figure 14.6).Suppose that both p and q initiate label operations.
First, both do their embedded snapsteps, discovering that the max label in use is 2:1. They both choose label 2:2. Process pwrites its new label 2:2, but process q delays writing. Now processes p and r can leapfrogwithin the 2 component, eventually reaching a point where the occupied labels are 2:3 and2:1. So far so good, but now let q perform its delayed write. This will make all three labelsin component 2 occupied, which will break the total ordering.In order to handle such race conditions when processes rst enter a component, the size3 ring is modied by adding a \gateway" (see Figure 14.7). The labels for n processes areobtained by nesting recursively to depth n ; 1 as before (see Figure 14.8) again, we say thata level k component, 1 k n ; 1, is \full" if it contains at least n ; k processes. We usethe same choice rule as before, looking for the rst level at which the max's component is170213221231133221231222123113231323111333Figure 14.5: 27-elements domain for four processes.full.