Distributed Algorithms. Nancy A. Lynch (1993) (811416), страница 70
Текст из файла (страница 70)
The argument isinduction on the actions: assuming that a given state is legal with respect to a given linksubsystem, we show that applying any action will not break any invariant in this subsystem.Thus the proofs are basically a straightforward (somewhat tedious) case-analysis. Using thelocal correction theorem, we can deduce that the protocol is self-stabilizing.It is more interesting to analyze the executions of the algorithm.
We need a few denitions.Denition 1 Let (sk ak sk+1 : : :) be an execution fragment. An interval i j ] is a readyinterval at node u if1. mode (u) = Ready in all states sl for all i lj.2. If i > k then mode (u) 6= Ready in si;1 , and if j < 1 then mode (u) 6= Ready in sj +1 .338Abort interval and converge interval are dened analogously.For these mode intervals, we have the following lemma.Lemma 2 Let (sn an sn+1 : : :) be an execution fragment and let u be a node. Then at u,1. A ready interval may be followed only by an abort interval.2. A converge interval may be followed only by a ready interval.The proof is by straightforward inspection of the actions.
Next we dene the concept ofparent graph.Denition 2 Let s be a state. The parent graph G(s) is dened by the parent and theQueue variables as follows. The nodes of G(s) are the nodes of the network, and e = (u v )is an edge in G(s) i parentu = e and Ready 2= Queuev e] in s.From the distances invariant, we know that the parent graph is actually a forest in legalstates.
We also dene a special kind of abort intervals. An abort interval i j ] at node u isa root interval if for some i k j , parentu = nil in sk .The next step of the proof is to show that for any abort interval there exists a rootinterval that contains it. This is done by induction on the depth of the node in the parentgraph. Using this fact, we bound the duration of abort intervals.Lemma 3 Let s be a legal state, let u be a node, and suppose that no topological changesoccur after s.
If mode (u) = Abort in s and depth (u) = k in G(s), then there exists a states0 in the following 2(n ; k) time units such that mode (u) 6= Abort in s0.This lemma is proven by showing that no Ack Pend bit is continuously true for morethan 2(n ; k) time units. This implies the lemma, since no Ack Pend bit is set to true whilethe node is in a non-ready mode.Similarly, we can prove by induction on depth (u) the following bound on the duration ofthe converge intervals.Lemma 4 Let s be a legal state at time t, let u be a node, and suppose that no topologicalchanges occur after time t. If mode (u) = Abort in s, and depth (u) = k in G(s), then bytime t +2n + depth (u), an action occurs, in which mode (u) is changed to Ready, and Readymessages are sent by u to all its neighbors.It is easy to derive the following conclusion.Theorem 5 Suppose that the links (including the link invariants) stabilize in C time units.If the number of reset requests and topological changes is nite, then in C + 3n time unitsafter the last reset request/topological change, mode (u) = Ready at all nodes u.339Proof: Consider the root intervals.
Note that by the code, each such root interval can beassociated with an input request made at this node, or with a topological change. Sinceeach request can be associated with at most one root interval, and since each topologicalchange can be associated with at most two root intervals, the number of root intervals isnite. Furthermore, by Lemma 4, 2n time units after the last reset request all root intervalsterminate, and Ready will be propagated. Since any non-root abort interval is containedin some root interval, after 2n time all abort intervals have terminated, and hence, 3n timeunits after the last reset request, the mode of all the nodes is Ready.The following theorem states the liveness property of the Reset protocol.Theorem 6 Suppose that the execution of the Reset protocol begins at time 0 at arbitrarystate, and suppose that the links (including the link invariants) stabilize in C time units.
Ifthe last reset request/topological change occurs at time t > C + 3n, then a RESET signal isoutput at all the nodes by time t + 3n.Proof Sketch: The key property we need is that the time between successive ready intervals is at most 3n time units.First, notice that it suces to show that for any node v, there is a time in the intervalC t + n] in which mode (v) 6= Ready.Suppose now, for contradiction, that at time t > C + 3n occurs a reset request at somenode w, and that there exists a node u whose mode is Ready in times C t + n]. Let v beany node in the network. We show, by induction on the distance i between u and v, thatmode (i) = Ready in the time interval C + 3n t + n ; i].
This will complete the proof, sincethe hypothesis implies that mode (w) 6= Ready at time t.The base case follows from the assumption that mode (u) = Ready in the time intervalC t + n]. The inductive step follows from the fact that if for some node v0, in time t0 > C +3nwe have mode (v0) 6= Ready, then there occurs an action in the time interval t0 ; 3n t0] inwhich the mode of v0 changed from Ready to Abort, and Abort messages were sent to allthe neighbors of v0. Hence, for each neighbor v of v0, there is a state in the time intervalt0 ; 3n t0 ; 3n + 1] in which mode (v) 6= Ready.Next, we prove the consistency of the Reset protocol.
This property follows from thesimple Buffer mechanism. There is a buer dedicated to each link, such that all $-messagesarriving from link e are stored in Buffere]. Buffere] is ushed whenever a Abort messagearrives from e.Theorem 7 Suppose no topological changes occurs after time C , and that the last resetsignals occur at two adjacent nodes u and v after time C +3n.
Then the sequence Send(m e)input by the user at u following the last reset signal at u, is identical to the sequence ofReceive(m e) output to the user at v after the last reset signal at v.340Proof: Consider the last non-ready intervals in u and v. Suppose that the last states inwhich mode (u) =6 Ready and mode (v) 6= Ready are sju and sjv , respectively, and let siu andsiv be the rst states in the last non-ready intervals, respectively (i.e., for all iu k ju, insk mode (u) 6= Ready, and in siu;1 , mode (u) = Ready). Note that since the last reset signalis output after time C + 3n, it follows that the states siu;1 and siu ;1 indeed exist. Notefurther that the change of mode to Abort is accompanied by broadcasting Abort messages(by Propagate ).Consider Send(m e) input at u after sju . We argue that m is delivered at v after siv :suppose not. Then the mode of v is changed to Abort after sju , and v broadcasts Abort tou.
Hence there must be a state sk k > ju , such that mode (u) 6= Ready in sk , contradictingthe assumption that ju is the last state in the last non-ready interval. Recalling that themanager buers $-messages until the mode is Ready, we conclude that the correspondingReceive(m e) is output to the user at v after sjv .Consider Send(m e) input at u before sju . Since the manager discards all $-messagesinput while in non-ready mode, we need only to consider Send(m e) input at u before siu .Again, we argue that Receive(m e) is not output at v after sjv .
This follows from the factthat there must be Abort message sent to u from v after m, that implies that the mode ofu is not Ready after sju , a contradiction.24.3.6 CommentsThe reset protocol is a powerful tool, but sometimes it's an overkill: it doesn't seem reasonable to reset the whole system whenever a minor but frequent fault occur (e.g., a new nodejoins, a link fails). It seems that one of the best applications of reset is when it is combinedwith unbounded-register protocols: the eect of the reset signal in this case can usually bedened easily (e.g., set counters to 0).Finally, we note that the time complexity actually is bounded by the length of the longestsimple path in the network (which is a more rened measure than just the number of nodes).If the network is a tree, for instance, the protocol works in diameter time, which is clearlyoptimal.
The space complexity is logarithmic in the bound N . As typical for the localcorrection method, the communication bandwidth required for stabilization is proportionalto the space, which is in our case logarithmic.24.4 Application: Network SynchronizationRecall the network synchronization problem.
The basic property there was that if a messageis sent in (local) round i, then no messages of rounds i ; 1 or less will ever be received in that341node. We abstract the synchronizer as a service module whose task is to provide the userwith pulse numbers at all times. If we associate with each node v its round/pulse numberP (v), it can be easily seen that a state is legal only if no two adjacent nodes pulses dier bymore than 1, i.e., for all nodes vu 2 N (v) ) jP (u) ; P (v)j 1(24.9)(N (v) denotes the set of nodes adjacent to v.) Call the states satisfying Eq.
(24.9) legal.Of course, we also want to keep progressing. This can be stated as asserting that for eachconguration, there is some pulse number K , such that all nodes get all pulses K K + 1 : : :If the system initialized in a legal state, we can use the following rule.P (v) u2Nminv fP (u) + 1g(24.10)( )The idea is that whenever the pulse number is changed, the node sends out all themessages of previous rounds which haven't been sent yet.The rule above is stable, i.e., if the conguration is legal, then applying the rule arbitrarilycan yield only legal conguration. Notice however, that if the state is not legal, then applyingthe rule may cause pulse numbers to drop. This is a reason to be worried, since the regularcourse of the algorithm requires pulse numbers only to grow. And indeed, the rule is notself-stabilizing.23412312321442233442343513344412433454432(i)235542433(e)323422(h)2434(d)34322(g)24123(f)533(c)32142(b)3341(a)43423454Figure 24.7: An execution using rule 24.10.
The node that moved in each step in marked.One idea that can pop into mind to try to repair the above aw is to never let pulsenumbers go down. Formally, the rule is the following.(P (v) max P (v) u2Nminv fP (u) + 1g( )342)(24.11)This rule can be proven to be self-stabilizing. However, it suers from a serious drawbackregarding its stabilization time.
Consider the conguration depicted below.111000000Figure 24.8: A pulse assignment for rule 24.11.A quick thought should suce to convince the reader that the stabilization time here is inthe order of 1000000 time units, which seems to be unsatisfactory for such a small network.The next idea is to have a combination of rules: if the neighborhood is legal, then theproblem specication requires the min+1 rule is used.
But if the neighborhood is not legal,another rule can be used. The rst idea we consider is the following.8<P (v) : minu2N (v) fP (u) + 1gmaxu2N (v) fP (u)gif 8u 2 N (v) : jP (v) ; P (u)j 1otherwise(24.12)It is fairly straightforward to show that if an atomic action consists of reading one's neighbors and setting its own value (in particular, no neighbor changes its value in the meanwhile),then the max rule above indeed converges to a legal conguration.