Distributed Algorithms. Nancy A. Lynch (1993) (811416), страница 66
Текст из файла (страница 66)
Thus the following piece of notationis useful.Denition 13 For any partial order <, height(<) is the length of the maximum length chainin <.23.6 Local Correction Theorem23.6.1 Theorem StatementIn the previous section, we set up plausible conditions under which a network automatoncan be locally corrected to achieve a property L.
We claimed that these conditions could beexploited to yield local correction. In this section we make these claims precise.Intuitively, the result states that if N is locally correctable to L using local reset functionf and partial order <, then we can transform N into a new augmented automaton N + suchthat N + satises the following property: eventually every behavior of N + will \look like" abehavior of N in which L holds. We use the notation NjL to denote an automaton identicalto N except that the initial states of NjL are restricted to lie in set L.Theorem 2 Local Correction: Consider any network automaton N = Net(G N ) that islocally correctable to L using link predicate set L, local reset function f , and partial order<. Then there exists some N + that is a UIOA and a network automaton such that N +stabilizes to the behaviors of N (c)jL.So far we have ignored time complexity. However, time complexity is of crucial importance to a protocol designer because nobody would be interested in a protocol that tookyears to stabilize.
Thus, V92], uses a version of the timed automata model throughout anduses this model to give a formal denition of stabilization time. However, intuitively, wecan measure time complexity as follows. We assume that it takes 1 time unit to send anymessage on a link and to deliver a free action we assume that node processing takes time0.
With this assumption, we can dene the stabilization time of a behavior to be time ittakes before the behavior has a sux that belongs to the target set. We dene the overallstabilization time as the worst case stabilization time across all behaviors.Using this intuitive denition of stabilization time we can state a more rened version ofthe Local Correction Theorem. The renement we add is that N + stabilizes to the behaviorsof N (c)jL in time proportional to height(<), the height of the partial order.320Node vNode uTIMEREQUESTRESPONSEFigure 23.9: The structure of a single phase of the local snapshot/reset protocol23.6.2 Overview of the Transformation CodeTo transform N into N + we will add actions and states to N .
These actions will be usedto send and receive snapshot packets (that will be used to do local checking on each linksubsystem) and reset packets (that will be used to do local correction on each link subsystem).For every link (u v), the leader l(u v) initiates the checking and correction of the (u v)subsystem. The idea is, of course, that the leader of each (u v) subsystem will periodicallydo a local snapshot to check if the (u v) subsystem satises its predicates if the leaderdetects a violation, it tries to make the local predicate true by doing a local reset of the (u v)subsystem.The structure of our local snapshot protocol is slightly dierent from the ChandyLamport snapshot protocol ChandyL85] studied earlier in the course.
It is possible toshow V92] that the Chandy-Lamport scheme cannot be used without modications overunit storage links. (Prove this: it will help you understand the Chandy-Lamport schemebetter.)Our local snapshot/reset protocol works roughly as follows. Consider a (u v) subsystem.Assume that l(u v) = u { i.e., u is the leader on link (u v). A single snapshot or reset phasehas the structure shown in Fig 23.9.A single phase of either a snapshot or reset procedure consists of u sending a requestthat is received by v, followed by v sending a response that is received by u. During a phase,node u sets a ag (checkuv]) to indicate that it is checking/correcting the (u v) subsystem.While this ag is set, no packets other than request packets can be sent on link Cu v .
Sincea phase completes in constant time, this does not delay the data packets by more than aconstant factor.In what follows, we will use the basic state at a node u to mean the part of the state atu \corresponding" to automaton Nu . To do a snapshot, node u sends a snapshot request to321uvuREQcvREQ(c)TIMERESPStore cREQc=c+1RESPRESP(c)Matching using a CounterIncorrect MatchingFigure 23.10: Using counter ushing to ensure that request-response matching will workcorrectly within a small number of phases.v. A snapshot request is identied by a mode variable in the request packet that carries amode of snapshot.
If v receives a request with a mode of snapshot, Node v then records itsbasic state (say s) and sends s in a response to u.When u receives the response, it records its basic state (say r). Node u then records thestate of the (u v) subsystem as x = (r nil nil s). If x 62 Lu v (i.e., local property Lu v doesnot hold) then u initiates a reset.To do a reset, node u sends a reset request to v. A reset request is identied by a modevariable in the request packet that carries a mode of reset.
Recall that f denotes the localreset function. After v receives the request, v changes its basic state to f (v s u), where s isthe previous value of v's basic state. Node v then sends a response to u. When u receivesthe response, u changes its basic state to f (u r v), where r is the previous value of u's basicstate.Of course, the local snapshot and reset protocol must also be stabilizing.
However, theprotocol we just described informally may fail if requests and responses are not properlymatched. This can happen, for instance, if there are spurious packets in the initial stateof N + . It is easy to see that both the local snapshot and reset procedures will only workcorrectly if the response from v is sent following the receipt of the request at u. The diagramon the left of Figure 23.10 shows a scenario in which requests are matched incorrectly to\old" responses that were sent in previous phases.
Thus we need each phase to eventuallyfollow the structure shown in the right of Figure 23.10.To make the snapshot and reset protocols stabilizing, we use counter ushing. Thus we322number all request and response packets. Thus each request and response packet carries anumber count. Also, the leader u keeps a variable countu v] that u uses to number all requestssent to v within a phase. At the end of the phase, u increments countuv]. Similarly, theresponder v keeps a variable countv u] in which v stores the number of the last request it hasreceived from u.
Node v weeds out duplicates by only accepting requests whose number isnot equal countu v].Clearly the count values can be arbitrary in the initial state and the rst few phases maynot work correctly. However, counter ushing guarantees that in constant time a responsewill be properly matched to the correct request. Because the links are unit storage, therecan be at most 3 distinct counter values (one in each link and one at the receiver) storedin the link subsystem. Thus a space of 4 numbers is sucient to guarantee that eventually(more precisely within 5 phases) requests and responses are correctly matched.Besides properly matching requests and responses, we must also avoid deadlock whenthe local snapshot/reset protocol begins in an arbitrary state.
To do so, when checkuv] istrue (i.e., u is in the middle of a phase), u continuously sends requests. Since v weeds outduplicates this does no harm and also prevents deadlock. Similarly, v continuously sendsresponses to the last request v has received. Once the responses begin to be properly matchedto requests, this does no harm, because u discards such duplicate responses.23.7 Intuitive Proof of Local Correction TheoremA formal proof of the Local Correction theorem can be found in V92]. In this section, weprovide the main intuition. We have already described how to transform a locally correctableautomaton N into an augmented automaton N + We have to prove that in time proportionalto the height of the partial order every behavior of N + is a behavior of N in which all localpredicates hold.We have already described the intuition behind the use of a counter to ensure properrequest-response matching.
We now describe the intuition behind the local snapshot andreset procedures. The intuition behind the snapshot is very similar to the intuition behindthe snapshot procedure studied earlier in the course.23.7.1 Intuition Behind Local SnapshotsThe diagram on the left of Figure 23.11 shows why a snapshot works correctly if the responsefrom v is sent following the receipt of the request at u. Let a0 and b be the state of nodes uand v respectively just before the response is sent. Let a and b0 be the state of nodes u andv respectively just after the response is delivered. This is sketched in Figure 23.11.323The snapshot protocol must guarantee that node u does not send any data packets to vduring a phase. Also v cannot send another data packet to u from the time the response issent until the response is delivered. This is because the link from v to u is a UDL that willnot give a free indication until the response is received.