For each such message it sends, it waits to obtain an acknowledgment along the special channel. When it has received acks for all its messages, it is safe (using Awerbuch's terminology, p. 807 of Awerbuch's paper). For a node to be safe means that it knows that all its neighbors have received its messages. Meanwhile, while waiting for its own messages to be acknowledged, it is collecting and acknowledging the messages of its neighbors.

When is it OK for the node's front end to deliver all the messages it has collected from its neighbors to the client? Only when it knows that it won't receive any others.
It is therefore sufficient to determine that all the node's neighbors are safe for that round (i.e., that those neighbors know that their messages for that round have all been delivered).

Thus, the job of the synchronizer is to tell front ends when all their neighbors are safe. To do this, the synchronizer has OK input actions, outputs from the front ends, by which the front ends tell the synchronizer that they are safe. The synchronizer S sends a GO message to a front end when it has received an OK message from each of its neighbors. For now, we are just writing an abstract spec for the S component; of course, this will have to be implemented in the network.
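To make the interface concrete, here is a minimal sketch of the abstract spec S as a single centralized automaton. This is an illustration, not from the original notes; the names `SynchronizerSpec`, `ok`, and `emit_go` are assumptions of the sketch.

```python
# A minimal sketch of the abstract synchronizer spec S (illustrative only).
# It accepts one OK(i) input per node per round, and emits GO(i) once node i
# and all of i's neighbors have reported OK for i's next round.

class SynchronizerSpec:
    def __init__(self, neighbors, emit_go):
        self.neighbors = neighbors                 # dict: node -> set of neighbors
        self.ok_round = {v: 0 for v in neighbors}  # rounds OK'd so far, per node
        self.go_round = {v: 0 for v in neighbors}  # rounds GO'd so far, per node
        self.emit_go = emit_go                     # output action, called as emit_go(i)

    def ok(self, i):
        """Input action OK(i): front end i reports it is safe for its next round."""
        self.ok_round[i] += 1
        for v in self.neighbors[i] | {i}:          # i's OK may enable GO at i or a neighbor
            self._try_go(v)

    def _try_go(self, i):
        r = self.go_round[i] + 1
        # GO(i) for round r is enabled once i and all its neighbors have OK'd round r.
        if all(self.ok_round[v] >= r for v in self.neighbors[i] | {i}):
            self.go_round[i] = r
            self.emit_go(i)
```

Note that this spec is centralized and atomic; the rest of the section is about realizing the same input/output behavior with a distributed algorithm.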
We claim that this combination "simulates" the synchronous system (the combination of clients and synch-sys). Notice we did not say that it "implements" the system, in the formal sense of behavior inclusion. In fact, it doesn't; this simulation is local, in the sense that the rounds can be skewed at distances in the network. Rounds are only kept close for neighbors. But in this architecture, where the client programs do not have any means of communicating directly with each other (outside of the synch-sys), this is enough to preserve the view of each client.

Theorem 1 If α is any execution of the implementation (clients, front ends, channels, and synchronizer), then there is an execution α' of the specification (clients and synch-sys) such that for all clients i, α|client_i = α'|client_i.

That is, as far as each client can tell, it's running in a synchronous system. So the behavior is the same: the same correctness conditions hold at the outside boundary of the clients, subject to reordering at different nodes.

Proof Sketch: We cannot do an implementation proof (as we did, for example, for the CTSS algorithm), since we are reordering the actions at different nodes.
Note that certain events "depend on" other events to enable them; e.g., an OK event depends on prior ack input events at the same front end, and all events at one client automaton may depend on each other. (This is a conservative assumption.) We define this depends-on relation formally as follows. For every schedule β of the implementation system, there is a partial order relation D_β describing events occurring in β that depend on each other. The key property of this relation D_β is that for any schedule β of the implementation system, and any permutation β' of β that preserves the partial order of events given by D_β, we have that β' is also a schedule of the implementation system.
This says that the D_β relations are capturing enough about the dependencies in the schedule to ensure that any reordering that preserves these dependencies is still a valid schedule of the implementation system.

Now, we start with any schedule β of the implementation, and reorder the events to make the successive rounds "line up" globally. The goal is to make the reordered schedule look as much as possible like a schedule of the synchronous system. We do this by explicitly putting all the synch-receive(i) events after all the synch-send(i) events for the same round i. We now claim this new requirement is consistent with the dependency requirements in D_β, since those never require the reverse order, even when applied transitively. We get β_1 in this way. By the key claim above, β_1 is also a schedule of the implementation system.
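As a toy illustration of this reordering step (an example of ours, not from the notes): if events carry their round number, and D_β is given as a set of ordered pairs, then, under the key claim that D_β never orders a synch-receive of a round before a synch-send of the same round, a stable sort by (round, send-before-receive) is one legal way to produce β_1.

```python
# Toy illustration: reorder a skewed schedule so that, for each round r,
# every synch-send(r) precedes every synch-receive(r).  Events are tuples
# (kind, node, round), assumed distinct; D is a set of pairs (e1, e2)
# meaning "e1 must precede e2".

KIND_ORDER = {"synch-send": 0, "synch-receive": 1}

def line_up_rounds(schedule, D):
    reordered = sorted(schedule, key=lambda e: (e[2], KIND_ORDER[e[0]]))
    # Sanity check: the new order is still a linear extension of D.
    position = {e: k for k, e in enumerate(reordered)}
    assert all(position[a] < position[b] for (a, b) in D)
    return reordered

# Two clients whose round-1 events are skewed in the original schedule.
skewed = [("synch-send", 1, 1), ("synch-receive", 1, 1),
          ("synch-send", 2, 1), ("synch-receive", 2, 1)]
deps = {(("synch-send", 1, 1), ("synch-receive", 1, 1)),
        (("synch-send", 2, 1), ("synch-receive", 2, 1)),
        (("synch-send", 1, 1), ("synch-receive", 2, 1)),
        (("synch-send", 2, 1), ("synch-receive", 1, 1))}
print(line_up_rounds(skewed, deps))   # both sends first, then both receives
```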
We're not done yet: although β_1 is fairly "synchronous", it is still a schedule of the implementation system, and we want a schedule of the specification system. In the next step, we suppress the internal actions of the synch-sys, getting a "reduced" sequence β'. This still looks the same to the clients as β, and now we claim that it is a schedule of the specification system. This claim can be proved by induction.

Note that the theorem started with an execution, not a schedule. To complete the proof, we extract the schedule β from α, get β' as above, and then fill in the states to get an execution α' of the specification system (filling in the client states as in α).

Note that if we care about the order of events at different clients, then this simulation strategy does not work.

20.2.3 Synchronizer Implementations

Now we are left with the job of implementing the synchronizer part of this implementation. Its job is quite simple: it gets OK inputs from each node at each round, and for each node, it can output GO when it knows that all its neighbors in the graph (strictly speaking, including the node itself) have input OK for that round.
We want to implement this with a distributed algorithm, one piece per node, using a message system as usual. There are several ways to do this.

Synchronizer α. The synchronizer α works as follows. When a node gets an OK, it sends this information to all its neighbors. When a node hears that all its neighbors are OK (and it is itself OK), it outputs GO. The correctness of this algorithm is fairly obvious.
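Here is a minimal sketch of the per-node protocol of α, assuming a `send(dest, msg)` primitive and an `emit_go` output callback (neither is specified in the text; the code is illustrative).

```python
# A sketch of synchronizer alpha at a single node (illustrative only).
# On OK, tell all neighbors; output GO once this node and every neighbor
# are known to be OK for the current round.

class AlphaNode:
    def __init__(self, me, neighbors, send, emit_go):
        self.me, self.neighbors = me, neighbors
        self.send, self.emit_go = send, emit_go
        self.round = 1
        self.self_ok = False
        self.ok_from = {v: 0 for v in neighbors}   # highest round OK'd, per neighbor

    def ok(self):
        """Input OK from this node's front end for the current round."""
        self.self_ok = True
        for v in self.neighbors:
            self.send(v, ("neighbor-ok", self.round))
        self._try_go()

    def receive(self, sender, msg):
        tag, r = msg
        if tag == "neighbor-ok":
            self.ok_from[sender] = max(self.ok_from[sender], r)
            self._try_go()

    def _try_go(self):
        if self.self_ok and all(r >= self.round for r in self.ok_from.values()):
            self.self_ok = False                   # reset for the next round
            self.round += 1
            self.emit_go(self.me)
```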
The complexity is as follows. For every round, we generate O(E) messages (since all nodes send messages on all edges at every round). Also, each round takes constant time (measuring time from when all the OKs at a round have happened until all the GOs for that round have happened). We conclude that this algorithm is fast, but may be too communication-inefficient for some purposes. This brings us to the other extreme below.

Synchronizer β. For this synchronizer, we assume that there is a rooted spanning tree in G. The rule is now as follows: convergecast all the OK information to the root, and then broadcast permission to do the GO outputs. The cost of this algorithm is O(n) messages for every round, but the time to complete a round is proportional to the height of the spanning tree (which is Θ(n) in the worst case).
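A minimal sketch of β at a single tree node, again under assumptions not in the text (a `send` primitive, FIFO channels, and one round in flight at a time):

```python
# A sketch of synchronizer beta on a rooted spanning tree (illustrative only).
# Convergecast: each node reports "subtree-ok" to its parent once its own OK
# and reports from all children are in.  Broadcast: the root then sends "go"
# back down the tree.

class BetaNode:
    def __init__(self, me, parent, children, send, emit_go):
        self.me, self.parent, self.children = me, parent, children
        self.send, self.emit_go = send, emit_go
        self.self_ok = False
        self.ok_children = set()

    def ok(self):                        # input OK from this node's front end
        self.self_ok = True
        self._maybe_report()

    def receive(self, sender, msg):
        if msg == "subtree-ok":          # convergecast phase
            self.ok_children.add(sender)
            self._maybe_report()
        elif msg == "go":                # broadcast phase
            self._go()

    def _maybe_report(self):
        if self.self_ok and self.ok_children == set(self.children):
            if self.parent is None:      # root: the whole tree is OK
                self._go()
            else:
                self.send(self.parent, "subtree-ok")

    def _go(self):
        self.self_ok = False             # reset for the next round
        self.ok_children = set()
        self.emit_go(self.me)
        for c in self.children:
            self.send(c, "go")
```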
20.2.4 Hybrid Implementation

By combining synchronizers α and β, it is possible to get a hybrid algorithm that (depending on the graph structure) might give (simultaneously) better time efficiency than β and better message efficiency than α. The basic idea is to divide the graph up into clusters of nodes, each with its own spanning tree, i.e., a spanning forest.
(We assume that this spanning forest is constructed using some preprocessing.) The rule now will be to use synchronizer β within each cluster tree, and use α to synchronize between clusters. To see how this works, we found it useful to describe a high-level decomposition (see Figure 20.1).

Figure 20.1: Decomposition of the synchronization task into cluster-synch for intra-cluster synchronization and forest-synch for inter-cluster synchronization.

The behavior of cluster-synch is to take OKs from all nodes in one particular cluster (which is a connected subset of nodes in the graph), and output a single "cluster-OK" to forest-synch. Then, when a single cluster-GO is input from forest-synch, it produces a GO for each node in the cluster.
A possible way to implement this on the nodes of the given cluster, of course, is just to use the idea of synchronizer β, doing convergecast to handle the OKs and broadcast to handle the GOs, both on a spanning tree of the cluster.

The other component is essentially (up to renaming) a synchronizer for the cluster graph G' of G, where the nodes of G' correspond to the clusters of G, and there is an edge from C to D in G' if in G there is an edge from some node in C to some node in D.
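Constructing G' from G and a partition into clusters is straightforward; a small sketch (the names and representation are ours, not from the notes):

```python
# A sketch of forming the cluster graph G' from G: one node per cluster,
# and an edge between clusters C and D whenever some G-edge crosses them.

def cluster_graph(edges, cluster_of):
    gprime = set()
    for (u, v) in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            gprime.add((min(cu, cv), max(cu, cv)))   # undirected, deduplicated
    return gprime

# Example: a 4-node path 1-2-3-4 split into clusters A = {1, 2}, B = {3, 4}.
edges = [(1, 2), (2, 3), (3, 4)]
cluster_of = {1: "A", 2: "A", 3: "B", 4: "B"}
print(cluster_graph(edges, cluster_of))   # {('A', 'B')}
```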
Ignoring for the moment the problem of how to implement this forest-synchronizer in a distributed system based on G, let us first argue why this decomposition is correct. We need to show that any GO that is output at any node i implies that OKs for i and all its neighbors in G have occurred. First consider node i itself. The GO(i) action means that cluster-GO for i's cluster C has occurred, which means that cluster-OK for C has occurred, which means that OK(i) has occurred. Now consider j ∈ neighbors_G(i) ∩ C. The same argument holds: cluster-OK for C means that OK(j) has occurred. Finally, let j ∈ neighbors_G(i) − C. The GO(i) action means, as before, that cluster-GO for C has occurred, which implies that cluster-OK for D has occurred, where D is the cluster containing j. (The clusters C and D are neighbors in G' because the nodes i and j are neighbors in G.) This implies, as before, that OK(j) has occurred.

Implementation of forest-synch. We can use any synchronizer to synchronize between the clusters. Suppose we want to use synchronizer α.
Note that we can't run α directly, because it is supposed to run on nodes that correspond to the clusters, with edges directly connecting these nodes (clusters); we aren't provided with such nodes and channels in a distributed system. However, it is not hard to simulate them. We do this as follows. Assume we have a leader node in each cluster, and let it run the node protocol of α for the entire cluster. (This can, but doesn't have to, be the same node that is used as the root in the implementation of β for the cluster, if the cluster is synchronized using β.) The next problem to solve is the way two leaders in different clusters can communicate. We simulate direct communication using a path between them.
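A sketch of this simulated channel (our rendering; the routing table `next_hop` is assumed to be precomputed from fixed leader-to-leader paths during preprocessing):

```python
# A sketch of tunneling a G'-message between two cluster leaders over a
# fixed path in G (illustrative only; `forward` and `deliver` are assumed
# callbacks into the underlying message system and the alpha protocol).

def leader_send(src_leader, dest_leader, msg, next_hop, forward):
    """A leader starts a simulated G'-message toward another leader."""
    forward(next_hop[(src_leader, dest_leader)], ("tunnel", dest_leader, msg))

def tunnel_receive(me, pkt, next_hop, forward, deliver):
    """Run at every node: relay the packet one hop, or deliver at the target."""
    tag, dest_leader, msg = pkt               # pkt = ("tunnel", dest_leader, msg)
    if me == dest_leader:
        deliver(msg)                          # e.g., a neighbor-OK for alpha on G'
    else:
        forward(next_hop[(me, dest_leader)], pkt)
```

Each simulated G'-message thus costs one underlying message per edge of the chosen path, which is why the hybrid's message cost depends on the lengths of these inter-leader paths.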