Distributed Algorithms. Nancy A. Lynch (1993) (811416), страница 60
Текст из файла (страница 60)
j receives the marker from i, records its local bank state ($0), puts a marker in thebuer to send to i, and snaps the state of the incoming channel as empty.5. i receives the marker, snaps the state of the channel as the sequence consisting of onemessage, the dollar it received before the marker.295Put together, the global snapshot obtained is one dollar at i, one dollar in channel fromj to i, and the other node and channel empty.
This looks reasonable in particular, thecorrect total ($2) is obtained. But note that this global state never actually arose during thecomputation. However, we can show a \reachability" relationship: Suppose that is theactual execution of the underlying bank algorithm. Then can be partitioned into 123,where 1 indicates the part before the snapshot starts and 3 indicates the part after thesnapshot ends.
Then the snapped state is reachable from the state at the end of 1, and thestate at the beginning of 3 is reachable from the snapped state. So the state \could havehappened", as far as anyone before or after the snapshot could tell.In terms of Lamport's partial order, consider the actual execution of the underlyingsystem depicted in Figure 22.12 (a). We can reorder it, while preserving the partial order,so that all the states recorded in the snapshot are aligned, as in Figure 22.12 (b).Process iProcess jProcess iProcess jlogicaltime−cuttime−cut(a)(b)Figure 22.12: (a) is the actual execution of Figure 22.11, and (b) is a possible orderings ofit, where the logical time cut of (a) is a real time cut.Applications revisitedStable property detection.A stable property is one that, if it is true of any state s of the underlying system, isalso true in all states reachable from s.
Thus, if it ever becomes true, it stays true. Thefollowing is a simple algorithm to detect the occurrence of a stable property P . First, thealgorithm does a global snapshot, and then either it collects the information somewhere andreturns P (result ), or else it does a distributed algorithm on the static information recordedfrom the snapshot, to determine P (result ). The correctness conditions of the snapshot (thereachability version of those conditions, that is) imply the following. If the output is true ,296then P is true in the real state of the underlying algorithm at the end of the snapshot, andif the output is false , then P is false in the real state of the underlying algorithm at thebeginning of the snapshot.
(There is some uncertainty about the interim.)Example: Termination detection. Assume a system of basic processes with no externalinputs, but the processes don't necessarily begin in quiescent states. This will execute on itsown for a while, and then might quiesce (terminate). More precisely, quiescence refers to allnodes being quiescent and no messages in transit anywhere.
We want to be able to detectthis situation, since it means that the system will perform no further action (since there areno inputs). Quiescence of global states is a stable property (assuming no inputs). So hereis a simple algorithm to detect it: do a snapshot, send the states somewhere and check ifanything is enabled or in transit. Actually, we don't need to send the whole state to oneprocess: we can have everyone check termination locally, and fan in (i.e., convergecast) theresults to some leader along a spanning tree. (We only need to convergecast a bit saying ifthe computation is done in the subtree.) It may be interesting to contrast this algorithmto the Dijkstra-Scholten termination detection: the snapshot algorithm involves the wholenetwork, i.e., it is not local at all. But on the other hand, if it only has to be executed once,then the overhead is bounded regardless of how long the underlying algorithm has gone onfor.Example: Deadlock detection.
Now the basic algorithm involves processes that are \waiting for" other processes, e.g., it can be determined from the state of process Ai that it iswaiting for process Aj (say to release a resource). The exact formulation varies in dierentsituations, e.g., a process might be waiting for resources owned by the other processes, etc.In all formulations, however, deadlock essentially amounts to a waiting cycle.
If we supposethat every process that is waiting is stopped and while it is stopped it will never \releaseresources" to allow anyone else to stop waiting, then deadlock is stable. Thus, we can detectit by taking snapshots, sending the information to one place, then doing an ordinary centralized cycle-detection algorithm. Alternatively, after taking the snapshot, we can performsome distributed algorithm to detect cycles, on the static data resulting from the snapshot.2976.852J/18.437J Distributed AlgorithmsLecturer: George VargheseDecember 3, 1992Lecture 23Designing network protocols is complicated by three issues: parallelism, asynchrony, andfault-tolerance.
This course has already covered techniques for making protocols robustagainst Byzantine faults. The Byzantine fault model is attractive because it is general { themodel allows a faulty process to continue to behave arbitrarily. In this lecture, we will studyanother general fault model { the self-stabilization model { which is attractive from both atheoretical and practical viewpoint.We will compare the two models a little later but for now let's just say that selfstabilization allows an arbitrary number of faults that stop while Byzantine models allow alimited number of faults that continue. Stabilizing solutions are also typically much cheaperand have found their way into real networks. In this lecture we will describe self-stabilizationusing two network models { an elegant shared memory model and a more practical networkmodel.
We will also describe three general techniques for making protocols stabilizing { localchecking and correction, counter ushing, and timer ushing.23.1 The Concept of Self-Stabilization23.1.1 Door Closing and Domain RestrictionToday we will focus on the ability of network protocols to stabilize to \correct behavior"after arbitrary initial perturbation. This property was called self-stabilization by DijkstraDijkstra74]. The \self" emphasizes the ability of the system to stabilize by itself withoutmanual intervention.A story illustrates the basic idea. Imagine that you live in a house in Alaska in themiddle of winter.
You establish the following protocol (set of rules) for people who enter andleave your house. Anybody who leaves or enters the house must shut the door after them. Ifthe door is initially shut, and nobody makes a mistake, then the door will eventually returnto the closed position. Suppose, however, that the door is initially open or that somebodyforgets to shut the door after they leave. Then the door will stay open until somebody passesthrough again. This can be a problem if heating bills are expensive and if several hours cango by before another person goes through the door. It is often a good idea to make the door298STEP 1: ENTERINGSTEP 2: MIDDLE OF THE DOORSTEP 3: OUT AT LAST!Figure 23.1: Exiting through a revolving door.closing protocol self-stabilizing. This can be done by adding a spring (or automatic doorcloser) that constantly restores the door to the closed position.In some cases, it is possible to trivialize the problem by hardwiring relationships betweenvariables to avoid illegal states { such a technique is actually used in a revolving door! Aperson enters the revolving door Figure 23.1, gets into the middle of the door, and nallyleaves.
It is physically impossible to leave the door open and yet there is a way to exitthrough the door.This technique, which we will call domain restriction, is often a simple but powerfulmethod for removing illegal states in computer systems that contain a single shared memory.Consider two processes A and B that have access to a common memory as shown in the rstpart of Figure 23.2. Suppose we want to implement mutual exclusion by passing a tokenbetween A and B .One way to implement this is to use two boolean variables tokenA and tokenB . To passthe token, Process A sets tokenA to false and sets tokenB to true.
In a self-stabilizingsetting, however, this is not a good implementation. For instance, the system will deadlockif tokenA = tokenB = false in the initial state. The problem is that we have some extra anduseless states. The natural solution is to restrict the domain to a single bit called turn, suchthat turn = 1 when A has the token and turn = 0 when B has the token. By using domainrestriction, 13 we ensure that any possible state is also a legal state.It is often feasible to use domain restriction to avoid illegal states within a single nodeof a computer network. Domain restriction can be implemented in many ways. The most13In this example, we are really changing the domain.
However, we prefer the term domain restriction.299tokenAtokenBturnPROCESSAtokenAPROCESSBtokenBPROCESSPROCESSAB1. WITH SHARED MEMORY2. WITHOUT SHARED MEMORYFigure 23.2: Token passing among two processesnatural way is by restricting the number of bits allocated to a set of variables so that everypossible value assigned to the bits corresponds to a legal assignment of values to each of thevariables in the set. Another possibility is to modify the code that reads variables so thatonly values within the specied domain are read.Unfortunately, domain restriction cannot solve all problems.