Consider the same two processes A and B that wish to achieve mutual exclusion. This time, however (see Figure 23.2, Part 2), A and B are at two different nodes of a computer network. The only way they can communicate is by sending token messages to each other. Thus we cannot use a single turn variable that can be read by both processes. In fact, A must have at least two states: a state in which A has the token, and a state in which A does not have the token. B must also have two such states. Thus we need at least four combined states, of which two are illegal. Thus domain restriction at each node cannot prevent illegal combinations across nodes. We need other techniques to detect and correct illegal states of a network.
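To make the counting argument concrete, here is a small illustrative Python sketch (not from the original notes; the one-bit state encoding is an assumption made only for illustration). It enumerates the four combined states of A and B and shows that per-node domain restriction accepts all of them, including the two globally illegal ones:

    from itertools import product

    # Each node's local state is one bit: does it currently hold the token?
    LOCAL_STATES = [True, False]

    def locally_allowed(has_token):
        # Domain restriction at a single node: both local values are
        # individually meaningful, so neither can be ruled out locally.
        return has_token in LOCAL_STATES

    def globally_legal(a_has_token, b_has_token):
        # Mutual exclusion with a single token: exactly one node holds it.
        return a_has_token != b_has_token

    for a, b in product(LOCAL_STATES, repeat=2):
        ok_local = locally_allowed(a) and locally_allowed(b)
        print(f"A={a}, B={b}: passes local checks={ok_local}, "
              f"globally legal={globally_legal(a, b)}")

All four combined states pass the local checks, but the two states in which both nodes or neither node holds the token violate mutual exclusion; detecting and correcting them requires the nodes to exchange information.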
It should be no surprise that the title of Dijkstra's pioneering paper on self-stabilization [Dijkstra74] was "Self-Stabilization in spite of Distributed Control."

[Figure 23.3: A typical mesh network]

23.1.2 Self-Stabilization is Attractive for Networks

We will explore self-stabilization properties for computer networks and network protocols. A computer network consists of nodes that are interconnected by communication channels. The network topology (see Figure 23.3) is described by a graph. The vertices of the graph represent the nodes and the edges represent the channels. Nodes communicate with their neighbors by sending messages along channels. Many real networks, such as the ARPANET, DECNET, and SNA, can be modeled in this way.

A network protocol consists of a program for each network node.
Each program consists of code and inputs as well as local state. The global state of the network consists of the local state of each node as well as the messages on the network links. We define a catastrophic fault as a fault that arbitrarily corrupts the global network state, but not the program code or the inputs from outside the network.

Self-stabilization formalizes the following intuitive goal for networks: despite a history of catastrophic failures, once catastrophic failures stop, the system should stabilize to correct behavior without manual intervention. Thus self-stabilization is an abstraction of a strong fault-tolerance property for networks. It is an important property of real networks because:

- Catastrophic faults occur: Most network protocols are resilient to common failures such as node and link crashes, but not to memory corruption. But memory corruption does happen from time to time. It is also hard to prevent a malfunctioning device from sending out an incorrect message.

- Manual intervention has a high cost: In a large decentralized network, restoring the network manually after a failure requires considerable coordination and expense. Thus even if catastrophic faults occur rarely (say, once a year), there is considerable incentive to make network protocols self-stabilizing. A reasonable guideline is that the network should stabilize preferably before the user notices, and at least before the user logs a service call!

These issues are illustrated by the crash of the original ARPANET protocol ([Rosen81], [Perlman83]). The protocol was carefully designed never to enter a state that contained three conflicting updates a, b, and c. Unfortunately, a malfunctioning node injected three such updates into the network and then crashed.
After this, the network cycled continuously between the three updates. It took days of detective work [Rosen81] before the problem was diagnosed. With hindsight, the problem could have been avoided by making the protocol self-stabilizing.

Self-stabilization is also attractive because a self-stabilizing program does not require initialization. The concept of an initial state makes perfect sense for a single sequential program.
However, for a distributed program an initial state seems to be an artificial concept. How was the distributed program placed in such an initial state? Did this require another distributed program? Self-stabilization avoids these questions by eliminating the need for distributed initialization.

23.1.3 Criticisms of Self-Stabilization

Despite the claims of the previous section, there are peculiar features of the self-stabilization model that need justification:

- The model allows network state to be corrupted, but not program code.
However, program code can be protected against arbitrary corruption of memory by redundancy, since code is rarely modified. On the other hand, the state of a program is constantly being updated, so it is not clear how techniques such as checksums could prevent illegal operations on that memory. It is even harder to prevent a malfunctioning node from sending out incorrect messages.

- The model only deals with catastrophic faults that stop.
Byzantine models deal with continuous faults. However, in Byzantine models, only a fraction of the nodes are allowed to exhibit arbitrary behavior. In the self-stabilization model, all nodes are permitted to start with arbitrary initial states. Thus, the two models are orthogonal and can even be combined.
- A self-stabilizing program P is allowed to make initial mistakes. However, the important stabilizing protocols that we know of are used for routing, scheduling, and resource allocation tasks. For such tasks, initial errors only result in a temporary loss of service.

23.2 Definitions of Stabilization

23.2.1 Execution Definitions

All the existing definitions of stabilization are in terms of the states and executions of a system.
We will begin with a definition of stabilization that corresponds to the standard definitions (for example, that of Katz and Perry [KP90]). Next, we will describe another definition of stabilization in terms of external behaviors. We believe that the definition of behavior stabilization is appropriate for large systems that require modular proofs. However, the definition of execution stabilization given below is essential in order to prove results about behavior stabilization.

Suppose we define the correctness of an automaton in terms of a set C of legal executions. For example, for a token passing system, we can define the legal executions to be those in which there is exactly one token in every state, and in which every process periodically receives a token.

What do we mean when we say that an automaton A stabilizes to the executions in set C?
Intuitively, we mean that eventually all executions of A begin to "look like" an execution in set C. For example, suppose C is the set of legal executions of a token passing system. Then in the initial state of A there may be zero or more tokens. However, the definition requires that eventually there is some suffix of any execution of A in which there is exactly one token in every state.

Formally:

Definition 1: Let C be a set of executions.
We say that automaton A stabilizes to the executions in C if for every execution α of A there is some suffix of α that is in C.

We can extend this definition in the natural way to define what it means for an automaton A to stabilize to the executions of another automaton B.
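As a concrete, deliberately simplified illustration of Definition 1, the following Python sketch (not from the original notes; the state encoding and helper names are assumptions) checks only the safety part of the token-passing example on a finite execution, represented here as a sequence of global states: is there a suffix in which every state contains exactly one token?

    def legal_state(state):
        # state: tuple of token counts, one entry per process.
        return sum(state) == 1

    def has_legal_suffix(execution):
        # Safety part of Definition 1 on a finite execution: is there a
        # suffix in which every state has exactly one token?
        return any(
            all(legal_state(s) for s in execution[i:])
            for i in range(len(execution))
        )

    # Starts with two tokens, then settles down to exactly one:
    print(has_legal_suffix([(1, 1), (1, 0), (0, 1), (1, 0)]))  # True
    # Never settles: keeps alternating between zero and two tokens:
    print(has_legal_suffix([(1, 1), (0, 0), (1, 1), (0, 0)]))  # False

The real definition quantifies over (possibly infinite) executions and the set C also carries the liveness requirement that every process periodically receives the token, neither of which can be checked by enumerating a finite prefix; the sketch is only meant to make the "for every execution there is a suffix" quantifier structure concrete.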
23.2.2 Definitions of Stabilization Based on External Behavior

In the I/O Automaton model, the correctness of an automaton is specified in terms of its external behaviors. Thus we specify the correctness of a token passing system without any reference to the state of the system. We can do so by specifying the ways in which token delivery and token return actions (to and from some external users of the token system) can be interleaved.

Thus it is natural to look for a definition of stabilization in terms of external behaviors. We would also hope that such a definition would allow us to modularly "compose" results about the stabilization of parts of a system to yield stabilization results about the whole system. Now, an IOA A is said to solve a problem P if the behaviors of A are contained in P. For stabilization, however, it is reasonable to weaken this definition and ask only that an IOA eventually exhibit correct behavior.
Formally:

Definition 2: Let P be a problem (i.e., a set of behaviors). An IOA A stabilizes to the behaviors in P if for every behavior β of A there is a suffix of β that is in P.

Similarly, we can specify that A stabilizes to the behaviors of some other automaton B in the same way. The behavior stabilization definition used in this section and in [V92] is a special case of a definition suggested by Nancy Lynch.
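For reference, the two notions can be written side by side. This LaTeX summary is an added paraphrase of Definitions 1 and 2, using execs(A), behs(A), and "suffix" informally as above:

    \begin{align*}
    A \text{ stabilizes to the executions in } C &\iff
      \forall \alpha \in \mathrm{execs}(A)\;
      \exists \text{ suffix } \alpha' \text{ of } \alpha :\; \alpha' \in C, \\
    A \text{ stabilizes to the behaviors in } P &\iff
      \forall \beta \in \mathrm{behs}(A)\;
      \exists \text{ suffix } \beta' \text{ of } \beta :\; \beta' \in P.
    \end{align*}

Stabilizing to another automaton B is the instance C = execs(B) or P = behs(B), which is the form used in Lemma 1 below.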
It states that execution stabilization implies behavior stabilization. Infact, the only method we know to prove a behavior stabilization result is to rst prove acorresponding execution stabilization result, and then use this lemma. Thus the behaviorand execution stabilization denitions complement each other in this thesis: the former istypically used for specication and the latter is often used for proofs.Lemma 1 If IOA A stabilizes to the executions of IOA B then IOA A stabilizes to thebehaviors of B .23.2.3 Discussion on the Stabilization DenitionsFirst, notice that we have dened what it means for an arbitrary IOA to stabilize to sometarget set or automaton.