Distributed Algorithms. Nancy A. Lynch (1993) (811416), страница 68
Текст из файла (страница 68)
While these techniques cannot be used to solve every problem, thereare a large number of useful protocols that can be eciently stabilized using these notions.In the next lecture, Boaz will provide an example of a reset protocol that { somewhatsurprisingly - is locally correctable using the trivial partial order. In other places we have usedlocal checking to provide solutions for mutual exclusion, the end-to-end problem, stabilizingsynchronizers, and spanning tree protocols. More detailed references can be be found inAPV-91], V92] and AVarghese91.The messages used for local checking can be piggybacked on keepalive trac in realnetworks without any appreciable loss of eciency.
As a side benet, local checking alsoprovides a useful debugging tool. Any violations of local checking can be logged for furtherexamination. In a trial implementation of our reset procedure on the Autonet Auto90], localchecking discovered bugs in the protocol code. In the same vein, local checking can providea record of catastrophic, transient faults that are otherwise hard to detect.3276.852J/18.437J Distributed AlgorithmsLecturer: Boaz Patt-ShamirDecember 8, 1992Lecture 2424.1 Self-Stabilization and TerminationCan a self-stabilizing protocol terminate? This question has several possible answers.
First,consider the regular IO automata model, and the behavior stabilization denition. In thissetting, if the problem spec allows for nite behaviors, then the trivial automaton that doesnothing is a self-stabilizing solution. This is clearly unsatisfactory. We shall use a dierentformulation of the model to see that no \interesting" problem admits a terminating selfstabilizing protocol.Consider the input/output relation specication of problems, dened as follows.
In eachprocess, there are special input and output registers the problem is dened in terms of arelation that species for each input assignment what are the possible output assignments.14We also assume that some of the states are designated as \terminating", i.e., no action isenabled in them. This is just another formulation of a non-reactive distributed task. Wesaw quite a few such tasks, e.g., MST, shortest-paths, leader election, etc.In this setting, it is easy to see that a self-stabilizing protocol can have no terminatingstates, if it satises the following non-locality condition:There exist an input value v, an output value v0, a node j , and two input assignments I and I 0 such that v is assigned to j in both I and I 0, v0 is a possibleoutput value for I and is not a possible output value for I 0.Intuitively, the non-locality condition means that node j cannot tell locally that having v asinput and v0 as output is correct or wrong.
We remark that all interesting distributed taskshave this property: any other task amounts to a collection of unrelated processes.The moral of this obvious fact is that self-stabilizing algorithms must constantly verifywhether their output (or, more generally, their state) is \updated" with respect to the restof the system, and therefore they cannot terminate.This can be formalized in the IOA model by assuming that the input value is input to the systeminnitely often, and that the output action that carries the output value is continuously enabled.1432824.2 Self-Stabilization and Finite StateLet us consider an issue which may seem unrelated to self-stabilization at rst glance, namelyinnite-state protocols.
In particular, consider the idea of unbounded registers. We wentthrough a lot of diculty (conceptually) and paid a signicant price (in terms of performance)to get bounded-registers protocols (e.g., applying CTS to Lamport's bakery).This approach is easy to criticize from the practical viewpoint: to be concrete, supposethat the registers contain 64 bits. There is no system today that can hit the bound of 264in any practical application. However, note that this argument depends crucially on theassumption that the registers are initialized properly. For instance, consider the case where,as a result of a transient fault, the contents of the registers may be corrupted. Then nomatter how big we let the registers be, they may hit their bound very quickly.
On the otherhand, any feasible system can have only nitely many states, and in many cases it is desirableto be able to withstand transient faults.In short, unbounded registers are usually a convenient abstraction, given that the systemis initialized properly, while self-stabilizing protocols are useful only when they have nitelymany states. This observation gives rise to two possible approaches.
The rst is to developonly nite-state self-stabilizing protocols, tailored to the specic applications. The second isto have some kind of a general transformation that enables us to design abstract unboundedstate protocols, using real-world, bounded-size registers, and whenever the physical limit ishit, to reset the system.The main tool required for the second approach is a self-stabilizing reset protocol. Thiswill be the focus of our lecture today.24.3 Self-stabilizing Reset Protocol24.3.1 Problem statementInformally, the goal of the reset procedure is that whenever it is invoked, to create a fresh\version" of the protocol, which should be free of any possible corruption in the past.
Formally, we require the following.We are given a user at each node (we sometimes identify the user with its node). Theuser has input action Receive(m e) and output action Send(m e), where m is drawn fromsome message alphabet $, and e is an incident link.15 The meaning of these actions is thenatural one: m is received from e, and m is to be sent over e, respectively.
In addition, the15We distinguish between the actual link alphabet 0 and the user message alphabet . We assume that0 comprises of and the distinct Reset protocol messages.329reset requestreset signalreceive(m,e)ResetProtocolsendreceiveResetProtocolsend(m,e)reset requestreset signalreceive(m,e)send(m,e)UsersendreceivesendreceiveResetProtocolreset requestUserreset signalreceive(m,e)send(m,e)UserAsynchronousNetworkFigure 24.1: Interface specication for reset service.user also has output action reset request and input action reset signal. The problem is todesign a protocol (\reset service"), such that if one of the nodes makes a reset request andno node makes innitely many requests, then1.
In nite time all the nodes in the connected component of a requesting node receive areset signal.2. No node receives innitely many reset signals.3. Let e = (u v) be any link in the nal topology of the network. Then the sequenceof Send(m e) input at u after the last reset signal at u is identical to the sequence ofReceive e (m) output at v after the last reset signal at v.Informally, (1) and (2) guarantee that every node gets a last reset signal, and (3) stipulates that the last reset signal provides a consistent reference time-point for the nodes.In order that the consistency condition (3) will be satisable, it is assumed that all $messages from the user to the network and vice-versa are controlled by the reset protocol(see Figure 24.1).33024.3.2 Unbounded-registers ResetThe original problem requires us to run a new version of the user's protocol.
Consider thefollowing simple solution.1. The reset protocol maintains a \version number".2. All outgoing messages are stamped with the version number, and the numbers arestripped before delivery of received messages.3. Whenever a request occurs, version counter is incremented, and signal is output immediately (all messages will be stamped with new value).4.
Whenever a higher value is seen, it is adopted, and signal is output.5. Messages with low values are ignored.This protocol works ne (for the nodes participating in the protocol). Stabilization timeis O(diam ) which is clearly optimal. What makes this protocol uninteresting, as discussedabove, is the need of unbounded version counter.24.3.3 Example: Link ResetRecall the local correction theorem. The basic mechanism there was applying some \localcorrection" action which reset the nodes to some pre-specied state, and reset the links toempty.
This matches the above spec for reset. This link reset can be implemented fairlyeciently | the details are a bit messy, however. (It requires the use identiers at nodes,and a ushing mechanism based on either a time bound on message delivery, or a bound onthe maximal capacity of the link.)Our goal, in the reset protocol, is to provide a network-wide reset. We could do itusing similar ushing mechanisms (e.g., global timeout used in DECNET). However, globalbounds are usually very bad, as they need to accommodate simultaneous worst cases at allthe system. Thus we're interested in enhancing the self-stabilization properties of the links(as described by the local correction theorem) to all the network, without incurring the costof, say, a global timeout.24.3.4 Reset ProtocolLet us rst describe how the reset protocol should have worked (if it was initialized properly).In the so-called Ready mode, the protocol relays, using buers, $-messages between thenetwork and the user.
When a reset request is made at some node, its mode is changed to331RESET REQUESTATBROROA BORTABTORTBORTABARESET REQUEST(a)(b)READYACKRESET SIGNALACKACKREAREADYREADYACKACKDYREYADDYAKREACRESET SIGNAL(c)(d)Figure 24.2: An example of the run of the reset protocol. In (a), two nodes get reset requestfrom the user. In (b), the nodes in Abort mode are colored black.