Real-Time Systems. Design Principles for Distributed Embedded Applications. Hermann Kopetz. Second Edition (811374), page 83
If an assembly line must be stopped during the unavailability-of-service interval, the cost of unavailability of service can be substantially higher than the initial product acquisition cost and the repair cost of the failed component.

Example: It is a goal of plant managers to reduce the probability that an on-call maintenance action is needed as far as possible, ideally to zero.

Example: In the airline industry, unscheduled maintenance of an airplane means lost connections and extra cost for the lodging of passengers.

Another aspect that influences the cost of maintenance is whether permanent hardware faults or software errors are considered. The repair of a permanent hardware fault requires the physical replacement of the broken component, i.e., the spare part must be available at the site of failure and must be installed by a physical maintenance action.
Given that an appropriate infrastructure has been set up, the repair of a software fault can be performed remotely by downloading a new version of the software via the Internet with minimal or no human intervention.

11.6.2 Maintenance Strategy

The design for maintenance starts with the specification of a maintenance strategy for a product. The maintenance strategy will depend on the classification of components, on the maintainability/reliability/cost tradeoff of the product, and on the expected use of the product.

Component Classification.
Two classes of components must be distinguished from the point of view of maintenance: components that exhibit wear-out failures and components that exhibit spontaneous failures. For components that exhibit wear-out failures, physical parameters must be identified that indicate the degree of wear-out. These parameters must be continually monitored in order to periodically establish the degree of wear-out and to determine whether a replacement of the component must be considered during the next scheduled maintenance interval.

Example: Monitoring the temperature or the vibration of a bearing can produce valuable information about the degree of wear-out of the bearing before it actually breaks down. In some manufacturing plants, more than 100,000 sensors are installed to monitor wear-out parameters of diverse physical components.

If it is not possible to identify a measurable wear-out parameter of a component or to measure such a parameter, another conservative technique of maintenance is the derating of components (i.e.,
operating the components in a domain where there is minimal stress on the components), and the systematic replacement of components after a given interval of deployment during a scheduled maintenance interval. This technique is, however, quite expensive.

For components with a spontaneous failure characteristic, such as many electronic components, it is not possible to estimate the interval that contains the instant of failure ahead of time. For these components the implementation of fault-tolerance, as discussed in Sect.
6.4, is the technique of choice to shift on-call maintenance to preventive maintenance.

Maintainability/Reliability Tradeoff. This tradeoff determines the design of the field-replaceable units (FRU) of a product. An FRU is a unit that can be replaced in the field in case of failure. Ideally, an FRU consists of one or more FCUs (see Sect. 6.1.1) so that effective diagnosis of an FRU failure can be performed. The size (and cost) of an FRU (a spare part) is determined by a cost analysis of a maintenance action on one side and the impact of the FCU structure on the reliability of the product on the other side. In order to reduce the time (and cost) of a repair action, the mechanical interfaces around an FRU should be easy to connect and disconnect. Mechanical interfaces that are easy to connect or disconnect (e.g., a plug) have a substantially higher failure rate than interfaces that are firmly connected (e.g., a solder connection).
Thus, the introduction of an FRU structure will normally decrease the product reliability. The most reliable product is one that cannot be maintained. Many consumer products fall into this category, since they are designed for optimal reliability: if the product is broken, it must be replaced as a whole by a new product.

Expected Use. The expected use of a product determines whether a failure of the product will have serious consequences, such as the downtime of a large assembly line.
In such a scenario it makes economic sense to implement a fault-tolerant electronic system that masks a spontaneous permanent failure of an electronic device. At the next scheduled maintenance interval, the broken device can be replaced, thus restoring the fault-tolerance capability. Hardware fault-tolerance thus transforms the expensive on-call maintenance action to a lower-cost scheduled maintenance action.
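
The economic argument above can be made concrete with a small cost model. The following is only a sketch under stated assumptions: the cost parameters, the use of simple duplication as the fault-tolerance scheme, and all figures are hypothetical and do not come from the text. It further assumes that the second device never fails before the first one has been replaced.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceCosts:
    """Hypothetical cost parameters; none of these figures come from the text."""
    device_cost: float             # price of one electronic device
    on_call_labor: float           # labor cost of one unscheduled repair
    scheduled_labor: float         # labor cost of one repair during planned downtime
    downtime_cost_per_hour: float  # production loss per hour of standstill
    on_call_repair_hours: float    # standstill duration of one on-call repair


def cost_simplex(c, failures_per_device):
    """Single device: every failure stops the line and triggers an on-call repair."""
    per_failure = (c.device_cost + c.on_call_labor
                   + c.downtime_cost_per_hour * c.on_call_repair_hours)
    return c.device_cost + failures_per_device * per_failure


def cost_fault_tolerant(c, failures_per_device):
    """Duplicated device: a failure is masked and repaired at the next scheduled interval.

    Simplifying assumption: the second device never fails before the first is replaced.
    Two devices are deployed, so twice as many device failures are expected.
    """
    per_failure = c.device_cost + c.scheduled_labor
    return 2 * c.device_cost + 2 * failures_per_device * per_failure


def break_even_failures(c):
    """Expected failures per device above which the fault-tolerant design is cheaper."""
    simplex_rate = (c.device_cost + c.on_call_labor
                    + c.downtime_cost_per_hour * c.on_call_repair_hours)
    tolerant_rate = 2 * (c.device_cost + c.scheduled_labor)
    if simplex_rate <= tolerant_rate:
        return None  # redundancy never pays off under these parameters
    return c.device_cost / (simplex_rate - tolerant_rate)


costs = MaintenanceCosts(500.0, 2000.0, 300.0, 10000.0, 4.0)
print(cost_simplex(costs, 0.5))         # 21750.0
print(cost_fault_tolerant(costs, 0.5))  # 1800.0
print(break_even_failures(costs))
```

In this model a lower device cost or a higher downtime cost lowers the number of expected failures at which redundancy starts to pay off.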
The decreasing cost of electronic devices on one side and the increasing labor cost and the cost of production loss during on-call maintenance on the other side shift the break-even point for many electronic control systems towards fault-tolerant systems.

In an ambient intelligence environment, where smart Internet-enabled devices are placed in many homes, the maintenance strategy must ensure that non-experts can replace broken parts. This requires an elaborate diagnostic subsystem that diagnoses a fault to an FRU and orders the spare part autonomously via the Internet. Once the spare part is delivered to the user, the inexperienced user must be capable of installing it with minimal mental and physical effort in order to restore the fault-tolerance of the system.

Example: The maintenance strategy of the Apple iPhone relies on the complete replacement of a broken hardware device, eliminating the need for setting up an elaborate hardware maintenance organization.
Software errors are corrected semi-automatically by downloading a new version of the software from the Apple iTunes store.

11.6.3 Software Maintenance

The term software maintenance refers to all software activities needed to provide a useful service in a changing and evolving environment.
These activities include:

• Correction of software errors. It is difficult to deliver error-free software. If dormant software errors are detected during operation in the field, the errors must be corrected and a new software version must be delivered to the customer.
• Elimination of vulnerabilities. If a system is connected to the Internet, there is a high probability that any existing vulnerability will be detected by an intruder and used to attack and damage a system that would otherwise provide a reliable service.
• Adaptation to evolving specifications. A successful system changes its environment. The changed environment puts new demands on the system that must be fulfilled in order to keep the system relevant for its users.
• Addition of new functions. Over time, new useful system functions will be identified that should be included in a new version of the software.

The connection of an embedded system to the Internet is a mixed blessing. On one side, it makes it possible to provide Internet-related services and to download a new version of the software remotely, but on the other side it enables an adversary to exploit vulnerabilities of a system that would be irrelevant if no Internet connection were provided.

Any embedded system that is connected to the Internet must support a secure download service [Obm09].
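
The core check of such a secure download service can be sketched as follows: the device installs a downloaded image only if its authentication tag verifies. This is a minimal sketch, not the method of [Obm09]. The function names are hypothetical, and HMAC-SHA256 with a pre-shared key is used only to keep the example within the Python standard library; a fielded system would more likely use an asymmetric signature scheme so that the device holds no signing secret.

```python
import hashlib
import hmac


def sign_image(image: bytes, key: bytes) -> bytes:
    """Producer side: compute the authentication tag shipped with the image."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_image(image: bytes, tag: bytes, key: bytes) -> bool:
    """Device side: recompute the tag and compare it in constant time."""
    expected = hmac.new(key, image, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


def install_if_authentic(image: bytes, tag: bytes, key: bytes, installed: list) -> bool:
    """Append the image to `installed` (a stand-in for flashing) only if it verifies."""
    if not verify_image(image, tag, key):
        return False  # reject: image was altered or not produced by the key holder
    installed.append(image)
    return True


key = b"k" * 32  # hypothetical pre-shared key, for illustration only
image = b"firmware v2"
tag = sign_image(image, key)

flash = []
print(install_if_authentic(image, tag, key, flash))                # True
print(install_if_authentic(image + b" tampered", tag, key, flash)) # False
```

The sketch covers only authenticity and integrity of the image; protection against replay of an old, vulnerable version and confidentiality of the transfer would need additional measures.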
This service is absolutely essential for the continued remote maintenance of the software. The secure download must use strong cryptographic methods to ensure that an adversary cannot get control of the connected hardware device and download software of its liking.

Example: A producer of modems sold 10,000 modems all over the world before hackers found out that the modems contained a vulnerability.
The producer had not considered providing the infrastructure for a secure download service for installing a new corrected version of the software remotely.

Points to Remember

• In his recent book [Bro10], Fred Brooks states that conceptual integrity of a design is the result of a single mind.
• Constraints limit the design space and help the designer to avoid the exploration of design alternatives that are unrealistic in the given environment. Constraints are thus our friends, not our adversaries.
• Software per se is an action plan describing the operations of a real or virtual machine. A plan by itself (without a machine) does not have any temporal dimension, cannot have state, and has no behavior. This is one of the reasons why we consider the component and not the job as the primitive construct at the level of architecture design of an embedded system.
• Purpose analysis, i.e., the analysis of why a new system is needed and what the ultimate goal of a design is, must precede the requirements analysis.
• The analysis and understanding of a large problem is never complete and there are always good arguments for asking more questions concerning the requirements before starting with the real design work. The phrase paralysis by analysis has been coined to point out this danger.
• Model-based design is a design method that establishes a useful framework for the development and integration of executable models of the controlled object and of the controlling computer system.
• Component-based design is a meet-in-the-middle design method. On the one side, the functional and temporal requirements on the components are derived top-down from the desired application functions. On the other side, the functional and temporal capabilities of the components are contained in the specifications of the available components.
• Safety can be defined as the probability that a system will survive a given time span without the occurrence of a critical failure mode that can lead to catastrophic consequences.
• Damage is a pecuniary measure for the loss in an accident, e.g., death, illness, injury, loss of property, or environmental harm.