Real-Time Systems. Design Principles for Distributed Embedded Applications. Hermann Kopetz. Second Edition (811374), page 61
The comparative analysis of memory models for chip multiprocessors comes to a similar conclusion [Lev08]. In many industrial embedded systems the high computation/communication ratio suggests that message passing is the preferred alternative.

Example: In a premium car of today one can find up to one hundred Electronic Control Units (ECUs) that are connected by a number of low-bandwidth CAN busses.
Aggregating some of these ECUs on a single die of an MPSoC will put very little load on the high-bandwidth NoC.

Message passing has many other advantages over shared memory, such as function encapsulation, fault containment, error containment, the support of implementation-agnostic design methods, and the support of power gating.

8.3.3 Power Gating

In a multi-core SoC, consisting of a set of heterogeneous IP-cores interconnected by a real-time network-on-chip, a well-defined application functionality can be implemented in a dedicated IP-core, i.e., a component (see Chap. 3).
Examples of such an application functionality are resource management, security, MPEG processing, input–output controllers, external memory manager, etc. If the components interact with each other via message passing only and do not access a shared memory, then it is possible to encapsulate a component physically and logically in a small area of silicon that can be powered down when the services of the component are not needed, thus saving the dynamic and static power of the component. Normally, the state (see Sect.
4.2) that is contained in the component will be lost on power-down. It is therefore expedient to select a power-down/power-up point when the state of the component is empty. Otherwise, the state of the component must be saved. There are two ways of saving the state, either by hardware techniques or by sending a message containing the state to another component such that this other component can save and update the state.

The hardware effort for saving the state transparently can be substantial [Kea07]. On the other hand, a distributed architecture that supports robustness must support the dynamic restart of components in case a component has failed due to a transient fault (see Chap.
7). This restart must be performed with a temporally accurate component state. This software-based state restoration mechanism can be used to support the state restoration required by power gating as well without any additional overhead.

In an MPSoC architecture that consists of a plurality of components that are connected by a network on chip (NoC), power gating is a very effective technique to save power. A component that is not in use can be shut down completely, thus saving not only dynamic power but also static power. Since static power is increasing substantially as we deploy below-100 nm technology, power gating becomes an extremely important power-saving technology.

In many devices it is useful to distinguish between two major modes of operation: service mode and sleep mode.
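The message-based way of saving state around a power-down, and the switch between service mode and sleep mode, can be sketched as follows. This is a minimal illustration, not an implementation from the text: the classes `StateKeeper` and `Component`, the method names, and the example state are all hypothetical, and the updating of the saved state during the sleep period (needed for a temporally accurate state) is left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SERVICE = "service"  # powered, full set of services available
    SLEEP = "sleep"      # power-gated, only a wake-up is possible


@dataclass
class StateKeeper:
    """Hypothetical always-on component that stores state messages."""
    saved: dict = field(default_factory=dict)

    def receive_state(self, sender: str, state: dict) -> None:
        # Store a copy of the message payload under the sender's name.
        self.saved[sender] = dict(state)

    def send_state(self, requester: str) -> dict:
        # Hand the saved state back once; nothing stale is kept.
        return self.saved.pop(requester, {})


@dataclass
class Component:
    """Hypothetical power-gated component that interacts via messages only."""
    name: str
    keeper: StateKeeper
    mode: Mode = Mode.SERVICE
    state: dict = field(default_factory=dict)

    def power_down(self) -> None:
        # With an empty state (the preferred power-down point) no message is needed.
        if self.state:
            self.keeper.receive_state(self.name, self.state)
        self.state = {}  # local state is lost when power is removed
        self.mode = Mode.SLEEP

    def wake_up(self) -> None:
        # Restore the state from the keeper's message, then resume service.
        self.state = self.keeper.send_state(self.name)
        self.mode = Mode.SERVICE
```

A component created as `Component("mpeg", StateKeeper(), state={"frame": 42})` loses its local state on `power_down()` and gets it back on `wake_up()`. The same restore path could serve a restart after a transient fault, which is the reuse the text argues for.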
In service mode, the full set of device services is provided and the dynamic power dissipation is dominant. In sleep mode, only the minimal functionality for activating the device upon arrival of a wake-up signal is required. In sleep mode the static (leakage) power is of major concern. Power gating can be very effective in reducing the power demand in sleep mode. Alternatively, the sleep mode can be implemented in a completely different technology, e.g., in subthreshold logic, that starts up the service mode as soon as a relevant wake-up signal is recognized. In this case, all components that are involved in the service mode can be shut down completely while in sleep mode, thus not consuming any power at all.

8.3.4 Real Time Versus Execution Time

It is important to stress the fundamental difference between real-time and execution time in a distributed real-time system.
There is no close relation between these two time bases (see also Sect. 4.1.3 on Temporal Control, which is related to real-time, and Logical Control, which is related to execution time). In an MPSoC, the granularity of the real-time will be one or two orders of magnitude larger (and correspondingly, the frequency lower) than the granularity of the execution time at the chip level. Since the power consumption is proportional to the frequency – see Sect.
8.1.2 – the global real-time clock distribution network will only consume a small fraction of the power that would be needed for a global execution time clock distribution network. Establishing a single global real-time base for the whole MPSoC, but many local asynchronous execution time bases, one in each IP core of an MPSoC, can itself save a significant amount of energy and furthermore increase the energy savings potential of a chip, as explained in the following paragraphs.

The real-time base makes the nodes of a distributed system aware of the progression of real-time and provides the basis for the generation of temporal control signals (see also Sect. 4.1.3). The local real-time clocks should be incremented according to the international standard of time TAI.
If no external clock synchronization is available, real-time is established by a real-time reference clock that forms the source of the distributed real-time base. The granularity of the global real-time depends on the precision of the clock synchronization and will be different at different integration levels. For example, at the chip level, where the IP-cores of a SoC communicate via a NoC, the local real-time clocks in the IP-cores will have a better precision (and consequently a smaller granularity) than the global real time at the device level, where devices communicate via a local area network (see Chap. 4 on clock synchronization).
At the chip level, the establishment of the global real-time can be realized by a stand-alone real-time clock distribution network or it can be integrated into the NoC.

The execution time base drives the algorithmic computations of the nodes and thus determines the speed of the computation (logical control – see Sect. 4.1.3). In a large SoC, the energy dissipation of a global execution time clocking system can form a large part of the chip's energy consumption.
In addition, the location-dependent delay of the central high-frequency timing signals results in a clock skew that is difficult to control. Furthermore, the individual control of the voltage and frequency of an IP-core is not possible if all IP-cores operate with the same clock signal. It therefore makes sense to design each IP-core and the NoC as an island of synchronicity that generates its clocking signal for the execution time locally. If the voltage of an IP-core can also be controlled locally, then the IP-core is an encapsulated subsystem with the capability for local voltage-frequency scaling and power gating. In the architecture model outlined in Chap.
4, clock-domain crossing occurs in the message-interface between an IP-core and the NoC, which must be carefully designed in order to avoid meta-stability problems.

8.4 Software Techniques

The equation $E = C_{\mathrm{eff}} V^{2} N$ of Sect. 8.1.2 gives the dynamic energy E required for the execution of a program. There are three parameters in this equation, the effective capacitance Ceff, the supply voltage V, and the number of instructions N. Reducing the effective capacitance Ceff and reducing the number of instructions per task reduces the time needed to complete a computational task.
There is thus no inherent conflict at the software level between designing for energy-performance and designing for time-performance.

The voltage depends primarily on the hardware technology and can be controlled by software if dynamic voltage and frequency scaling is supported by the hardware. The effective capacitance Ceff can be reduced by spatial and temporal locality, particularly in the memory system. The number of instructions, the instruction count N, needed to achieve the intended result depends entirely on the software.
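As a rough illustration of how the three parameters enter the energy equation of Sect. 8.1.2, the following Python sketch evaluates E = Ceff V^2 N for a baseline and two variations. The function name `dynamic_energy` and all numeric values are assumptions chosen for illustration; none of them come from the text.

```python
def dynamic_energy(c_eff: float, voltage: float, instructions: int) -> float:
    """Return the dynamic energy E = C_eff * V^2 * N in joules.

    c_eff is the effective capacitance in farads, voltage the supply
    voltage in volts, and instructions the instruction count N.
    """
    return c_eff * voltage ** 2 * instructions


# Illustrative values only (assumptions, not taken from the text).
baseline = dynamic_energy(c_eff=1e-9, voltage=1.0, instructions=1_000_000)
fewer_instructions = dynamic_energy(c_eff=1e-9, voltage=1.0, instructions=500_000)
lower_voltage = dynamic_energy(c_eff=1e-9, voltage=0.8, instructions=1_000_000)

print(f"baseline:           {baseline * 1e3:.2f} mJ")
print(f"half the N:         {fewer_instructions * 1e3:.2f} mJ")
print(f"0.8 V instead of 1: {lower_voltage * 1e3:.2f} mJ")
```

Under these assumed values, halving the instruction count halves the energy, while lowering the supply voltage from 1.0 V to 0.8 V reduces it to 64 % of the baseline, because the voltage enters the equation quadratically.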
The instruction count is the sum of the instructions executed by the system software and the application software.

8.4.1 System Software

System software consists of the operating system and the middleware. The objectives of a flexible system-software infrastructure versus minimal energy consumption drive the design process in different directions. Many operating systems of the past have considered flexibility as the key design driver, ignoring the topic of energy-performance. In these systems, a long sequence of system-software instructions must be executed in order to finalize a single application command, such as the sending of a message.

In battery-operated embedded systems, the energy efficiency of the system software can be improved by off-line tailoring of the operating system functions to the specific requirements of the given application.