Clock Gating. In many highly integrated chips, a significant fraction of the total power is consumed in the distribution network of the execution clock. One way to reduce the power consumed in the execution clock network is to turn execution clocks off when and where they are not needed. Clock gating can reduce the power consumption of chips by 20% or more.
Transistor Sizing and Circuit Design. Energy can be saved if the transistor and circuit design is optimized with the goal of saving power rather than achieving optimal speed. Special sizing of transistors can help to reduce the capacitance that must be switched.

Multi-threshold Logic. With present-day micro-electronic design tools it is possible to build transistors with different threshold voltages on the same die.
High-threshold transistors have a lower leakage current, but are slower than low-threshold transistors. It is thus possible to save dynamic and static power by properly combining these two types of transistors.

8.2.3 Voltage and Frequency Scaling

It has been observed that, within limits characteristic for each technology, there is a nearly linear dependency of frequency on voltage. If the frequency of operation of the device is reduced, the voltage can be reduced as well without disturbing the functionality of the device [Kea07]. Since the power consumption grows linearly with frequency, but with the square of the voltage, combined voltage and frequency scaling causes not only a reduction of power, but also a reduction of the energy required to perform a computation.

Example: The Intel XScale® processor can dynamically operate over the voltage range of 0.7–1.75 V and the frequency range of 150–800 MHz. The highest energy consumption is 6.3 times the lowest energy consumption.
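The factor of 6.3 is consistent with the quadratic dependence of the energy per operation on the supply voltage; as a back-of-the-envelope check (writing C_eff for the effective switched capacitance):

    P_dyn ≈ C_eff · V^2 · f             (dynamic power)
    E_op  = P_dyn / f ≈ C_eff · V^2     (energy per operation)
    E_max / E_min ≈ (1.75 V / 0.7 V)^2 = 6.25 ≈ 6.3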
Voltage scaling can be performed in the interval [V_threshold, V_normal]. Since V_normal is reduced as a device is scaled to lower dimensions (see Sect. 8.2.1), the range that is available for voltage scaling in sub-micron devices is reduced, and voltage scaling becomes less effective.

The additional circuitry that is needed to perform software-controlled dynamic voltage and frequency scaling is substantial. In order to reduce this circuitry, some designs support only two operating modes: a high-performance operating mode that maximizes performance and an energy-efficient operating mode that maximizes energy efficiency. The switchover between these two modes can be controlled by software. For example, a laptop can run in the high-performance operating mode if it is connected to the power grid and in the energy-efficient operating mode if it runs on battery power.
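A minimal sketch of such a two-point policy is shown below (C++, illustrative only): the two operating points reuse the XScale range quoted above, and set_operating_point() is a hypothetical placeholder for the platform-specific interface that actually programs the voltage regulator and the clock generator.

    #include <cstdio>

    struct OperatingPoint {
        double voltage_v;      // core voltage in volts
        int    frequency_mhz;  // clock frequency in MHz
    };

    // Example operating points, reusing the XScale range quoted above.
    constexpr OperatingPoint kHighPerformance {1.75, 800};
    constexpr OperatingPoint kEnergyEfficient {0.70, 150};

    // Hypothetical placeholder for the platform-specific mechanism that
    // programs the voltage regulator and the clock generator.
    void set_operating_point(const OperatingPoint& op) {
        std::printf("switching to %.2f V / %d MHz\n", op.voltage_v, op.frequency_mhz);
    }

    // Software-controlled switchover between the two operating modes.
    void on_power_source_change(bool on_mains_power) {
        set_operating_point(on_mains_power ? kHighPerformance : kEnergyEfficient);
    }

    int main() {
        on_power_source_change(true);   // laptop connected to the power grid
        on_power_source_change(false);  // laptop running on battery power
        return 0;
    }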
Given that the hardware supports voltage and frequency scaling, the operating system can integrate power management with the real-time scheduling of time-critical tasks to optimize the overall energy consumption. If the Worst-Case Execution Time (WCET) of a task on a processor running at a given frequency is known and the task has some slack until it must finish, then the frequency and the voltage can be reduced so that the task completes just in time, saving energy. This integrated real-time and power-management scheduling has to be supported at the level of the operating system.
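The following sketch illustrates this idea (C++, illustrative only; it assumes that the WCET was determined at the nominal frequency and that execution time scales inversely with the clock frequency, which neglects memory and I/O effects):

    #include <algorithm>
    #include <cstdio>

    // Returns the lowest clock frequency (in MHz) at which a task with a
    // worst-case execution time of wcet_ms, determined at f_nom_mhz, still
    // finishes within the time remaining until its deadline.
    double scaled_frequency(double wcet_ms, double time_to_deadline_ms,
                            double f_nom_mhz, double f_min_mhz) {
        // Worst-case cycle demand of the task, derived from the WCET at f_nom.
        double worst_case_kilocycles = wcet_ms * f_nom_mhz;  // ms * MHz = kilo-cycles
        // Lowest frequency that lets the task complete just in time.
        double f_needed = worst_case_kilocycles / time_to_deadline_ms;
        // Stay within the platform limits.
        return std::min(f_nom_mhz, std::max(f_min_mhz, f_needed));
    }

    int main() {
        // Example: WCET of 2 ms at 800 MHz, deadline 8 ms away, minimum 150 MHz.
        double f = scaled_frequency(2.0, 8.0, 800.0, 150.0);
        std::printf("run the task at %.0f MHz\n", f);  // prints 200 MHz
        return 0;
    }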
8.2.4 Sub-threshold Logic

There is an increasing number of applications where ultra-low power consumption with reduced computational demands is desired. Take the example of the billions of standby circuits in electronic devices (e.g., television sets) that are continuously draining power while waiting for a significant event to occur (e.g., a start command from a remote console or a significant event in a sensor network).
The technique of sub-threshold logic uses the (normally unwanted) sub-threshold leakage current of a sub-micron device to encode logic functionality. This novel technique has the potential to yield low time-performance devices with a very low power requirement [Soe01].

8.3 System Architecture

Next to device scaling, the following system architecture techniques are most effective in reducing the energy requirement significantly.

8.3.1 Technology-Agnostic Design

At a high level of abstraction, an application requirement can be expressed by a platform-independent model (PIM) (see also Sect. 4.4). A PIM describes the functional and temporal properties of the requested solution without making any reference to the concrete hardware implementation. For example, when we specify the functionality and timing of the braking system of a car, we demand that the proper braking action will start within 2 ms after stepping on the brake pedal. We say that such a high-level description of an application is technology agnostic.

A PIM can be expressed in a procedural language, e.g., SystemC, augmented by the required timing information, e.g., by UML MARTE [OMG08]. The system implementer then has the freedom to select the implementation technology that is most appropriate for his/her purpose.
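As an illustration (a sketch, not taken from the book), the braking requirement above could be captured at the PIM level in SystemC roughly as follows; the module, port, and signal names are invented, and the 2 ms bound is expressed as a model-level reaction-time requirement with no reference to any target hardware. The sketch requires the SystemC library.

    #include <systemc.h>

    SC_MODULE(BrakeByWire) {
        sc_in<bool>  pedal_pressed;   // sensed state of the brake pedal
        sc_out<bool> brake_command;   // command to the brake actuators

        SC_CTOR(BrakeByWire) {
            SC_THREAD(control);
        }

        void control() {
            while (true) {
                wait(pedal_pressed.posedge_event());  // driver steps on the pedal
                // Temporal requirement of the PIM: braking must start within
                // 2 ms of the pedal event; the bound is modeled here as the
                // worst-case reaction delay of the component.
                wait(sc_time(2, SC_MS));
                brake_command.write(true);
            }
        }
    };

    int sc_main(int, char*[]) {
        sc_signal<bool> pedal, brake;
        BrakeByWire controller("controller");
        controller.pedal_pressed(pedal);
        controller.brake_command(brake);

        pedal.write(false);
        sc_start(sc_time(1, SC_MS));
        pedal.write(true);                 // the pedal is stepped on at t = 1 ms
        sc_start(sc_time(5, SC_MS));
        return 0;
    }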
In a second step, the PIM must be transformed into a representation that can be executed on the selected target hardware, resulting in the platform-specific model (PSM). The target hardware can be a specific CPU with memory, a Field Programmable Gate Array (FPGA), or a dedicated Application-Specific Integrated Circuit (ASIC). Although the functional and temporal requirements of the PIM are satisfied by all of these implementation choices, they differ significantly in their non-functional properties, such as energy requirements, silicon real estate, or reliability. Figure 8.2 gives a gross indication of the energy required to execute a given computation in the three mentioned technologies.
CPU-based computations have a built-in power overhead for instruction fetch and decoding that is not present in hardwired logic.

Fig. 8.2 Power requirement of different implementation technologies, shown as Gops/W (logarithmic scale, 0.01–1,000) for ASIC, FPGA, cell, and CPU implementations over the years 1990–2010 (adapted from [Lau06, slide 7])

The technology-agnostic design makes it possible to change the target hardware of a single component, e.g., to replace a CPU-based component by an ASIC, without having to revalidate the complete system.
Such implementation flexibility is of particular importance for battery-operated mass-market devices, where an initial test version of the functionality of a component can be realized and tested on a CPU-based implementation and later transferred to an ASIC for mass-market production.

The technology-agnostic design also makes it possible to address the technology obsolescence problem. In long-lived applications, such as the control system of an airplane, the services of the control system must be provided for a long time span, e.g., 50 years. During this time span, the original hardware technology becomes outdated. Technology-agnostic design makes it possible to change the hardware, and the related transformation of the PIM to the PSM, without having to change the interfaces to the other subsystems.

8.3.2 Pollack's Rule

Over the past 20 years we have seen a tremendous performance increase of single-processor systems.
New architectural mechanisms, such as pipelining, out-of-order execution, speculative branching, and many levels of caching, have made it possible to significantly reduce the execution time of a sequential program without having to invest in alternative system and software architectures that support a highly parallel execution environment.
However, this performance increase of the sequential processor has its (energy) price. Fred Pollack of Intel compared the integer performance increase of each new micro-architecture with the area and power of the previous micro-architecture, implemented in the same process technology [Bor07]. Pollack found that, over a number of Intel architectures, starting with the i386 in 1986, the performance of every subsequent micro-architecture increased only with the square root of the power or silicon area. This relationship is normally referred to as Pollack's Rule.

Embedded systems are characterized by an application-inherent parallelism, i.e., they consist of many concurrent, nearly independent processes.
In order to establish a viable software execution environment for these nearly independent parallel processes, a complex operating system that provides spatial and temporal partitioning must be implemented on top of a sequential processor. From the energy perspective, this is yet another setback: first, energy is wasted by the execution of the powerful sequential machine, and then energy is wasted again to provide the encapsulated parallel execution environments for the support of the parallel processes running on this sequential machine.

Example: According to Pollack's Rule, the speed improvement of an IP-core achieved by advanced micro-architectural mechanisms scales with the square root of two per generation, while the required energy and the silicon area increase by a factor of 2. After four generations of micro-architecture evolution, an IP-core would have grown to 16 times its original size, would consume 16 times as much energy as the original, and would achieve a time-performance improvement of only four.
The micro-architecture evolution has degraded the energy efficiency by a factor of four.
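The arithmetic of the example can be stated in general terms (a sketch, with performance, power, and area normalized to the original core and n counting micro-architecture generations):

    performance_n ≈ (√2)^n
    power_n ≈ area_n ≈ 2^n
    energy efficiency_n = performance_n / power_n ≈ 2^(-n/2)

For n = 4 this gives a performance gain of 4, a power and area increase of 16, and one quarter of the original energy efficiency.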
The recent introduction of multi-core systems-on-chip (MPSoC), where simple IP-cores are connected by a network-on-chip (NoC), will thus revolutionize the execution environment for embedded systems. Considering the size of the embedded system market, it can be expected that in the future energy-efficient multi-core systems that focus on this market will become dominant. The potential for energy savings of these systems is significant.

An important issue in the design of MPSoCs is the structure of the interconnect among the IP-cores. There are basically two alternatives: (1) a message-based communication infrastructure, and (2) the provision of a large shared memory. Poletti et al. [Pol07] have investigated the energy efficiency of these two alternatives and come to the conclusion that message-based systems are preferable if the computation/communication ratio is high, while shared memory outperforms message passing if the computation/communication ratio is low.