Symbian OS Internals: Real-time Kernel Programming (Wiley, December 2005), page 61
Instead of allocating the data chunks for these processes in the normal data section, the memory model allocates them in the kernel section and they are never moved. The memory model allocates an MMU domain, if possible, to provide protection for the process memory. The result is that a context switch to or from a fixed process does not require a D-cache flush and may even preserve the data TLB. One consequence of using this feature is that we can only ever run a single instance of a fixed process, but this is quite a reasonable constraint for most of the server processes in the OS. Typical processes that we mark as fixed are the file server, comms server, window server, font/bitmap server and database server. When this attribute is used effectively in a device, it makes a notable improvement to overall performance.

Memory map

Figures 7.13 and 7.14 show how the virtual address space is divided in the moving memory model.
These diagrams are not to scale and very large regions have been shortened, otherwise there would only be three or four visible regions on them!

7.4.1.4 Algorithms

In trying to understand how this memory model works it is useful to walk through a couple of typical operations to see how they are implemented.

Process context switch

The memory model provides the thread scheduler with a callback that should be used whenever an address space switch is required. I will describe what happens when the scheduler invokes that callback. Switching the user-mode address space in the moving memory model is a complex operation, and can require a significant period of time – often more than 100 microseconds. To reduce the impact of this slow operation on the real-time behavior of EKA2, the address space switch is carried out with preemption enabled.

100000000
    exception vectors
fff00000
    ROM - set from ROM script
iRomLinearBase
    RAM loaded user code - size depends on physical RAM
iUserCodeBase
    RAM loaded kernel code - size depends on physical RAM
iKernelCodeBase
    beginning of kernel section - calculated at ROM build time
iKernelLimit
    fixed processes - usually 2 or 3 MB each
65000000
    kernel data and heap - set from ROM script
iKernDataAddress
    primary I/O mappings set up by the bootstrap
63000000
    memory management - see detailed map for this area
60000000
    RAM drive, if present
40000000
    DLL static data - size depends on physical RAM
iDllDataBase
    data section, contains moving process data
04000000
    unmapped, null pointer trap
00000000

Figure 7.13 Full memory map for moving memory model

Figure 7.14 Memory management detail for moving memory model. [The figure details the region from 60000000 to 63000000: the superpage/CPU page at 60000000, then a dcache flush area, an alt dcache flush area, the page directory (16 K) and the page table info area (4096 * 8 bytes = 32 K), separated by unused gaps, with the page tables (up to 4096 * 1 K) occupying 62000000 to 62400000 and the range above 62400000 unused.]

The user-mode address space is a shared data object in the kernel, as more than one thread may wish to access the user-mode memory of
a different process, for example during IPC or device driver data transfers.
Therefore, changing and using the user-mode address space must be protected by a mutex of some form – the moving memory model uses the system lock for this. This decision has a significant impact on kernel-side software, and the memory model in particular – the system lock must be held whenever another process's user-mode memory is being accessed, to ensure a consistent view of user-mode memory.

The context switch is such a long operation that holding the system lock for the entire duration would have an impact on the real-time behavior of the OS, as kernel threads also need to acquire this lock to transfer data to and from user-mode memory.
We tackle this problem by regularly checking during the context switch to see if another thread is waiting on the system lock. If this is the case, the context switch is abandoned and the waiting thread is allowed to run. This leaves the user-mode address space in a semi-consistent state: kernel software can locate and manipulate any user-mode chunk as required, but when the user-mode thread is scheduled again, more work will have to be done to complete the address space switch.

The fixed process optimization described in the previous section relies on the memory model keeping track of several processes.
It keeps a record of the following processes:

TheCurrentProcess
    This is a kernel value that is really the owning process for the currently scheduled thread.

TheCurrentVMProcess
    This is the user-mode process that last ran. It "owns" the user-mode memory map, and its memory is accessible.

TheCurrentDataSectionProcess
    This is the user-mode process that has at least one moving chunk in the common address range - the data section.

TheCompleteDataSectionProcess
    This is the user-mode process that has all of its moving chunks in the data section.

Some of these values may be NULL as a result of an abandoned context switch, or termination of the process.

The algorithm used by the process context switch is as follows:

1. If the new process is fixed, then skip to step 6.
2. If the new process is not TheCompleteDataSectionProcess, then flush the data cache, as at least one chunk will have to be moved.
3. If a process other than the new one occupies the data section, then move all of its chunks to the home section and protect them.
4. If a process other than the new one was the last user process, then protect all of its chunks.
5. Move the new process's chunks to the data section (if not already present) and unprotect them.
   Go to step 8.
6. [Fixed process] Protect the chunks of TheCurrentVMProcess.
7. Unprotect the chunks of the new process.
8. Flush the TLB if any chunks were moved or permissions changed.

Thread request complete

This is the signaling mechanism at the heart of all inter-thread communications between user-mode programs and device drivers or servers. The part related to the memory model is the completion of the request status, which is a 32-bit value in the requesting thread's user memory. The signaling thread provides the address, and the value to write there, to the DThread::RequestComplete() method, which is always called with the system lock held.

In the moving memory model, this is a fairly simple operation, because all of the user-mode memory is visible in the memory map, either in the data section or in the home section. This function looks up the provided address in the chunks belonging to the process, and writes the data to the address where the memory is mapped now.

7.4.2 The multiple model

This memory model was developed primarily to support – and exploit – the new MMU developed for ARMv6.
However, it is more generally applicable than the moving memory model and can also be used with MMUs found on other popular processors such as Intel x86 and Renesas SuperH.

7.4.2.1 Hardware

As with the ARMv5 memory architecture, I refer you to the ARM Architecture Reference Manual for the full details of the level 1 memory sub-system on ARMv6.

Virtual address mapping

As with ARMv5, the top-level page directory still contains 4096 entries. However, in contrast with ARMv5, the page directory on ARMv6 can be split into two pieces.
Writing to an MMU control register, TTBCR, sets the size of the first piece of the directory to contain the first 32, 64, ..., 2048 or 4096 page directory entries, with the remainder being located in the second page directory. To support this, the MMU now has two TTBR registers, TTBR0 and TTBR1.

The MMU also has an 8-bit application space identifier (ASID) register. If this is updated to contain a unique value for each process, and the memory is marked as being process-specific, then TLB entries created from this mapping will include the ASID. As a result, we do not need to remove these TLB entries on a context switch – because the new process has a different ASID, it will not match the old process's TLB entries.

Protection

Although ARMv6 still supports the concept of domains, this feature is now deprecated, on the assumption that operating systems will opt to use the more powerful features of the new MMU. However, ARM have enhanced the page table permissions by the addition of a never-execute bit.
When set, this prevents the page being accessed as part of instruction fetching. When used appropriately, this can prevent stack and heap memory being used to execute code, which in turn makes it significantly harder to create effective security exploits such as buffer over-run attacks.

Caches

The cache in ARMv6 has also been through a complete overhaul, and a virtually indexed, physically tagged cache replaces the virtually indexed, virtually tagged cache in ARMv5. The cache is indexed using the virtual address, which enables the evaluation of the set of cache lines that could contain the data to run in parallel with the address translation process (hopefully in the TLB). Once the physical address is available, it is used to identify the exact location of the data in the cache, if present.

The result of using a physically tagged cache is very significant – the problems associated with multiple mappings are effectively removed. When the same virtual address maps to different physical addresses (a homonym), the cache can still store both of these simultaneously, because the tags for the cache entries contain distinct physical addresses (see Figure 7.15). Also, two virtual addresses that map to the same physical address (a synonym) will both resolve to the same entry in the cache due to the physical tag, and so the coherency problem is also eliminated.
This rather nice result is not quite the whole picture – the use of the virtual address as the index to the cache adds another twist for synonyms, which I will describe more fully later.

7.4.2.2 Memory model concept

The features of the ARMv6 MMU enable a number of the drawbacks of the moving memory model to be eliminated without compromising on the device constraints or OS requirements. The split page directory of ARMv6 allows us to revisit the common idea of having one page directory for each process. This time, instead of
This time, instead ofrequiring 16 KB for each process, we can choose to have just a part ofthe overall page directory specific to each process and the rest can beused for global and kernel memory. EKA2 always uses the top half (2 GB)for the kernel and global mappings, and the bottom half for per-processmapping. This reduces the per-process overhead to a more acceptable8 KB, but retains up to 2 GB of virtual address space for each process.For devices with smaller amounts of RAM (<32 MB) we go further andonly map the bottom 1 GB for each process reducing the overhead tojust 4 KB for each process.
The name of the model comes from it usingmultiple page directories.The multiple memory model makes use of ASIDs to resolve the problemof mapping the same virtual address to different physical addresses, whileTHE MEMORY MODELS289the physically tagged cache ensures that multiple mappings of virtualor physical addresses can be correctly resolved without needing to flushdata out of the cache.