Linux Device Drivers, 2nd Edition (779877), page 89
/proc/self is a special case of /proc/pid, because it always refers to the current process. As an example, here are a couple of memory maps, to which we have added short comments after a sharp sign:

    morgana.root# cat /proc/1/maps   # look at init
    08048000-0804e000 r-xp 00000000 08:01 51297  /sbin/init          # text
    0804e000-08050000 rw-p 00005000 08:01 51297  /sbin/init          # data
    08050000-08054000 rwxp 00000000 00:00 0                          # zero-mapped bss
    40000000-40013000 r-xp 00000000 08:01 39003  /lib/ld-2.1.3.so    # text
    40013000-40014000 rw-p 00012000 08:01 39003  /lib/ld-2.1.3.so    # data
    40014000-40015000 rw-p 00000000 00:00 0                          # bss for ld.so
    4001b000-40108000 r-xp 00000000 08:01 39006  /lib/libc-2.1.3.so  # text
    40108000-4010c000 rw-p 000ec000 08:01 39006  /lib/libc-2.1.3.so  # data
    4010c000-40110000 rw-p 00000000 00:00 0                          # bss for libc.so
    bfffe000-c0000000 rwxp fffff000 00:00 0                          # zero-mapped stack

    morgana.root# rsh wolf head /proc/self/maps  #### alpha-axp: static ecoff
    000000011fffe000-0000000120000000 rwxp 0000000000000000 00:00 0     # stack
    0000000120000000-0000000120014000 r-xp 0000000000000000 08:03 2844  # text
    0000000140000000-0000000140002000 rwxp 0000000000014000 08:03 2844  # data
    0000000140002000-0000000140008000 rwxp 0000000000000000 00:00 0     # bss

The fields in each line are as follows:

    start-end perm offset major:minor inode image

Each field in /proc/*/maps (except the image name) corresponds to a field in struct vm_area_struct, and is described in the following list.

start
end
    The beginning and ending virtual addresses for this memory area.

* The name BSS is a historical relic, from an old assembly operator meaning “Block started by symbol.” The BSS segment of executable files isn’t stored on disk, and the kernel maps the zero page to the BSS address range.

379  22 June 2001 16:42  http://openlib.org.ua  Chapter 13: mmap and DMA

perm
    A bit mask with the memory area’s read, write, and execute permissions.
    This field describes what the process is allowed to do with pages belonging to the area. The last character in the field is either p for “private” or s for “shared.”

offset
    Where the memory area begins in the file that it is mapped to. An offset of zero, of course, means that the first page of the memory area corresponds to the first page of the file.

major
minor
    The major and minor numbers of the device holding the file that has been mapped.
    Confusingly, for device mappings, the major and minor numbers refer to the disk partition holding the device special file that was opened by the user, and not the device itself.

inode
    The inode number of the mapped file.

image
    The name of the file (usually an executable image) that has been mapped.

A driver that implements the mmap method needs to fill a VMA structure in the address space of the process mapping the device. The driver writer should therefore have at least a minimal understanding of VMAs in order to use them.

Let’s look at the most important fields in struct vm_area_struct (defined in <linux/mm.h>). These fields may be used by device drivers in their mmap implementation. Note that the kernel maintains lists and trees of VMAs to optimize area lookup, and several fields of vm_area_struct are used to maintain this organization.
VMAs thus can’t be created at will by a driver, or the structures will break. The main fields of VMAs are as follows (note the similarity between these fields and the /proc output we just saw):

unsigned long vm_start;
unsigned long vm_end;
    The virtual address range covered by this VMA. These fields are the first two fields shown in /proc/*/maps.

struct file *vm_file;
    A pointer to the struct file structure associated with this area (if any).

unsigned long vm_pgoff;
    The offset of the area in the file, in pages. When a file or device is mapped, this is the file position of the first page mapped in this area.

unsigned long vm_flags;
    A set of flags describing this area. The flags of the most interest to device driver writers are VM_IO and VM_RESERVED.
    VM_IO marks a VMA as being a memory-mapped I/O region. Among other things, the VM_IO flag will prevent the region from being included in process core dumps. VM_RESERVED tells the memory management system not to attempt to swap out this VMA; it should be set in most device mappings.

struct vm_operations_struct *vm_ops;
    A set of functions that the kernel may invoke to operate on this memory area. Its presence indicates that the memory area is a kernel “object” like the struct file we have been using throughout the book.

void *vm_private_data;
    A field that may be used by the driver to store its own information.

Like struct vm_area_struct, the vm_operations_struct is defined in <linux/mm.h>; it includes the operations listed next.
These operations are the only ones needed to handle the process’s memory needs, and they are listed in the order they are declared. Later in this chapter, some of these functions will be implemented; they will be described more completely at that point.

void (*open)(struct vm_area_struct *vma);
    The open method is called by the kernel to allow the subsystem implementing the VMA to initialize the area, adjust reference counts, and so forth. This method will be invoked any time that a new reference to the VMA is made (when a process forks, for example).
    The one exception happens when the VMA is first created by mmap; in this case, the driver’s mmap method is called instead.

void (*close)(struct vm_area_struct *vma);
    When an area is destroyed, the kernel calls its close operation. Note that there’s no usage count associated with VMAs; the area is opened and closed exactly once by each process that uses it.

void (*unmap)(struct vm_area_struct *vma, unsigned long addr, size_t len);
    The kernel calls this method to “unmap” part or all of an area. If the entire area is unmapped, then the kernel calls vm_ops->close as soon as vm_ops->unmap returns.

void (*protect)(struct vm_area_struct *vma, unsigned long, size_t, unsigned int newprot);
    This method is intended to change the protection on a memory area, but is currently not used.
    Memory protection is handled by the page tables, and the kernel sets up the page-table entries separately.

int (*sync)(struct vm_area_struct *vma, unsigned long, size_t, unsigned int flags);
    This method is called by the msync system call to save a dirty memory region to the storage medium. The return value is expected to be 0 to indicate success and negative if there was an error.

struct page *(*nopage)(struct vm_area_struct *vma, unsigned long address, int write_access);
    When a process tries to access a page that belongs to a valid VMA, but that is currently not in memory, the nopage method is called (if it is defined) for the related area. The method returns the struct page pointer for the physical page, after, perhaps, having read it in from secondary storage.
    If the nopage method isn’t defined for the area, an empty page is allocated by the kernel. The third argument, write_access, counts as “no-share”: a nonzero value means the page must be owned by the current process, whereas 0 means that sharing is possible.

struct page *(*wppage)(struct vm_area_struct *vma, unsigned long address, struct page *page);
    This method handles write-protected page faults but is currently unused.
    The kernel handles attempts to write over a protected page without invoking the area-specific callback. Write-protect faults are used to implement copy-on-write. A private page can be shared across processes until one process writes to it. When that happens, the page is cloned, and the process writes on its own copy of the page. If the whole area is marked as read-only, a SIGSEGV is sent to the process, and the copy-on-write is not performed.

int (*swapout)(struct page *page, struct file *file);
    This method is called when a page is selected to be swapped out. A return value of 0 signals success; any other value signals an error. In case of error, the process owning the page is sent a SIGBUS.
    It is highly unlikely that a driver will ever need to implement swapout; device mappings are not something that the kernel can just write to disk.

That concludes our overview of Linux memory management data structures. With that out of the way, we can now proceed to the implementation of the mmap system call.

The mmap Device Operation

Memory mapping is one of the most interesting features of modern Unix systems. As far as drivers are concerned, memory mapping can be used to provide user programs with direct access to device memory.

A definitive example of mmap usage can be seen by looking at a subset of the virtual memory areas for the X Window System server:

    cat /proc/731/maps
    08048000-08327000 r-xp 00000000 08:01 55505  /usr/X11R6/bin/XF86_SVGA
    08327000-08369000 rw-p 002de000 08:01 55505  /usr/X11R6/bin/XF86_SVGA
    40015000-40019000 rw-s fe2fc000 08:01 10778  /dev/mem
    40131000-40141000 rw-s 000a0000 08:01 10778  /dev/mem
    40141000-40941000 rw-s f4000000 08:01 10778  /dev/mem
    ...

The full list of the X server’s VMAs is lengthy, but most of the entries are not of interest here.
We do see, however, three separate mappings of /dev/mem, which give some insight into how the X server works with the video card. The first mapping shows a 16 KB region mapped at fe2fc000. This address is far above the highest RAM address on the system; it is, instead, a region of memory on a PCI peripheral (the video card). It will be a control region for that card. The middle mapping is at a0000, which is the standard location for video RAM in the 640 KB ISA hole. The last /dev/mem mapping is a rather larger one at f4000000 and is the video memory itself. These regions can also be seen in /proc/iomem:

    000a0000-000bffff : Video RAM area
    f4000000-f4ffffff : Matrox Graphics, Inc. MGA G200 AGP
    fe2fc000-fe2fffff : Matrox Graphics, Inc. MGA G200 AGP

Mapping a device means associating a range of user-space addresses to device memory.
Whenever the program reads or writes in the assigned address range, it is actually accessing the device. In the X server example, using mmap allows quick and easy access to the video card’s memory. For a performance-critical application like this, direct access makes a large difference.

As you might suspect, not every device lends itself to the mmap abstraction; it makes no sense, for instance, for serial ports and other stream-oriented devices. Another limitation of mmap is that mapping is PAGE_SIZE grained.