Linux Device Drivers 2nd Edition (779877), page 38
For example, if sample_read calls sample_getdata, which in turn can block, then sample_read must be reentrant as well as sample_getdata, because nothing prevents another process from calling it while it is already executing on behalf of a process that went to sleep.

Finally, of course, code that sleeps should always keep in mind that the state of the system can change in almost any way while a process is sleeping.
The driver should be careful to check any aspect of its environment that might have changed while it wasn’t paying attention.

Blocking and Nonblocking Operations

Another point we need to touch on before we look at the implementation of full-featured read and write methods is the role of the O_NONBLOCK flag in filp->f_flags. The flag is defined in <linux/fcntl.h>, which is automatically included by <linux/fs.h>.

The flag gets its name from “open-nonblock,” because it can be specified at open time (and originally could only be specified there).
If you browse the source code, you’ll find some references to an O_NDELAY flag; this is an alternate name for O_NONBLOCK, accepted for compatibility with System V code. The flag is cleared by default, because the normal behavior of a process waiting for data is just to sleep. In the case of a blocking operation, which is the default, the following behavior should be implemented in order to adhere to the standard semantics:

• If a process calls read but no data is (yet) available, the process must block. The process is awakened as soon as some data arrives, and that data is returned to the caller, even if there is less than the amount requested in the count argument to the method.

• If a process calls write and there is no space in the buffer, the process must block, and it must be on a different wait queue from the one used for reading. When some data has been written to the hardware device, and space becomes free in the output buffer, the process is awakened and the write call succeeds, although the data may be only partially written if there isn’t room in the buffer for the count bytes that were requested.

Both these statements assume that there are both input and output buffers; in practice, almost every device driver has them.
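The read rule in the first bullet can be observed from user space. The following is a minimal sketch, not driver code: it assumes a POSIX system and uses an ordinary pipe as a stand-in for a device with an input buffer. The helper name short_read_demo is hypothetical and does not come from the scull sources.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the book): put len bytes into a fresh
 * pipe, then ask a blocking read() for want bytes.
 * Returns what read() returned, or -1 if the setup failed.
 */
ssize_t short_read_demo(const char *msg, size_t len, size_t want)
{
    char buf[256];
    int fds[2];
    ssize_t n = -1;

    assert(want <= sizeof(buf));   /* keep the demo buffer safe */
    if (pipe(fds) < 0)
        return -1;

    /* The pipe plays the role of the driver's input buffer. */
    if (write(fds[1], msg, len) == (ssize_t)len)
        n = read(fds[0], buf, want);   /* data is there: no sleep needed */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling short_read_demo("hello", 5, 64) should return 5 rather than wait for 64 bytes: a blocking read hands back whatever data is available, even if that is less than count.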
The input buffer is required to avoid losing data that arrives when nobody is reading. In contrast, data can’t be lost on write, because if the system call doesn’t accept data bytes, they remain in the user-space buffer. Even so, the output buffer is almost always useful for squeezing more performance out of the hardware.

22 June 2001 16:36 http://openlib.org.ua

The performance gain of implementing an output buffer in the driver results from the reduced number of context switches and user-level/kernel-level transitions. Without an output buffer (assuming a slow device), only one or a few characters are accepted by each system call, and while one process sleeps in write, another process runs (that’s one context switch).
When the first process is awakened, it resumes (another context switch), write returns (kernel/user transition), and the process reiterates the system call to write more data (user/kernel transition); the call blocks, and the loop continues. If the output buffer is big enough, the write call succeeds on the first attempt (the buffered data will be pushed out to the device later, at interrupt time) without control needing to go back to user space for a second or third write call. The choice of a suitable size for the output buffer is clearly device specific.

We didn’t use an input buffer in scull, because data is already available when read is issued. Similarly, no output buffer was used, because data is simply copied to the memory area associated with the device. Essentially, the device is a buffer, so the implementation of additional buffers would be superfluous.
We’ll see the use of buffers in Chapter 9, in the section titled “Interrupt-Driven I/O.”

The behavior of read and write is different if O_NONBLOCK is specified. In this case, the calls simply return -EAGAIN if a process calls read when no data is available or if it calls write when there’s no space in the buffer.

As you might expect, nonblocking operations return immediately, allowing the application to poll for data.
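From the application's side, a driver's -EAGAIN shows up as read returning -1 with errno set to EAGAIN. The sketch below illustrates this under stated assumptions: it runs in user space on a POSIX system, uses a pipe in place of a real device, and sets the flag after open with fcntl. The helper names set_nonblocking and empty_pipe_read_errno are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK on an already-open descriptor; returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical demo: read from an empty pipe whose read end is
 * nonblocking. Returns the errno reported by read(), 0 if read()
 * unexpectedly succeeded, or -1 if the setup failed.
 */
int empty_pipe_read_errno(void)
{
    int fds[2];
    char c;
    int result = -1;

    if (pipe(fds) < 0)
        return -1;
    if (set_nonblocking(fds[0]) == 0) {
        ssize_t n = read(fds[0], &c, 1);   /* no data: returns at once */
        result = (n < 0) ? errno : 0;
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

On an empty pipe with the write end still open, empty_pipe_read_errno should report EAGAIN. A return value of 0 from read would instead mean end of file, which is why the caller has to look at errno and not only at the return value.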
Applications must be careful when using the stdio functions while dealing with nonblocking files, because they can easily mistake a nonblocking return for EOF. They always have to check errno.

Naturally, O_NONBLOCK is meaningful in the open method also. This happens when the call can actually block for a long time; for example, when opening a FIFO that has no writers (yet), or accessing a disk file with a pending lock. Usually, opening a device either succeeds or fails, without the need to wait for external events.
Sometimes, however, opening the device requires a long initialization, and you may choose to support O_NONBLOCK in your open method by returning immediately with -EAGAIN (“try it again”) if the flag is set, after initiating device initialization. The driver may also implement a blocking open to support access policies in a way similar to file locks. We’ll see one such implementation in the section “Blocking open as an Alternative to EBUSY” later in this chapter.

Some drivers may also implement special semantics for O_NONBLOCK; for example, an open of a tape device usually blocks until a tape has been inserted. If the tape drive is opened with O_NONBLOCK, the open succeeds immediately regardless of whether the media is present or not.

Only the read, write, and open file operations are affected by the nonblocking flag.

Chapter 5: Enhanced Char Driver Operations

A Sample Implementation: scullpipe

The /dev/scullpipe devices (there are four of them by default) are part of the scull module and are used to show how blocking I/O is implemented.

Within a driver, a process blocked in a read call is awakened when data arrives; usually the hardware issues an interrupt to signal such an event, and the driver awakens waiting processes as part of handling the interrupt.
The scull driver works differently, so that it can be run without requiring any particular hardware or an interrupt handler. We chose to use another process to generate the data and wake the reading process; similarly, reading processes are used to wake sleeping writer processes. The resulting implementation is similar to that of a FIFO (or named pipe) filesystem node, whence the name.

The device driver uses a device structure that embeds two wait queues and a buffer. The size of the buffer is configurable in the usual ways (at compile time, load time, or runtime).

    typedef struct Scull_Pipe {
        wait_queue_head_t inq, outq;       /* read and write queues */
        char *buffer, *end;                /* begin of buf, end of buf */
        int buffersize;                    /* used in pointer arithmetic */
        char *rp, *wp;                     /* where to read, where to write */
        int nreaders, nwriters;            /* number of openings for r/w */
        struct fasync_struct *async_queue; /* asynchronous readers */
        struct semaphore sem;              /* mutual exclusion semaphore */
        devfs_handle_t handle;             /* only used if devfs is there */
    } Scull_Pipe;

The read implementation manages both blocking and nonblocking input and looks like this (the puzzling first line of the function is explained later, in “Seeking a Device”):

    ssize_t scull_p_read (struct file *filp, char *buf, size_t count,
                    loff_t *f_pos)
    {
        Scull_Pipe *dev = filp->private_data;

        if (f_pos != &filp->f_pos) return -ESPIPE;

        if (down_interruptible(&dev->sem))
            return -ERESTARTSYS;
        while (dev->rp == dev->wp) { /* nothing to read */
            up(&dev->sem); /* release the lock */
            if (filp->f_flags & O_NONBLOCK)
                return -EAGAIN;
            PDEBUG("\"%s\" reading: going to sleep\n", current->comm);
            if (wait_event_interruptible(dev->inq, (dev->rp != dev->wp)))
                return -ERESTARTSYS; /* signal: tell the fs layer to handle it */
            /* otherwise loop, but first reacquire the lock */
            if (down_interruptible(&dev->sem))
                return -ERESTARTSYS;
        }
        /* ok, data is there, return something */
        if (dev->wp > dev->rp)
            count = min(count, dev->wp - dev->rp);
        else /* the write pointer has wrapped, return data up to dev->end */
            count = min(count, dev->end - dev->rp);
        if (copy_to_user(buf, dev->rp, count)) {
            up (&dev->sem);
            return -EFAULT;
        }
        dev->rp += count;
        if (dev->rp == dev->end)
            dev->rp = dev->buffer; /* wrapped */
        up (&dev->sem);

        /* finally, awaken any writers and return */
        wake_up_interruptible(&dev->outq);
        PDEBUG("\"%s\" did read %li bytes\n",current->comm, (long)count);
        return count;
    }

As you can see, we left some PDEBUG statements in the code.
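The pointer arithmetic in scull_p_read can be tried out in user space without any kernel facilities. The sketch below is a simplified model under stated assumptions: struct ring and ring_read are hypothetical names, memcpy stands in for copy_to_user, and there is no semaphore, wait queue, or sleeping. An empty buffer simply yields 0, where the driver would sleep or return -EAGAIN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space model of the buffer fields in Scull_Pipe (hypothetical). */
struct ring {
    char *buffer, *end;   /* start of storage, one past its last byte */
    char *rp, *wp;        /* where to read, where to write */
};

/*
 * Copy up to count bytes out of the ring, never crossing r->end in one
 * call, exactly as scull_p_read limits count. Returns the bytes copied.
 */
size_t ring_read(struct ring *r, char *dst, size_t count)
{
    size_t avail;

    assert(r->rp >= r->buffer && r->rp < r->end);
    if (r->rp == r->wp)
        return 0;                          /* nothing to read */

    if (r->wp > r->rp)
        avail = (size_t)(r->wp - r->rp);   /* contiguous data up to wp */
    else
        avail = (size_t)(r->end - r->rp);  /* wp wrapped: stop at end */

    if (count > avail)
        count = avail;
    memcpy(dst, r->rp, count);

    r->rp += count;
    if (r->rp == r->end)
        r->rp = r->buffer;                 /* wrapped */
    return count;
}
```

With an 8-byte buffer, rp at offset 6 and wp at offset 2, a first call returns only the 2 bytes before the end of the buffer and wraps rp to the start; a second call returns the remaining 2 bytes. This is why a reader of /dev/scullpipe may need two read calls to drain data that straddles the wrap point.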