The idea is to move the data we are going to use to memory as in Figure 8.1.3, but now we do this at the system rather than the application level. Most disk caching is done with hardware or with system software loaded with the operating system. Table 8.4.1 shows the results of some timings with Microsoft's SMARTDRV.SYS, which is a software cache.
Note the significant speedup on write operations.

Direct reduction of seek or latency time is more difficult, since it involves changes to low-level code within the operating system. However, data placement on the disk may help to reduce either seek or latency time, if the workings of the particular drive are understood, and may have some important side benefits. Seek time may be reduced if the file to be read is stored as a contiguous block on the disk. A sector is the physical unit of storage on the disk. The words "cluster", "block" or "bucket" are also used, possibly with special meanings.
We number sectors from 1 to N, where N is the total number of sectors available for use on the storage device. Assuming our file is stored from sectors S to F inclusive, and each sector holds C bytes of information, our file can be at most (F-S+1)*C bytes long.

Unfortunately, files are often not stored this way. One of our earliest machines, a North Star Horizon, had an operating system (North Star DOS) that forced files to be created as contiguous blocks.
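For example (the numbers here are ours, not from the text), a file stored in sectors S = 200 through F = 249 of a disk with C = 512 bytes per sector can hold at most (249 - 200 + 1) * 512 = 25600 bytes. A minimal sketch of the calculation:

    #include <stdio.h>

    /* Maximum size in bytes of a file stored contiguously in
       sectors S through F inclusive, with C bytes per sector.  */
    long max_file_bytes(long S, long F, long C)
    {
        return (F - S + 1) * C;
    }

    int main(void)
    {
        printf("%ld\n", max_file_bytes(200L, 249L, 512L));   /* prints 25600 */
        return 0;
    }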
This scheme of contiguous allocation allowed for very little data to be appended to a file. To overcome this deficiency, most operating systems allow for extensions to files. These extensions need not be physically adjacent to each other. Typically, a file is initially assigned a minimum number of sectors. When more space is needed, a standard sized block of sectors is added. This is sometimes called an extent. The operating system handles the creation and adjustment of the file space. Moreover, it will reuse space freed by the deletion of files, and may even avoid reuse of recently deleted space so as to improve the chances that the user can undelete (i.e., restore) accidentally erased files.

The ongoing use of a disk eventually has the operating system hunting for space all over the disk; files become fragmented.
This may delay data access. First, there is the directory management overhead: a file now has several entries that record where its parts are located and in which order they are to be accessed. Second, the read/write head must now move, possibly across several tracks, to get between blocks of sectors, adding seek time for each extent.

The remedy for such fragmentation is to defragment (or "defrag") or compact the disk. A variety of commercial and free software exists for such purposes.
Our experience suggests a considerable variation in features, reliability, ease of use, and performance. Defragmentation of a large disk may take quite a long time, sometimes hours, especially with some slower programs.

In our experience, the time costs of fragmentation are usually minor, but there is a security advantage to contiguously stored files. If the directory of the disk is damaged, our data remains untouched but inaccessible. Many utility programs can read individual (physical) sectors of data. Indeed, the better ones allow us to search sectors for specific patterns of data.
Therefore we may be able to locate a sector belonging to our file. If it is stored contiguously, we can simply search forward or backward sequentially until we have found and resaved all our precious data.

To find out the importance of disk fragmentation to data transfer times, we wrote a small program to create a fragmented file and to time the operations involved. Starting with two text files, call them AAA and BBB, with BBB null, we repeatedly appended AAA to BBB, but in between each append operation we created a new copy of AAA on the disk by copying AAA to F01, F02, F03, and so on.
The F?? files are then between the extents of AAA. We timed the "append", the "copy" of AAA to F??, and then a "read" of the whole of BBB after each append and copy cycle. We did this for a 3.5" flexible disk, a fixed disk and a RAM disk on our 33 MHz 80386/80387 machine, using a program compiled with Turbo BASIC 1.1 and files of two different sizes. The timings were saved to a file and edited for analysis by the statistical package Stata. To eliminate file size effects, we computed the operation time (optime) for a read, append or copy of a single line by dividing the timings by the file size in lines. Graphs show very minor slowdown with increasing fragmentation, except for flexible disk operations.
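The original driver was written in Turbo BASIC 1.1 and does not appear in the text. The C sketch below shows the general shape of such an experiment under stated assumptions: the file names AAA, BBB and F01, F02, ... come from the description above, while the number of cycles, the buffer size, and the use of the standard clock() timer are our own choices.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Append the contents of file src to file dst, a block at a time.
       If dst does not yet exist, this simply creates a copy of src.    */
    static void append_file(const char *src, const char *dst)
    {
        FILE *in = fopen(src, "rb"), *out = fopen(dst, "ab");
        char buf[4096];
        size_t n;
        if (!in || !out) { perror("append_file"); exit(1); }
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);
        fclose(in); fclose(out);
    }

    /* Read a file from start to finish, discarding the data. */
    static void read_file(const char *name)
    {
        FILE *in = fopen(name, "rb");
        char buf[4096];
        if (!in) { perror("read_file"); exit(1); }
        while (fread(buf, 1, sizeof buf, in) > 0)
            ;                          /* just force the transfer */
        fclose(in);
    }

    int main(void)
    {
        char filler[16];
        int cycle;

        /* 20 append-and-copy cycles; an arbitrary choice for the sketch.
           clock() measures processor time on some systems; a wall-clock
           timer would be preferable for I/O-bound work.                  */
        for (cycle = 1; cycle <= 20; cycle++) {
            clock_t t0, t1, t2, t3;

            t0 = clock();
            append_file("AAA", "BBB");             /* the "append"            */
            t1 = clock();
            sprintf(filler, "F%02d", cycle);
            append_file("AAA", filler);            /* the "copy" to F01, ...  */
            t2 = clock();
            read_file("BBB");                      /* the "read" of all of BBB */
            t3 = clock();

            printf("%d %g %g %g\n", cycle,
                   (double)(t1 - t0) / CLOCKS_PER_SEC,
                   (double)(t2 - t1) / CLOCKS_PER_SEC,
                   (double)(t3 - t2) / CLOCKS_PER_SEC);
        }
        return 0;
    }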
Table 8.4.1   Impact of disk caching and disk compression. The disk cache is Microsoft
              SMARTDRV.EXE (from Windows 3.1) dated 10/03/92. The disk compression is
              STACKER version 2.01 dated 26/02/92. The test file consisted of 3147586
              bytes in 100000 lines of text. The cache size was 1 MB.

      write      read      copy    details

      12.46     11.54     25.11    Tower 486D66, nostack, cache
     176.64     10.82    266.01    Tower 486D66, nostack, nocache
      14.18      0.94     10.60    cache speedup

      29.22     28.51     56.25    Notebook 386DX33: cache, nostack
     236.79     24.38    479.44    Notebook 386DX33: nocache, nostack
       8.10      0.86      8.52    cache speedup

      25.05     24.93     48.94    Tower 386DX33, nostack, cache
     231.3      23.95    388.32    Tower 386DX33, nostack, nocache
       9.23      0.96      7.93    cache speedup

  Stacked disk cache speedup. Note how minor the speedup actually is.

      22.46     18.01     96.34    Notebook 386DX33: c, cache, stack
      38.39     19.28    717.82    Notebook 386DX33: c, nocache, stack
       1.71      1.07      7.45    cache speedup

  Stacked disk slowdown (NOTE: it turns out NOT to be slower!)

  Nocache:
     236.79     24.38    479.44    Notebook 386DX33: nocache, nostack
      38.39     19.28    717.82    Notebook 386DX33: nocache, stack
       0.16      0.79      1.50    stack slowdown

  Cache:
      29.22     28.51     56.25    Notebook 386DX33: cache, nostack
      22.46     18.01     96.34    Notebook 386DX33: cache, stack
       0.77      0.63      1.71    stack slowdown

     231.62    121.99    552.33    M286 12 nostack, nocache
      85.74     57.89    787.41    M286 12 stack, nocache
       0.37      0.47      1.43    stack slowdown
  (There is insufficient memory to install a cache on this machine.)

To check that there was no hidden effect, we modelled optime using a linear model

     optime = b1 fragments + b0                                        (8.4.1)

Table 8.4.2 shows the results for modelling read times.
We see that there is a pronounced effect for flexible disk files. For fixed disk files, we see from R-squared that the model has very poor predictive power relative to using a constant read time, but fragmentation is still a statistically significant variable in the model.
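The timings were analysed with Stata, and that code is not reproduced in the text. Purely as an illustration (the data below are invented), the slope b1 and intercept b0 of a straight-line model such as (8.4.1) can be estimated by ordinary least squares:

    #include <stdio.h>

    /* Ordinary least squares fit of y = b1*x + b0 for n points.
       Here x would be the number of fragments and y the per-line
       operation time (optime).                                     */
    static void ols(const double *x, const double *y, int n,
                    double *b1, double *b0)
    {
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        int i;
        for (i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        *b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        *b0 = (sy - *b1 * sx) / n;
    }

    int main(void)
    {
        /* Invented example data: fragments vs. optime (ms per line). */
        double frag[]   = { 1, 2, 3, 4, 5, 6 };
        double optime[] = { 0.51, 0.52, 0.55, 0.56, 0.58, 0.60 };
        double b1, b0;

        ols(frag, optime, 6, &b1, &b0);
        printf("optime = %.4f * fragments + %.4f\n", b1, b0);
        return 0;
    }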
For RAM disk, fragmentation does not seem to matter, as we would suspect, because there is no physical mechanism to explain a slowdown. Similar results were observed for append and copy times.

As a final note on defragmentation, one should not attempt to interrupt the process, especially during directory compaction. One of us (JN) started a defragmentation, realized there was a block of files that should be deleted first, and tried to halt the process by hitting the ESCape key.
While we have had no difficulties in doing this when files are being moved around by the defragmenter (we use OPTune), here we managed to cross-link (that is, confuse the directory of) 100 Megabytes of storage. What a mess! Fortunately, we had backups of all but one file, which was short and available on paper. Still, several hours of cleanup were required.

Another possible time saving may come from the placement of sectors on each track to reduce latency time.
Here we need to know how the sectors are interleaved. That is, we need to know the ordering of the sectors along the track. For example, if there are ten sectors, they could be arranged in the natural order

     1  2  3  4  5  6  7  8  9  10

However, since it is sometimes impossible to read one sector right after another, sectors may be ordered

     1  3  5  7  9  2  4  6  8  10

This gives the PC a chance (about half a disk revolution) to put sector 1 safely in memory before having to read sector 2. Few users will want to engage in the complicated exercise of adjusting their data to save time in this way.
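The sketch below is ours, not from the text, and real formatting utilities follow their own conventions; it simply generates a track layout in which consecutive logical sectors are separated by a chosen number of physical slots, reproducing the two orderings shown above.

    #include <stdio.h>

    /* Lay out 'nsec' logical sectors around a track so that consecutive
       logical sectors are 'step' physical slots apart, then print the
       resulting physical order.  (A sketch only; nsec must be <= 64.)   */
    static void track_layout(int nsec, int step)
    {
        int track[64];              /* physical slots; 0 means still empty */
        int slot = 0, s;

        for (s = 0; s < nsec; s++)
            track[s] = 0;

        for (s = 1; s <= nsec; s++) {
            while (track[slot] != 0)        /* slot taken: move to the next */
                slot = (slot + 1) % nsec;
            track[slot] = s;
            slot = (slot + step) % nsec;
        }

        for (s = 0; s < nsec; s++)
            printf("%d ", track[s]);
        printf("\n");
    }

    int main(void)
    {
        track_layout(10, 1);   /* natural order: 1 2 3 4 5 6 7 8 9 10        */
        track_layout(10, 5);   /* half-revolution gap: 1 3 5 7 9 2 4 6 8 10  */
        return 0;
    }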
Still, knowledge of how a particular disk works will be important in explaining specific types of performance of a program. Moreover, programs exist that allow the "interleave factor" to be changed, as well as performing other disk maintenance tasks. An example is the program SpinRite.

The cure for slow execution caused by disk transfer times may rest, not in changing software or data placement, but in replacement of the disk drive with a faster acting one. Replacement drives with higher capacity or performance than the original equipment may be available and require no software changes. Another choice is to simulate the disk with a faster storage device.
Using software, the relatively slow physical disk is simulated in fast memory. Data usually disappears when the power is turned off, though some suppliers may use a battery to preserve the "disk" contents. This approach to creating a fast "disk" is variously called a semiconductor disk, pseudo-disk, RAM disk, or virtual disk.
We always configure our machines to have at least a small RAM disk for scratch files and fast text editing. Most RAM disks make use of system memory, but manufacturers have occasionally offered special boards or boxes. The virtues of using a RAM disk have already been shown in Figure 8.1.3 (b).

8.5  Time/Accuracy and Other Trade-off Decisions

For floating-point calculations, we may measure deviations from precisely known values for answers. We may, of course, not need ultra-precise results, and can possibly save time by sacrificing accuracy. Iterative algorithms often produce successive approximations to the desired results, with termination controlled by measuring the difference between successive iterates or similar values.
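As a small illustration of this trade-off (our example, not from the text), a Newton iteration for a square root can stop when successive iterates differ by less than a user-chosen tolerance, so a looser tolerance buys fewer iterations at the cost of accuracy:

    #include <stdio.h>
    #include <math.h>

    /* Newton's iteration for sqrt(a), a > 0, stopping when successive
       iterates differ by less than 'tol'.  Reports the iteration count. */
    static double newton_sqrt(double a, double tol, int *iters)
    {
        double x = a > 1.0 ? a : 1.0;    /* simple starting guess */
        double xnew;
        *iters = 0;
        do {
            xnew = 0.5 * (x + a / x);
            (*iters)++;
            if (fabs(xnew - x) < tol)
                break;
            x = xnew;
        } while (*iters < 100);          /* safety limit */
        return xnew;
    }

    int main(void)
    {
        int it;
        double r;

        /* Tight tolerance: more iterations, smaller error. */
        r = newton_sqrt(2.0, 1e-12, &it);
        printf("tol=1e-12: %d iterations, error %g\n", it, fabs(r - sqrt(2.0)));

        /* Loose tolerance: fewer iterations, larger error. */
        r = newton_sqrt(2.0, 1e-3, &it);
        printf("tol=1e-3 : %d iterations, error %g\n", it, fabs(r - sqrt(2.0)));
        return 0;
    }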