Introduction to Algorithms. (2nd edition) Thomas Cormen_ Charles Leiserson_ Ronad Rivest (811417), страница 85
Текст из файла (страница 85)
The keys in nodex are used as dividing points separating the range of keys handled by x into n[x] + 1subranges, each handled by one child of x. When searching for a key in a B-tree, we make an(n[x] + 1)-way decision based on comparisons with the n[x] keys stored at node x. Thestructure of leaf nodes differs from that of internal nodes; we will examine these differencesin Section 18.1.Figure 18.1: A B-tree whose keys are the consonants of English. An internal node xcontaining n[x] keys has n[x] + 1 children.
All leaves are at the same depth in the tree. Thelightly shaded nodes are examined in a search for the letter R.Section 18.1 gives a precise definition of B-trees and proves that the height of a B-tree growsonly logarithmically with the number of nodes it contains. Section 18.2 describes how tosearch for a key and insert a key into a B-tree, and Section 18.3 discusses deletion.
Beforeproceeding, however, we need to ask why data structures designed to work on a magnetic diskare evaluated differently than data structures designed to work in main random-accessmemory.Data structures on secondary storageThere are many different technologies available for providing memory capacity in a computersystem. The primary memory (or main memory) of a computer system normally consists ofsilicon memory chips. This technology is typically two orders of magnitude more expensiveper bit stored than magnetic storage technology, such as tapes or disks. Most computersystems also have secondary storage based on magnetic disks; the amount of such secondarystorage often exceeds the amount of primary memory by at least two orders of magnitude.Figure 18.2(a) shows a typical disk drive.
The drive consists of several platters, which rotateat a constant speed around a common spindle. The surface of each platter is covered with amagnetizable material. Each platter is read or written by a head at the end of an arm. Thearms are physically attached, or "ganged" together,and they can move their heads toward oraway from the spindle. When a given head is stationary, the surface that passes underneath itis called a track. The read/write heads are vertically aligned at all times, and therefore the setof tracks underneath them are accessed simultaneously.
Figure 18.2(b) shows such a set oftracks, which is known as a cylinder.Figure 18.2: (a) A typical disk drive. It is composed of several platters that rotate around aspindle. Each platter is read and written with a head at the end of an arm.
The arms are gangedtogether so that they move their heads in unison. Here, the arms rotate around a commonpivot axis. A track is the surface that passes beneath the read/write head when it is stationary.(b) A cylinder consists of a set of covertical tracks.Although disks are cheaper and have higher capacity than main memory, they are much,much slower because they have moving parts. There are two components to the mechanicalmotion: platter rotation and arm movement. As of this writing, commodity disks rotate atspeeds of 5400–15,000 revolutions per minute (RPM), with 7200 RPM being the mostcommon.
Although 7200 RPM may seem fast, one rotation takes 8.33 milliseconds, which isalmost 5 orders of magnitude longer than the 100 nanosecond access times commonly foundfor silicon memory. In other words, if we have to wait a full rotation for a particular item tocome under the read/write head, we could access main memory almost 100,000 times duringthat span! On average we have to wait for only half a rotation, but still, the difference inaccess times for silicon memory vs. disks is enormous. Moving the arms also takes sometime.
As of this writing, average access times for commodity disks are in the range of 3 to 9milliseconds.In order to amortize the time spent waiting for mechanical movements, disks access not justone item but several at a time. Information is divided into a number of equal-sized pages ofbits that appear consecutively within cylinders, and each disk read or write is of one or moreentire pages. For a typical disk, a page might be 211 to 214 bytes in length. Once the read/writehead is positioned correctly and the disk has rotated to the beginning of the desired page,reading or writing a magnetic disk is entirely electronic (aside from the rotation of the disk),and large amounts of data can be read or written quickly.Often, it takes more time to access a page of information and read it from a disk than it takesfor the computer to examine all the information read. For this reason, in this chapter we shalllook separately at the two principal components of the running time:••the number of disk accesses, andthe CPU (computing) time.The number of disk accesses is measured in terms of the number of pages of information thatneed to be read from or written to the disk.
We note that disk access time is not constant—itdepends on the distance between the current track and the desired track and also on the initialrotational state of the disk. We shall nonetheless use the number of pages read or written as afirst-order approximation of the total time spent accessing the disk.In a typical B-tree application, the amount of data handled is so large that all the data do notfit into main memory at once. The B-tree algorithms copy selected pages from disk into mainmemory as needed and write back onto disk the pages that have changed.
B-tree algorithmsare designed so that only a constant number of pages are in main memory at any time; thus,the size of main memory does not limit the size of B-trees that can be handled.We model disk operations in our pseudocode as follows. Let x be a pointer to an object. If theobject is currently in the computer's main memory, then we can refer to the fields of the objectas usual: key[x], for example. If the object referred to by x resides on disk, however, then wemust perform the operation DISK-READ(x) to read object x into main memory before we canrefer to its fields. (We assume that if x is already in main memory, then DISK-READ(x)requires no disk accesses; it is a "no-op.") Similarly, the operation DISK-WRITE(x) is used tosave any changes that have been made to the fields of object x.
That is, the typical pattern forworking with an object is as follows:•••x ← a pointer to some objectDISK-READ(x)operations that access and/or modify the fields of x••DISK-WRITE(x) ▹ Omitted if no fields of x were changed.other operations that access but do not modify fields of xThe system can keep only a limited number of pages in main memory at any one time. Weshall assume that pages no longer in use are flushed from main memory by the system; our Btree algorithms will ignore this issue.Since in most systems the running time of a B-tree algorithm is determined mainly by thenumber of DISK-READ and DISK-WRITE operations it performs, it is sensible to use theseoperations efficiently by having them read or write as much information as possible. Thus, aB-tree node is usually as large as a whole disk page.
The number of children a B-tree nodecan have is therefore limited by the size of a disk page.For a large B-tree stored on a disk, branching factors between 50 and 2000 are often used,depending on the size of a key relative to the size of a page. A large branching factordramatically reduces both the height of the tree and the number of disk accesses required tofind any key. Figure 18.3 shows a B-tree with a branching factor of 1001 and height 2 that canstore over one billion keys; nevertheless, since the root node can be kept permanently in mainmemory, only two disk accesses at most are required to find any key in this tree!Figure 18.3: A B-tree of height 2 containing over one billion keys.
Each internal node and leafcontains 1000 keys. There are 1001 nodes at depth 1 and over one million leaves at depth 2.Shown inside each node x is n[x], the number of keys in x.18.1 Definition of B-treesTo keep things simple, we assume, as we have for binary search trees and red-black trees, thatany "satellite information" associated with a key is stored in the same node as the key. Inpractice, one might actually store with each key just a pointer to another disk page containingthe satellite information for that key. The pseudocode in this chapter implicitly assumes thatthe satellite information associated with a key, or the pointer to such satellite information,travels with the key whenever the key is moved from node to node.
A common variant on aB-tree, known as a B+- tree, stores all the satellite information in the leaves and stores onlykeys and child pointers in the internal nodes, thus maximizing the branching factor of theinternal nodes.A B-tree T is a rooted tree (whose root is root[T]) having the following properties:1. Every node x has the following fields:a. n[x], the number of keys currently stored in node x,b. the n[x] keys themselves, stored in nondecreasing order, so that key1[x] ≤key2[x] ≤ ··· ≤ keyn[x][x],c. leaf [x], a boolean value that is TRUE if x is a leaf and FALSE if x is aninternal node.2.