Hartl, Jones - Genetics: Principles and Analysis - 1998 (522927), page 74
The expression C0t is commonly called Cot, and a plot of C/C0 versus C0t is called a Cot curve. When renaturation is half completed, C/C0 = 1/2 and

C0t1/2 = 1/k

The value of 1/k depends on experimental conditions, but for a particular set of conditions, the value is proportional to the number of bases in the renaturing sequences.
The longer the sequence, the greater will be the time to achieve half-complete renaturation for a particular starting concentration (because the number of molecules will be smaller). The equation just stated applies to a single renaturing sequence, a restriction to which we will return. If a molecule consists of several subsequences, then one needs to know C0 for each subsequence, and a set of values of 1/k will be obtained (one for each step in the renaturation curve), each value depending on the length of the subsequence.
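The half-renaturation relation can be checked numerically. The sketch below assumes that the equation referred to in the text is the standard second-order form C/C0 = 1/(1 + k·C0·t); the function names and the numerical values of k and C0 are hypothetical and chosen only for illustration.

```python
def fraction_single_stranded(k: float, c0: float, t: float) -> float:
    """Return C/C0 for second-order renaturation: 1 / (1 + k*C0*t)."""
    return 1.0 / (1.0 + k * c0 * t)


def half_cot(k: float) -> float:
    """Return the value of C0*t at which C/C0 = 1/2, namely 1/k."""
    return 1.0 / k


# Hypothetical values (arbitrary units), not taken from the text.
k = 2.0
c0 = 0.25

# Time needed to reach half-renaturation at this starting concentration.
t_half = half_cot(k) / c0

print(t_half)                                   # 2.0
print(fraction_single_stranded(k, c0, t_half))  # 0.5
```

Halving C0 doubles the time to half-renaturation, while the product C0t1/2 stays fixed at 1/k, which is why Cot rather than time alone is plotted on the horizontal axis.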
What is meant by the length of the sequence that determines the rate is best described by example. A DNA molecule containing only adenine in one strand (and thymine in the other) has a repeating length of 1. The repeating tetranucleotide . . . GACTGACT . . . has a repeating length of 4. A nonrepeating DNA molecule containing n nucleotide pairs has a unique length of n.

Experimentally, the number of bases per repeating unit is not determined directly. Generally, renaturation curves for a series of molecules of known molecular weight with no repeating elements in their sequences (and hence yielding one-step renaturation curves) serve as standards.
Molecules composed of short repeating sequences are also occasionally used. A set of curves of this kind is shown in Figure 6.17. Note that two of these simple curves represent the entire genomes of E. coli and phage T4. With the standard conditions for Cot analysis used to obtain this set of curves, the sequence length N (in base pairs) that yields a particular value of C0t1/2 is

N = (5 × 10^5) × C0t1/2

in which t is in seconds, C0 is in nucleotides per liter, and 5 × 10^5 is a constant dependent on the conditions of renaturation. Through this formula, the experimentally determined value of Cot (C0t1/2) yields an estimate of the repeat length, N, of a repetitive sequence.
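As a minimal sketch, the conversion between C0t1/2 and repeat length can be written as a pair of helper functions. The constant 5 × 10^5 is the one quoted in the text and holds only for the standard conditions described there; the function names are hypothetical.

```python
# Constant from the text; valid only for the standard renaturation conditions
# (t in seconds, C0 in nucleotides per liter).
COT_CONSTANT = 5e5


def repeat_length(cot_half: float) -> float:
    """Estimate the repeat length N (base pairs) from a C0t1/2 value."""
    return COT_CONSTANT * cot_half


def cot_half_for_length(n_bp: float) -> float:
    """Inverse relation: the C0t1/2 expected for a sequence of n_bp base pairs."""
    return n_bp / COT_CONSTANT


# A C0t1/2 of 2 x 10^-3 corresponds to a repeat length of roughly 10^3 base pairs.
print(repeat_length(2e-3))
# A unique sequence of 10^6 base pairs would half-renature at a C0t of about 2.
print(cot_half_for_length(1e6))
```

Note that the input must be the C0t1/2 computed from the concentration of the individual sequence class, not from the total DNA concentration.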
Note again that C0 is not the overall DNA concentration but the concentration of the individual sequence producing a particular step in a curve. How one obtains the necessary value of C0 will become clear when we analyze a Cot curve. Such an analysis begins by first noting the number of steps in the curve (each step of which represents a sequence or class of sequences of a particular length) and the fraction of the material represented by each step.
The observed value of C0t1/2 for each step must be corrected by first inferring the value of C0 for each sequence class. The lengths of the sequences are then determined from these corrected values by comparison to standards, and the sequence lengths and sequence abundances, as a proportion of the total, are compared to obtain the number of copies of each sequence. This analysis is best understood by looking at an example, which we will do next.

Analysis of Genome Size and Repetitive Sequences by Renaturation

Figure 6.18 shows a Cot curve typical of those obtained in analyses of the renaturation kinetics of eukaryotic genomes. Three discrete steps are evident: 50 percent of the DNA has C0t1/2 = 10^3, 30 percent has C0t1/2 = 10^0 = 1, and 20 percent has C0t1/2 = 10^-2.
The scale at the top of the figure was obtained from Cot analysis of molecules that have unique sequences of known lengths, as in Figure 6.17. The sequence sizes cannot be determined directly from the observed C0t1/2 values, because each value of C0 used in plotting the horizontal axis in the body of the figure is the total DNA concentration in the renaturation mixture. Multiplying each C0t1/2 value by the fraction of the total DNA that it represents yields the necessary corrected C0t1/2 values (that is, 0.50 × 10^3, 0.30 × 1, and 0.20 × 10^-2). From the size scale at the top of Figure 6.18, the corresponding sequence sizes (repeat lengths) are approximately 3.0 × 10^8, 2.2 × 10^5, and 1 × 10^3 base pairs, respectively.

To determine the number of copies of each sequence, we make use of the fact that the number of copies of a sequence having a particular renaturation rate is inversely proportional to t1/2 and hence to the observed (uncorrected) C0t1/2 values for each class.
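The correction and the inverse-proportionality rule described above can be sketched as a short calculation. The fractions, observed C0t1/2 values, and sequence sizes are those quoted in the text for Figure 6.18; the function name, the variable names, and the assumption that the slowest-renaturing class is present in a single copy per haploid genome are illustrative choices, and the final sum of length times copy number is one straightforward way to estimate the total genome size.

```python
# Each step of the Cot curve in Figure 6.18, as quoted in the text:
# (fraction of total DNA, observed C0t1/2, sequence size in bp read from the figure scale)
steps = [
    (0.50, 1e3, 3.0e8),
    (0.30, 1.0, 2.2e5),
    (0.20, 1e-2, 1.0e3),
]


def analyze(steps):
    """Return (corrected C0t1/2, copy number, total bp) for each step.

    Assumes the class with the largest observed C0t1/2 is single-copy.
    """
    slowest = max(cot for _, cot, _ in steps)
    rows = []
    for fraction, cot_observed, size_bp in steps:
        corrected = fraction * cot_observed   # uses the class's own C0
        copies = slowest / cot_observed       # inversely proportional to observed C0t1/2
        rows.append((corrected, copies, size_bp * copies))
    return rows


rows = analyze(steps)
genome_size = sum(total_bp for _, _, total_bp in rows)

for corrected, copies, total_bp in rows:
    print(f"corrected C0t1/2 = {corrected:g}, copies = {copies:g}, total = {total_bp:.2g} bp")
print(f"estimated genome size = {genome_size:.2g} bp")
```

The copy numbers come from the uncorrected C0t1/2 values, whereas the sequence sizes come from the corrected ones; mixing the two is the most common slip in this kind of analysis.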
Thus if the haploid genome contains only one copy of the longest sequence (3.0 × 10^8 base pairs), it contains 10^3 copies of the sequence (or sequences) of length 2.2 × 10^5 base pairs and 10^5 copies of the sequence (or sequences) of length 1 × 10^3 base pairs. An estimate of the total number of base pairs per genome would in this case be

(3.0 × 10^8) + (10^3 × 2.2 × 10^5) + (10^5 × 1 × 10^3) = 6.2 × 10^8 base pairs

Figure 6.17
A set of Cot curves for various DNA samples. The black arrows pointing up to the red scale indicate the number of nucleotide pairs for each sample; they align with the intersection of each curve with the horizontal red line (the point of half-renaturation, or C0t1/2). The y axis on this graph can be related to that on Figure 6.16 by noting that maximum absorption represents totally single-stranded DNA and minimum absorption represents totally double-stranded DNA.

Figure 6.18
The Cot curve analyzed in the text. The scale at the top is the same as that in Figure 6.17. The black dashed lines indicate the fractional contribution of each class of molecules to the total DNA.

The different sequence components of a eukaryotic DNA molecule can be isolated by procedures that recover the double-stranded molecules formed at different times during a renaturation reaction. The method used is to allow renaturation to proceed only to a particular C0t1/2 value. The reassociated molecules present at that point are then separated from the remaining single-stranded molecules, usually by passing the solution through a tube filled with a form of calcium phosphate crystal (hydroxylapatite) that preferentially binds double-stranded DNA.

6.7 Nucleotide Sequence Composition of Eukaryotic Genomes

The nucleotide sequence composition of many eukaryotic genomes has been examined via analysis of the renaturation kinetics of DNA.
The principal finding is that eukaryotic organisms differ widely in the proportion of the genome that consists of repetitive DNA sequences and in the types of repetitive sequences that are present. In most eukaryotic genomes, the DNA consists of three major components:

• Unique, or single-copy, sequences. This is usually the major component and typically comprises from 30 to 75 percent of the chromosomal DNA in most organisms.
• Highly repetitive sequences. This component constitutes from 5 to 45 percent of the genome, depending on the species. Some of these sequences are the satellite DNA referred to earlier. The sequences in this class are typically from 5 to 300 base pairs per repeat and are duplicated as many as 10^5 times per genome.
• Middle-repetitive sequences. This component constitutes from 1 to 30 percent of a eukaryotic genome and includes sequences that are repeated from a few times to 10^5 times per genome.

These different components can be identified not only by the kinetics of DNA reassociation but also by the number of bands that appear in Southern blots with the use of appropriate probes and by other methods.
It should be clear from the preceding discussion of DNA reassociation that the dividing line between many middle-repetitive sequences and highly repetitive sequences is arbitrary.

Unique Sequences

Most gene sequences and the adjacent nucleotide sequences required for their expression are contained in the unique-sequence component. With minor exceptions (for example, the repetition of one or a few genes), the genomes of viruses and prokaryotes are composed entirely of single-copy sequences; in contrast, such sequences constitute only 38 percent of the total genome in some sea urchin species, a little more than 50 percent of the human genome, and about 70 percent of the D. melanogaster genome.

Highly Repetitive Sequences

Many highly repetitive sequences are localized in blocks of tandem repeats, whereas others are dispersed throughout the genome.
An example of the dispersed type is a family of related sequences in the human genome called the Alu family because the sequences contain a characteristic restriction site for the enzyme AluI (Section 5.7). The Alu sequences are about 300 base pairs in length and are present in approximately 500,000 copies in the human genome; this repetitive DNA family alone accounts for about 5 percent of human DNA.

Among the localized highly repetitive sequences, most are fairly short. Sequences of this type make up about 6 percent of the human genome and 18 percent of the D. melanogaster genome, but they account for 45 percent of the DNA of D. virilis. One of the simplest possible repetitive sequences is composed of an alternating . . . ATAT . . . sequence with about 3 percent G + C interspersed, which makes up 25 percent of the genomes of certain species of land crabs. In the D. virilis genome, the major components of the highly repetitive class are three different but related sequences of seven base pairs rich in A–T:

5'-ACAAACT-3'
5'-ATAAACT-3'
5'-ACAAATT-3'

Blocks of satellite (highly repetitive) sequences in the genomes of several organisms have been located by in situ hybridization with metaphase chromosomes (Figure 6.19). The satellite sequences located by this method have been found to be in the regions of the chromosomes called heterochromatin. These are regions that condense earlier in prophase than the rest of the chromosome and are darkly stainable by many standard dyes used to make chromosomes visible (Figure 6.20); sometimes

Figure 6.19
Autoradiogram of metaphase chromosomes of the kangaroo rat Dipodomys ordii; radioactive RNA copied from purified satellite DNA sequences has been hybridized to the chromosomes to show the localization of the satellite DNA.