Genome Project - Primer on molecular genetics - 1992 (522926), page 5
11.

Current Sequencing Technologies

The two basic sequencing approaches, Maxam-Gilbert and Sanger, differ primarily in the way the nested DNA fragments are produced. Both methods work because gel electrophoresis produces very high resolution separations of DNA molecules; even fragments that differ in size by only a single nucleotide can be resolved. Almost all steps in these sequencing methods are now automated. Maxam-Gilbert sequencing (also called the chemical degradation method) uses chemicals to cleave DNA at specific bases, resulting in fragments of different lengths. A refinement to the Maxam-Gilbert method known as multiplex sequencing enables investigators to analyze about 40 clones on a single DNA sequencing gel. Sanger sequencing (also called the chain termination or dideoxy method) involves using an enzymatic procedure to synthesize DNA chains of varying length in four different reactions, stopping the DNA replication at positions occupied by one of the four bases, and then determining the resulting fragment lengths (Fig. 12).

These first-generation gel-based sequencing technologies are now being used to sequence small regions of interest in the human genome. Although investigators could use existing technology to sequence whole chromosomes, time and cost considerations make large-scale sequencing projects of this nature impractical. The smallest human chromosome (Y) contains 50 Mb; the largest (chromosome 1) has 250 Mb.
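The chain-termination idea described above can be illustrated with a small, idealized sketch. It is not a model of real chemistry: the function name `sanger_read` and the example template are hypothetical, the primer is ignored, and every position of the new strand is assumed to yield exactly one terminated fragment in the matching tube.

```python
def sanger_read(template: str) -> str:
    """Idealized dideoxy read of `template` (written 5'->3').

    Assumptions (not from the source text): no primer region, no read
    errors, and one terminated fragment for every position.
    """
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    # The newly synthesized strand, written 5'->3', is the reverse
    # complement of the template.
    new_strand = "".join(complement[b] for b in reversed(template))

    # One reaction tube per dideoxynucleotide. Each tube collects the
    # lengths of the fragments that ended with that base.
    tubes = {base: [] for base in "TCGA"}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)

    # On the gel, shorter fragments migrate farther. Reading bottom to top
    # means visiting fragment lengths in increasing order and noting the
    # lane (tube) in which each band appears.
    bands = sorted(
        (length, base) for base, lengths in tubes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    print(sanger_read("ATTGCAGTCGAC"))  # prints GTCGACTGCAAT
```

The read is the strand complementary to the template, which is why the function returns the reverse complement rather than the template itself.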
The largest continuous DNA sequence obtained thus far, however, is approximately 350,000 bp, and the best available equipment can sequence only 50,000 to 100,000 bases per year at an approximate cost of $1 to $2 per base. At that rate, an unacceptable 30,000 work-years and at least $3 billion would be required for sequencing alone.

[Figure 12 (ORNL-DWG 91M-17368). Panel 1: sequencing reactions loaded onto polyacrylamide gel for fragment separation (lanes T, C, G, A). Panel 2: sequence read (bottom to top) from gel autoradiogram (lanes T, C, G, A); sequence shown: GTCGACTGCAAT.]

Fig. 12. DNA Sequencing. Dideoxy sequencing (also called the chain-termination or Sanger method) uses an enzymatic procedure to synthesize DNA chains of varying lengths, stopping DNA replication at one of the four bases and then determining the resulting fragment lengths. Each sequencing reaction tube (T, C, G, and A) in the diagram contains
• a DNA template, a primer sequence, and a DNA polymerase to initiate synthesis of a new strand of DNA at the point where the primer is hybridized to the template;
• the four deoxynucleotide triphosphates (dATP, dTTP, dCTP, and dGTP) to extend the DNA strand;
• one labeled deoxynucleotide triphosphate (using a radioactive element or dye); and
• one dideoxynucleotide triphosphate, which terminates the growing chain wherever it is incorporated. Tube A has didATP, tube C has didCTP, etc.
For example, in the A reaction tube the ratio of dATP to didATP is adjusted so that the tube will have a collection of DNA fragments with a didATP incorporated at each position where adenine occurs in the newly synthesized strand (that is, opposite each thymine on the template). The fragments of varying length are then separated by electrophoresis (1) and the positions of the nucleotides analyzed to determine sequence. The fragments are separated on the basis of size, with the shorter fragments moving faster and appearing at the bottom of the gel. Sequence is read from bottom to top (2). (Source: see Fig. 11.)

Sequencing Technologies Under Development

A major focus of the Human Genome Project is the development of automated sequencing technology that can accurately sequence 100,000 or more bases per day at a cost of less than $0.50 per base. Specific goals include the development of sequencing and detection schemes that are faster and more sensitive, accurate, and economical.
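To set this target beside the first-generation figures quoted earlier, the arithmetic can be sketched as follows. The genome size of 3 billion bases and the 365 operating days per year are assumptions introduced here for illustration; the function name `effort` is hypothetical.

```python
GENOME_BP = 3_000_000_000  # assumed human genome size in bases


def effort(bases_per_year: float, cost_per_base: float,
           genome_bp: int = GENOME_BP) -> tuple[float, float]:
    """Return (instrument-years, total cost in dollars) for one full pass."""
    years = genome_bp / bases_per_year
    cost = genome_bp * cost_per_base
    return years, cost


# Best current case quoted in the text: 100,000 bases per year at $1 per base.
current = effort(100_000, 1.00)

# Project target: 100,000 bases per day at $0.50 per base.
# 365 operating days per year is an assumption made for this sketch.
target = effort(100_000 * 365, 0.50)

if __name__ == "__main__":
    for label, (years, cost) in (("current", current), ("target", target)):
        print(f"{label}: {years:,.0f} years, ${cost:,.0f}")
```

Under these assumptions the current case reproduces the 30,000 work-years and $3 billion cited above, while the target rate brings a single instrument down to roughly 82 years and the cost to $1.5 billion.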
Many novel sequencing technologies are now being explored, and the most promising ones will eventually be optimized for widespread use.

Second-generation (interim) sequencing technologies will enable speed and accuracy to increase by an order of magnitude (i.e., 10 times greater) while lowering the cost per base. Some important disease genes will be sequenced with such technologies as (1) high-voltage capillary and ultrathin electrophoresis to increase fragment separation rate and (2) use of resonance ionization spectroscopy to detect stable isotope labels.

Third-generation gel-less sequencing technologies, which aim to increase efficiency by several orders of magnitude, are expected to be used for sequencing most of the human genome.
These developing technologies include (1) enhanced fluorescence detection of individual labeled bases in flow cytometry, (2) direct reading of the base sequence on a DNA strand with the use of scanning tunneling or atomic force microscopies, (3) enhanced mass spectrometric analysis of DNA sequence, and (4) sequencing by hybridization to short panels of nucleotides of known sequence. Pilot large-scale sequencing projects will provide opportunities to improve current technologies and will reveal challenges investigators may encounter in larger-scale efforts.

Partial Sequencing To Facilitate Mapping, Gene Identification

Correlating mapping data from different laboratories has been a problem because of differences in generating, isolating, and mapping DNA fragments.
A common reference system designed to meet these challenges uses partially sequenced unique regions (200 to 500 bp) to identify clones, contigs, and long stretches of sequence. Called sequence tagged sites (STSs), these short sequences have become standard markers for physical mapping.

Because coding sequences of genes represent most of the potentially useful information content of the genome (but are only a fraction of the total DNA), some investigators have begun partial sequencing of cDNAs instead of random genomic DNA. (cDNAs are derived from mRNA sequences, which are the transcription products of expressed genes.) In addition to providing unique markers, these partial sequences [termed expressed sequence tags (ESTs)] also identify expressed genes.
This strategy can thus provide a means of rapidly identifying most human genes. Other applications of the EST approach include determining locations of genes along chromosomes and identifying coding regions in genomic sequences.

End Games: Completing Maps and Sequences; Finding Specific Genes

Starting maps and sequences is relatively simple; finishing them will require new strategies or a combination of existing methods.
After a sequence is determined using the methods described above, the task remains to fill in the many large gaps left by current mapping methods. One approach is single-chromosome microdissection, in which a piece is physically cut from a chromosomal region of particular interest, broken up into smaller pieces, and amplified by PCR or cloning (see DNA Amplification). These fragments can then be mapped and sequenced by the methods previously described.

Chromosome walking, one strategy for filling in gaps, involves hybridizing a primer of known sequence to a clone from an unordered genomic library and synthesizing a short complementary strand (called “walking” along a chromosome). The complementary strand is then sequenced and its end used as the next primer for further walking; in this way the adjacent, previously unknown, region is identified and sequenced.
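The walking loop just described can be sketched in a few lines. This is a toy illustration under strong assumptions that are not in the source: the "unknown" region is a plain string, reads are error-free and of fixed length, strand complementarity is ignored, and the names `walk`, `read_length`, and `primer_length` are hypothetical.

```python
def walk(chromosome: str, start_primer: str,
         read_length: int, primer_length: int) -> tuple[str, list[str]]:
    """Walk downstream from `start_primer`; return (sequence, primers used)."""
    pos = chromosome.find(start_primer)
    if pos < 0:
        raise ValueError("primer does not hybridize to this sequence")

    assembled = start_primer
    primers = [start_primer]
    end = pos + len(start_primer)  # index just past the region known so far

    while end < len(chromosome):
        # "Sequence" the stretch immediately downstream of the current primer.
        new_bases = chromosome[end:end + read_length]
        assembled += new_bases
        end += len(new_bases)
        if end < len(chromosome):
            # The end of the newly read stretch becomes the next primer.
            primers.append(assembled[-primer_length:])

    return assembled, primers


if __name__ == "__main__":
    seq, used = walk("ACGTTGCAAGGCTTACCGATCTGACA", "ACGTT",
                     read_length=8, primer_length=5)
    print(seq)
    print(used)
```

In this sketch the number of primers grows roughly as the distance walked divided by the read length, since every step needs a freshly made primer.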
The chromosome is thus systematically sequenced from one end to the other. Because primers must be synthesized chemically, a disadvantage of this technique is the large number of different primers needed to walk a long distance. Chromosome walking is also used to locate specific genes by sequencing the chromosomal segments between markers that flank the gene of interest (Fig. 13).

The current human genetic map has about 1000 markers, or 1 marker spaced every 3 million bp; an estimated 100 genes lie between each pair of markers.
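The spacing figure follows from simple division, as the short sketch below shows. The genome size of 3 billion bp is an assumption not stated in this section, and the function name `map_density` is hypothetical; the gene total is only what the quoted figures imply.

```python
def map_density(genome_bp: int, markers: int, genes_per_interval: int) -> dict:
    """Derive average spacing and implied gene counts from map figures."""
    spacing_bp = genome_bp // markers
    return {
        "spacing_bp": spacing_bp,                              # bp between markers
        "implied_genes": markers * genes_per_interval,         # genes genome-wide
        "avg_bp_per_gene": spacing_bp // genes_per_interval,   # DNA per gene
    }


# 3 billion bp is an assumed genome size; 1000 markers and 100 genes per
# interval are the figures quoted in the text.
density = map_density(3_000_000_000, 1_000, 100)

if __name__ == "__main__":
    print(density)
```

With these inputs the average interval is 3 million bp, and a gene of interest must be picked out from about 100 candidates spread over that interval.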
Higher-resolution genetic maps have been made in regions of particular interest. New genes can be located by combining genetic and physical map information for a region. The genetic map basically describes gene order. Rough information about gene location is sometimes available also, but these data must be used with caution because recombination is not equally likely at all places on the chromosome. Thus the genetic map, compared to the physical map, stretches in some places and compresses in others, as though it were drawn on a rubber band.

The degree of difficulty in finding a disease gene of interest depends largely on what information is already known about the gene and, especially, on what kind of DNA alterations cause the disease.
Spotting the disease gene is very difficult when disease results from a single altered DNA base; sickle cell anemia is an example of such a case, as are probably most major human inherited diseases. When disease results from a large DNA rearrangement, this anomaly can usually be detected as alterations in the physical map of the region or even by direct microscopic examination of the chromosome. The location of these alterations pinpoints the site of the gene.

Identifying the gene responsible for a specific disease without a map is analogous to finding a needle in a haystack.