Van Eyk, Dunn - Proteomic and Genomic Analysis of Cardiovascular Disease - 2003
You run the risk of identifying differences in expression that are secondary to the continuous overexpression of the transgene rather than being directly connected to the gene of interest. These problems might be addressed by using inducible systems, for example inducible promoters such as the Tet-on and Tet-off systems [14], to regulate transgene expression, together with carefully designed time courses. Experimental animal models for cardiovascular diseases should be analyzed at various time points to account for the onset, progression, and chronic phase of the disease.
Integrating the gene expression profiles of different models with related phenotypic pathology, for example comparing mouse models of cardiac hypertrophy induced by transgenic overexpression of hypertrophic stimuli with models of pressure-overload-induced hypertrophy, will help to further discriminate between primary causes and secondary changes associated with the disease under investigation. Currently, cross-species and cross-platform comparisons are hindered by the fact that individual microarray platforms interrogate different sets of genes, and by the lack of standardized RNA processing and labeling procedures, of appropriate software and annotation tools, and of a common reference RNA that would allow for cross-experimental normalization (see 2.4. Data Sharing).
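A common reference RNA helps because each experiment can report every gene as a ratio to the same reference sample; to the extent that platform- and probe-specific effects scale sample and reference intensities by the same factor, they cancel in the ratio. The Python sketch below illustrates this idea only under that assumption. The function name `log_ratios_to_reference` is hypothetical, and the gene symbols and intensities are made-up labels and numbers, not data from any study.

```python
import math


def log_ratios_to_reference(sample, reference):
    """Express each gene as log2(sample / common reference).

    sample, reference: dicts mapping a gene ID to an intensity measured in the
    same experiment on the same platform. Genes missing from the reference or
    with non-positive intensities are skipped.
    """
    ratios = {}
    for gene, value in sample.items():
        ref = reference.get(gene)
        if ref is None or value <= 0 or ref <= 0:
            continue
        ratios[gene] = math.log2(value / ref)
    return ratios


# Hypothetical data: two laboratories with different platforms and intensity
# scales, both of which also hybridized the same common reference RNA.
lab_a = log_ratios_to_reference({"Nppa": 8000.0, "Myh7": 1200.0},
                                {"Nppa": 2000.0, "Myh7": 600.0})
lab_b = log_ratios_to_reference({"Nppa": 400.0, "Myh7": 90.0},
                                {"Nppa": 100.0, "Myh7": 45.0})
print(lab_a)  # {'Nppa': 2.0, 'Myh7': 1.0}
print(lab_b)  # {'Nppa': 2.0, 'Myh7': 1.0}
```

Although the raw intensities of the two hypothetical laboratories differ by a factor of 20, the ratios to the shared reference agree, which is what makes such data comparable across experiments.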
Currently, only experiments performed by individual laboratories on the same microarray platform can easily be integrated, but only a few laboratories have the funding needed to perform such large-scale expression profiling studies. To address this problem, the National Heart, Lung, and Blood Institute (NHLBI) launched the “Programs of Genomic Applications (PGA)” initiative. Eleven PGAs were funded to develop genomic resources, including microarray and SNP data, animal models, clone collections, and software tools, for the scientific community in order to advance research related to heart, lung, blood, and sleep disorders.
PGA resources are accessible through the NHLBI and on the CardioGenomics website (Tab. 1.1).
1.2 Computational Analysis of Microarray Data
1.2.1 Raw Data Analysis
Most manufacturers of commercially available scanners provide software to process the scanned raw image and to transform the fluorescent pixel intensities into a measure of gene expression, and additional image-processing software tools are available for cDNA microarrays (Tab. 1.1). In contrast, Affymetrix Microarray Suite (MAS) is the only software for analyzing high-density oligonucleotide arrays, with the exception of dCHIP [15], an analysis tool developed by the Wong laboratory at the Harvard School of Public Health that has not yet found widespread acceptance. MAS calculates the expression value from the intensities of the PM/MM probe pairs (see 1.1 and Fig. 1.2), using hybridization to the MM oligonucleotide to assess the background noise within the PM signal. This concept poses specific problems and is therefore discussed in greater detail in this section.
MAS calculates a numerical value called the average difference (AvgDiff) for the expression intensity of each transcript as
$$\mathrm{AvgDiff} = \frac{\sum (\mathrm{PM} - \mathrm{MM})}{\text{Pairs in Avg}}$$
That is, the intensity of the MM oligonucleotide is subtracted from the intensity of the PM oligonucleotide, and the differences are averaged across the “Pairs in Avg”, the number of probe pairs whose intensity differences (PM − MM) lie within the range of three standard deviations (by default). Thus, in cases where the MM probes have higher intensity values than the PM probes, AvgDiff takes a negative value. However, in a biological sense, a transcript can only be absent, with a value of zero, or present, with a positive value.
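The AvgDiff calculation can be sketched in Python as follows. This is a simplified, hypothetical re-implementation, not Affymetrix code: it assumes that each PM − MM difference is compared with the mean difference of the probe set and that the population standard deviation is used, which may differ from what MAS 4.0 does internally. The function name `avg_diff` and all intensities are made up for illustration.

```python
from statistics import mean, pstdev


def avg_diff(pm, mm, n_sd=3.0):
    """Simplified AvgDiff for one probe set (hypothetical re-implementation).

    pm, mm: equal-length sequences of perfect-match and mismatch intensities.
    Pairs whose PM - MM difference lies more than n_sd standard deviations
    from the mean difference are excluded; the rest form the "Pairs in Avg".
    """
    if not pm or len(pm) != len(mm):
        raise ValueError("pm and mm must be non-empty and of equal length")
    diffs = [p - m for p, m in zip(pm, mm)]
    mu, sd = mean(diffs), pstdev(diffs)
    pairs_in_avg = [d for d in diffs if abs(d - mu) <= n_sd * sd]
    return sum(pairs_in_avg) / len(pairs_in_avg)


# Hypothetical probe set: 15 consistent pairs plus one outlier pair.
print(avg_diff([300.0] * 15 + [2200.0], [200.0] * 16))  # 100.0 (outlier excluded)

# Hypothetical low-abundance transcript where MM exceeds PM.
print(avg_diff([50.0, 60.0, 55.0], [80.0, 90.0, 70.0]))  # -25.0
```

The second call shows how a negative AvgDiff arises directly from the subtraction whenever the MM intensities exceed the PM intensities.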
Negative AvgDiff values are frequently observed for transcripts of low abundance.
The design intent behind the MM feature is to quantify background (scanner noise, etc.) and non-specific interaction resulting from cross-hybridization within the PM signal, and thus to provide a more reliable measure of the signal attributable to specific probe-target interactions. However, it has been shown that 66–80% of the MM signal is derived from probe-target interactions rather than from random binding [16]. Therefore, the AvgDiff value will under-represent the actual mRNA concentration.
In the recently released MAS 5.0 software, Affymetrix has changed the algorithm from empirical to statistical and has adapted its terminology to fit more standard terms.
The AvgDiff that had been used for empirical expression analysis was replaced by the “Signal”, which is calculated using the One-Step Tukey’s Biweight Estimate. This method yields a robust weighted mean that is relatively insensitive to outliers, and it also gives an estimate of the amount of variation in the data, just as the standard deviation measures the amount of variation for an average. MAS 5.0 still subtracts from the PM signal a “stray signal” estimate that is based on the intensity of the MM signal. However, in cases where the MM signal outweighs the PM signal, an adjusted value is used.
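The one-step Tukey biweight estimate can be sketched as follows. The starting points (the median and the median absolute deviation) and the tuning constants c = 5 and epsilon = 0.0001 follow our reading of Affymetrix's published algorithm description and should be treated as assumptions. In MAS 5.0 the estimate is applied to log-transformed, background-adjusted probe values; the input below is a hypothetical set of log2 values for one probe set, and the function name `tukey_biweight` is ours.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight estimate of location (a robust weighted mean)."""
    values = list(values)
    m = median(values)
    # Median absolute deviation from the median: a robust measure of spread.
    mad = median([abs(x - m) for x in values])
    weights = []
    for x in values:
        # Distance from the median, scaled by c * MAD (epsilon avoids division by zero).
        u = (x - m) / (c * mad + epsilon)
        # Bisquare weight: close to 1 near the median, 0 for |u| >= 1.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical log2 probe values for one probe set, with one outlier (12.0).
log_values = [7.0, 7.1, 6.9, 7.05, 6.95, 12.0]
robust = tukey_biweight(log_values)
plain = sum(log_values) / len(log_values)
signal = 2 ** robust  # back-transform to the linear intensity scale
print(round(robust, 2), round(plain, 2))  # robust estimate stays near 7.0; the plain mean is pulled up
```

The outlier lies far outside c × MAD from the median and receives a weight of zero, so it does not influence the estimate, whereas the ordinary mean is pulled toward it.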
While these adjustments eliminate negative values, they still rely on the MM signal as an indicator of background noise and will subtract a significant portion of the PM signal that is derived from specific target hybridization. In our experience, using the Signal value from MAS 5.0 as the raw value for downstream comparison analysis significantly changes the number of genes that are differentially expressed, compared with using the AvgDiff value from MAS 4.0. It remains to be seen which analysis software provides a more reliable assessment of gene expression. As a side note, MAS 4.0 and MAS 5.0 cannot both be run on the same workstation.
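The effect of such an adjustment can be illustrated with a simplified Python sketch. The rule below is a hypothetical simplification, not the exact MAS 5.0 algorithm: where MM < PM, the MM value is used unchanged; where MM ≥ PM, a substitute value below PM is derived from the median log2(PM/MM) ratio of the probe set, floored at a small constant `tau` (0.03 here, an assumed value). The function name `ideal_mismatch` and all intensities are hypothetical.

```python
import math
from statistics import median


def ideal_mismatch(pm, mm, tau=0.03):
    """Simplified, hypothetical 'ideal mismatch' (IM) values for one probe set.

    pm, mm: equal-length sequences of positive PM and MM intensities.
    """
    # Typical log2(PM/MM) ratio of the probe set, never smaller than tau.
    typical = max(median([math.log2(p / m) for p, m in zip(pm, mm)]), tau)
    # Keep MM where it is below PM; otherwise substitute a value below PM.
    return [m if m < p else p / 2 ** typical for p, m in zip(pm, mm)]


# Hypothetical intensities; the third probe pair has MM > PM.
pm = [500.0, 300.0, 120.0, 800.0]
mm = [200.0, 100.0, 150.0, 400.0]
im = ideal_mismatch(pm, mm)
print([round(p - m, 1) for p, m in zip(pm, mm)])  # PM - MM: [300.0, 200.0, -30.0, 400.0]
print([round(p - i, 1) for p, i in zip(pm, im)])  # PM - IM: [300.0, 200.0, 66.3, 400.0]
```

The negative difference disappears, but for the three pairs with MM < PM the full MM intensity, including any specific hybridization it contains, is still subtracted.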
MAS also provides an option for comparison analysis; however, it only allows the user to compare two chips at a time and is therefore not useful as an analysis tool for experiments involving multiple chips.
Data Quality. It is important to visually inspect the scanned image in order to evaluate the quality of your sample and of the hybridization, and to detect potential flaws related to chip manufacturing. Flaws can be obvious, such as uneven or high background, small white speckles throughout the array, dark circular areas where no hybridization has taken place, holes or cracks (uncommon), or dark scratches running through large sets of probes, but they may also be as subtle as a gradation of intensity due to light leakage into the scanner.
Small flaws can be masked manually, but for large flaws, hybridizing the labeled sample to a new chip is advisable.
In addition to the visual inspection, spiked-in controls are frequently used to evaluate the hybridization efficiency. For Affymetrix arrays, oligonucleotide B2 creates a bright border around the array that is used for automatic placement of the grid. Also, each eukaryotic probe array contains probe sets for several prokaryotic genes that can be labeled and serve as controls for labeling and hybridization efficiency. The integrity of the RNA, as reflected by the efficiency of first-strand synthesis (converting mRNA into cDNA), is evaluated with internal controls such as GAPDH and β-actin, each of which is represented by three probe sets corresponding to the 5' end, the middle, and the 3' end of the transcript. The 5' value should be at least half of the 3' value. Other quality measures are RawQ values, which describe the degree of pixel-to-pixel variation among the probe cells used to calculate the background. The level of noise is used as a criterion to determine the significance of differences between PM and MM probe cells. Finally, the scaling factor (SF) is used to normalize the signal across the entire chip to an arbitrary target intensity (see 2.2. Comparing expression data – Normalization). The SFs of the chips in a given experiment should lie within a 2- to 3-fold range of one another.
Potential Pitfalls. One potential problem for frequent Affymetrix users is the constant change of array types and the resulting inability to compare data from old arrays with data from new ones.
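The two numeric checks described above, the 3'/5' ratio of a housekeeping control and the spread of scaling factors across the chips of an experiment, can be sketched as follows. The thresholds come from the text (a 5' value of at least half the 3' value; SFs within a 2- to 3-fold range, taken here as 3-fold). How the SF itself is computed is an assumption: this sketch scales a 2% trimmed mean of all probe set signals to an arbitrary target of 500, which may differ from the defaults of a given MAS version. All function names and numbers are hypothetical.

```python
def passes_3_5_ratio(signal_3p, signal_5p, max_ratio=2.0):
    """True if the 5' signal is at least half of the 3' signal (3'/5' <= 2)."""
    return signal_3p / signal_5p <= max_ratio


def scaling_factor(signals, target=500.0, trim=0.02):
    """Factor that brings the chip's trimmed mean signal to the target intensity."""
    values = sorted(signals)
    k = int(len(values) * trim)  # number of values dropped at each end
    kept = values[k:len(values) - k]
    return target / (sum(kept) / len(kept))


def sfs_within_range(factors, max_fold=3.0):
    """True if the largest and smallest scaling factors differ by <= max_fold."""
    return max(factors) / min(factors) <= max_fold


# Hypothetical GAPDH signals: 3' = 1000, 5' = 600 (ratio ~1.7) passes.
print(passes_3_5_ratio(1000.0, 600.0))   # True
print(passes_3_5_ratio(1000.0, 300.0))   # False (ratio ~3.3)

# Hypothetical chip with signals 1..100: the 2 % trim drops 1, 2, 99 and 100.
sf = scaling_factor(range(1, 101))
print(round(sf, 2))                      # 9.9
print(sfs_within_range([sf, 2.5 * sf]))  # True
print(sfs_within_range([sf, 3.5 * sf]))  # False
```

Trimming the extremes before averaging keeps a few saturated or empty features from dominating the scaling factor.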
As an example, the first human chip, HuGeneFL, contained probe sets for 5,600 full-length genes. The next generation of chips covered ~12,000 genes and 50,000 EST clusters spread over five chips, HG-U95 A-E. Although the majority of genes represented on HuGeneFL were preserved, the probe sets for these genes were not identical, owing to the constant updating of public databases and improvements in probe selection. In many cases, genes were represented by completely new probe sets.
At that time, the actual oligonucleotide sequences were proprietary Affymetrix information, and it happened that Affymetrix designed probe sets based on GenBank entries that had been entered in the incorrect sequence orientation. This led to 25–35% of the probe sets on the mouse (MG-U74v.1) and human (HG-U95Av.1) arrays being defective. The replacement arrays had only been available for about one year when the new human HG-U133 arrays were released. This set of two arrays contains over 1,000,000 unique oligonucleotide features, which represent more than 33,000 of the best-characterized human genes. With the completion of the human genome sequence, Affymetrix now uses genomic sequence information to verify its sequence selection, orientation, and quality. Also, the actual oligonucleotide sequences are now accessible on the Affymetrix website (Tab. 1.1).
Another potential problem in comparing array data that have been collected over a longer period of time is instrument upgrades. Previous scanner versions used a photomultiplier tube (PMT) gain of 100%, at which fluorescent intensities easily became saturated. New scanner settings use a PMT gain of 10%, which increases the range of intensity values that can be detected. This is problematic when your scanner settings have been changed in the middle of an ongoing experiment: values from arrays that have been scanned at different settings can no longer be directly compared.
1.2.2 Comparing Expression Data
Normalization.