Nash - Scientific Computing with PCs (523165), page 54
These give a quick visual view of the density of occurrence of observations;
• Stem-and-leaf diagrams (Tukey, 1977) that serve simultaneously as a summary of the data (with or without loss of information depending on the manner of their development) and as a form of histogram.

We note that such graphs will be important not only for measuring the variability exhibited within a single program code but also for comparing variability between programs.

A third class of graphs will show relative performance. We want to know which situations reflect winners and which losers.
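As a rough illustration of the stem-and-leaf diagram mentioned in the list above, the following Python sketch builds a text display from a list of numbers. The function name `stem_and_leaf`, the `leaf_unit` parameter and the sample values are assumptions made for this example; they do not come from Tukey (1977) or from any package discussed in this chapter. With `leaf_unit=1` and integer data no information is lost; a coarser `leaf_unit` rounds the data and so loses information.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a simple stem-and-leaf display.

    Each value is rounded to a multiple of leaf_unit. The last digit of
    the rounded value is the leaf; the leading digits form the stem.
    This sketch assumes a non-empty list of non-negative values.
    """
    scaled = sorted(int(round(v / leaf_unit)) for v in values)
    rows = defaultdict(list)
    for s in scaled:
        stem, leaf = divmod(s, 10)
        rows[stem].append(leaf)
    lines = []
    # Include empty stems so the display keeps the shape of a histogram.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


# Hypothetical run times (in seconds) for one program on nine problems.
times = [12, 15, 21, 23, 23, 38, 40, 41, 57]
print("\n".join(stem_and_leaf(times)))
#   1 | 25
#   2 | 133
#   3 | 8
#   4 | 01
#   5 | 7
```

Read sideways, the rows of leaves form a histogram, while the individual digits still allow the original values to be recovered.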
We want to make comparisons between different situations, problems, and program codes.

Finally, we want to detect both patterns and deviations from patterns, or outliers. Graphs are helpful for this as the human eye is very good at seeing patterns and deviations from pattern, especially if information is presented clearly. Unfortunately, humans are so good at this that they may see patterns where none really exist. The use of plots to detect both patterns and outliers is an exploratory exercise where we are looking for suggestions for further investigation. The graphs do not "prove" anything, but they may guide us toward such proof.

In this case study, patterns help us to discover the overall performance of each program relative to problem size and other factors.
We call these algorithm characteristics: behavior on small versus large problems or nearly linear versus very nonlinear problems. Outliers help us to detect special cases, program errors, or notable successes in solving problems.

In considering how a display should be drawn we suggest:
• The utility of graphs is enhanced if they contain as much information as possible. For example, a graph comparing the performance of several program codes simultaneously is likely to be more useful than several graphs for individual programs.
• Good visual appearance enhances understanding of the information in the plot.
• The presentation should be easily understood.
Obscure diagrams or jargon labelling confuses readers.

19: GRAPHICAL TOOLS FOR DATA ANALYSIS 163

• Points on XY plots should be labelled by the "case" or problem or program. Without some care in choice of labels, we cannot locate particular special cases or isolate problems.
• Appropriate use of color or other graphical features helps the viewer. In this book we will not use color (color plates are expensive) but will try to illustrate different display features.
• Graphical analysis tools are more likely to be used if we do not have to work too hard to produce them. Appropriate software should be acquired. If necessary, command scripts that simplify the creation of selected types of graphs with this software should be prepared.

19.4 Choices of Graphical Tools

We present only a sample of the many choices of graphical display software here.
We use this sample to illustrate the range of features in such software. Desirable features of such software include:
• Ease of data manipulation. We will commonly want to manipulate data to improve displays. Frequently data must be sorted to have graph points presented in the correct order. Labels may need to be added because the original identifiers are simply too cumbersome to display without cluttering the graph. Transformations of data, e.g., taking logarithms, are often needed to render information visible.
• Good automatic settings for drawing displays save the user effort.
Effective scaling of axes should ensure that the graph points fill the display area yet the axis ticks are at reasonable numbers. For example, an X-axis from -2.345 to +123.3 with 4 ticks between is less easy to understand than a graph with an X-axis from -25 to 125 with tick marks (possibly labelled) at 0, 25, 50, 75, and 100. Labels and titles should be easily legible without forcing the graph itself to postage stamp size. Automatic selection of reasonable and clear graphing symbols, fonts, line types, colors or shadings saves much effort, though automatic selections that the user cannot change make for a worthless program.
• Ease of annotation.
It should be easy to add extra notes or to identify special cases by adding text in appropriate places on a graph. Positioning, or moving, such notes should be easy and natural.
• Identification of observations (cases). We may have a label for each case. It should be possible to use this label as the plotting symbol if needed. Even better is a facility to use a pointing device (mouse, light pen, etc.) to move a cursor to a point and have it identified.
This lets us pinpoint situations that are special or abnormal in some way.
• Ease of file export. Once a graph is displayed, it should be easy to save the image so that it can be printed or imported into other software for inclusion in documents, reports, presentations or displays. The purpose of graphical displays is the communication of ideas, first to ourselves, then to others, so data transfer capabilities are extremely important.
• Generally, the ease of use of a graphics package encourages the use of graphical data analysis.

We now present some examples of graphical software.
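The axis-scaling example in the feature list above (data running from -2.345 to +123.3, better drawn on an axis from -25 to 125) can be sketched in Python as follows. This is only one common "nice numbers" heuristic, not the method of any package named in this chapter; the function name `nice_axis`, the `max_intervals` parameter and the multiplier set (1, 2, 2.5, 5, 10) are assumptions made for this illustration.

```python
import math


def nice_axis(data_min, data_max, max_intervals=6):
    """Pick a rounded tick step and axis limits that enclose the data.

    The candidate multipliers (1, 2, 2.5, 5, 10) are a common convention
    chosen for this sketch; the text does not prescribe them.
    """
    if data_max <= data_min:
        raise ValueError("data_max must exceed data_min")
    raw_step = (data_max - data_min) / max_intervals
    magnitude = 10.0 ** math.floor(math.log10(raw_step))
    # Take the smallest "round" step that is at least the raw step.
    for multiplier in (1.0, 2.0, 2.5, 5.0, 10.0):
        step = multiplier * magnitude
        if step >= raw_step:
            break
    lo = math.floor(data_min / step) * step
    hi = math.ceil(data_max / step) * step
    n_intervals = int(round((hi - lo) / step))
    ticks = [lo + i * step for i in range(n_intervals + 1)]
    return lo, hi, step, ticks


lo, hi, step, ticks = nice_axis(-2.345, 123.3)
print(lo, hi, step)   # -25.0 125.0 25.0
print(ticks)          # [-25.0, 0.0, 25.0, 50.0, 75.0, 100.0, 125.0]
```

For the example data this reproduces the axis limits of -25 and 125, with interior ticks at 0, 25, 50, 75 and 100. A usable package would still let the user override such automatic choices.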
We limit our discussion here to packages on MS-DOS machines. In our opinion Macintosh software has had better graphical tools, in part because of the more consistent screen control design of the Macintosh.

LOTUS 1-2-3: This well-known, easy to use, spreadsheet package allows users to plot up to six data series at once, but lacks tools to display variability (e.g., boxplots). To get good quality displays and printouts it has been necessary to use add-ins such as ALWAYS.

QUATTRO: This spreadsheet package has display-quality graphics; graph manipulation and annotation tools are built-in. Since Quattro can be used in "Lotus" mode, 1-2-3 users avoid learning new commands.
Of interest to us in preparing this book has been the availability of PostScript output, though we can also import graphs into WordPerfect 5.1 in the Lotus PIC format. Spreadsheet packages have very convenient features for data entry, edit, transformation, sorting and manipulation. We have used a spreadsheet as the primary data entry mechanism for this study.

164
Copyright © 1984, 1994 J C & M M Nash
Nash Information Services Inc., 1975 Bel Air Drive, Ottawa, ON K2C 0X1 Canada
SCIENTIFIC COMPUTING WITH PCs
Copy for: Dr. Dobb’s Journal

Stata: Though designed as a general statistics package, Stata has excellent two-dimensional graphical tools based on an easy-to-learn command language.
It has good facilities for exporting graphs to other software and facilities. (We generally use the PIC file options.)

SYSTAT: This is a full-featured statistical package. Graphics include three-dimensional point cloud spinning. This feature essentially takes a graph such as that in Figure 19.7.1 below (minus grid lines) and allows the user to rotate the plot about different axes in order to try to discover patterns.
SYSTAT programs are very large; we have experienced some "insufficient memory" problems depending on PC configuration and operating system.

EXECUSTAT: The Student Edition of this commercial statistical package has interesting three-dimensional (3D) graphics both in the form of Figure 19.7.1 and point cloud spinning. It includes interactive data analysis tools such as point brushing: points are colored differently if a fourth variable is greater than or less than a user-set control value.
This control value may be changed by mouse or arrow commands, with the graph points dynamically changing color ("brushed") accordingly. Unfortunately import and export may be awkward. Section 19.7 tells how we managed to export displays for use in this book.

MATLAB: (Student Edition) allows for easy manipulation of matrix data. Graphical tools for displaying variability are missing, except for histograms. 3D plots are possible. The Student Edition has limited capabilities for export of graphical data, for example to documents such as this book.
However, the regular versions offer full capability in export.

For those who must include graphic capability in a user-written program, there are libraries of graphical functions or sub-programs to ease the task (Nash J C, 1994). Three of these are GraphiC, VG and GraphPak Professional. They are designed for programmers who write in C, FORTRAN and BASIC, respectively. None appears to provide graphical displays for variability measures (not even histograms). GraphiC and GraphPak Professional provide some facilities for three-dimensional graphics. VG allows multiple displays on the same "screen" or page.
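For readers who write their own programs, the point brushing described under EXECUSTAT above comes down to a simple rule that could be layered on any plotting library. The Python sketch below shows only the color-assignment step, not the drawing or the mouse handling. The function name `brush`, the color names, the sample data and the treatment of values exactly equal to the control value are all assumptions made for this illustration; it is not EXECUSTAT's implementation.

```python
def brush(points, control, low_color="blue", high_color="red"):
    """Assign a color to each point from its fourth variable.

    points  : iterable of (x, y, z, w) tuples; w is the brushing variable
    control : the user-set control value
    Points with w greater than control get high_color; all others
    (including w equal to control, an assumption here) get low_color.
    """
    colored = []
    for x, y, z, w in points:
        color = high_color if w > control else low_color
        colored.append(((x, y, z), color))
    return colored


# Hypothetical data: three plotted coordinates plus a fourth variable.
data = [(1.0, 2.0, 0.5, 10.0), (2.0, 1.0, 0.7, 25.0), (3.0, 4.0, 0.9, 40.0)]

# Moving the control value re-colors the same points.
print([c for _, c in brush(data, control=20.0)])  # ['blue', 'red', 'red']
print([c for _, c in brush(data, control=30.0)])  # ['blue', 'blue', 'red']
```

In an interactive program this function would be called again each time the control value changes, and the display redrawn with the new colors.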