We will call this vector a, and the rule for accessing elements in A and L is that we read our matrix row-wise, that is,

      a(i*(i-1)/2 + j) = Aij                                        (18.2.6)

The indices for a() are thus in the following positions in A or L:

       1
       2   3
       4   5   6
       7   8   9  10

and so on.

Figures 18.2.1, 18.2.2 and 18.2.3 show the FORTRAN code for three different implementations of the Cholesky algorithm.
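As a small illustration of Equation 18.2.6 (our own sketch in C, one of the languages used for the translated codes; the function name rowwise_index is ours and does not appear in the timed programs), the mapping and the triangle of positions above can be reproduced as follows:

      #include <stdio.h>

      /* Storage position in the packed row-wise vector a() for element (i,j)
         of the lower triangle, i >= j, using 1-based indices as in the
         FORTRAN codes (Equation 18.2.6). */
      static int rowwise_index(int i, int j)
      {
          return i * (i - 1) / 2 + j;
      }

      int main(void)
      {
          /* Print the storage positions for the lower triangle of order 4. */
          for (int i = 1; i <= 4; i++) {
              for (int j = 1; j <= i; j++)
                  printf("%4d", rowwise_index(i, j));
              printf("\n");
          }
          return 0;
      }

Run as written, this prints the rows 1; 2 3; 4 5 6; 7 8 9 10, matching the layout shown above.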
BASIC, PASCAL and C versions of these three codes were prepared by direct translation of programming language structures. Later in the chapter, Table 18.5.2 summarizes execution times, while Table 18.5.1 presents the execution rates of the Price and Healy variants relative to the Nash codes, and various measures of size and performance are presented in graphs.

Figure 18.2.1   Nash J C (1990d) variant of the Cholesky decomposition (FORTRAN)

      SUBROUTINE NCHOL(A,N2,N,IFL)
C NASH
      LOGICAL IFL
      DIMENSION A(N2)
      IFL=.FALSE.
      DO 20 J=1,N
      L=J*(J+1)/2
      IF(J.EQ.1) GO TO 11
      J1=J-1
      DO 10 I=J,N
      L2=I*(I-1)/2+J
      S=A(L2)
      DO 5 K=1,J1
      L2K=L2-K
      LK=L-K
      S=S-A(L2K)*A(LK)
    5 CONTINUE
      A(L2)=S
   10 CONTINUE
   11 IF(A(L).LE.0.0) GO TO 21
      S=SQRT(A(L))
      DO 19 I=J,N
      L2=I*(I-1)/2+J
      A(L2)=A(L2)/S
   19 CONTINUE
   20 CONTINUE
      RETURN
   21 IFL=.TRUE.
      RETURN
      END

Figure 18.2.2   Price implementation of the Cholesky decomposition

      SUBROUTINE PCHOL(A,N2,N,IFL)
C PRICE
      LOGICAL IFL
      DIMENSION A(N2)
      IFL=.FALSE.
      J2=0
      DO 20 J1=1,N
      I2=J2
      DO 10 I1=J1,N
      L=I2+J1
      S=A(L)
      J=1
    4 IF(J.EQ.J1) GO TO 5
      K=I2+J
      M=J2+J
      S=S-A(K)*A(M)
      J=J+1
      GO TO 4
    5 IF(I1.EQ.J1) GO TO 6
      A(L)=S/T
      GO TO 7
    6 IF(S.LE.0.0) GO TO 21
      T=SQRT(S)
      A(L)=T
    7 I2=I2+I1
   10 CONTINUE
      J2=J2+J1
   20 CONTINUE
      RETURN
   21 IFL=.TRUE.
      RETURN
      END

Figure 18.2.3   Listing of the Healy (1968) variant of the Cholesky decomposition

      SUBROUTINE HCHOL(A,N2,N,IFL)
C HEALY
      LOGICAL IFL
      DIMENSION A(N2)
      IFL=.FALSE.
      J=1
      K=0
      DO 10 ICOL=1,N
      L=0
      DO 11 IROW=1,ICOL
      K=K+1
      W=A(K)
      M=J
      DO 12 I=1,IROW
      L=L+1
      IF(I.EQ.IROW) GO TO 13
      W=W-A(L)*A(M)
      M=M+1
   12 CONTINUE
   13 IF(IROW.EQ.ICOL) GO TO 14
      IF(A(L).LE.0.0) GO TO 21
      A(K)=W/A(L)
   11 CONTINUE
   14 IF(W.LE.0.0) GO TO 21
      A(K)=SQRT(W)
      J=J+ICOL
   10 CONTINUE
      RETURN
   21 IFL=.TRUE.
      RETURN
      END
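To suggest what a direct translation of these structures into C looks like, here is a sketch in the spirit of the Nash variant, operating on the packed row-wise vector (an illustration only, not the C code actually timed in this chapter; the name nchol_packed and the 0-based indexing convention are ours):

      #include <math.h>

      /* Cholesky decomposition of a symmetric positive definite matrix of
         order n stored row-wise as a packed lower triangle in
         a[0..n*(n+1)/2-1].  Element (i,j), 1 <= j <= i <= n, is held at
         a[i*(i-1)/2 + j - 1].  The factor L overwrites A.  Returns 0 on
         success, 1 if a non-positive diagonal element is met. */
      int nchol_packed(double *a, int n)
      {
          for (int j = 1; j <= n; j++) {
              int dj = j * (j - 1) / 2 + j - 1;      /* position of A(j,j)    */
              /* form the inner products for column j of rows j..n */
              for (int i = j; i <= n; i++) {
                  int ij = i * (i - 1) / 2 + j - 1;  /* position of A(i,j)    */
                  double s = a[ij];
                  for (int k = 1; k < j; k++)
                      s -= a[ij - k] * a[dj - k];    /* A(i,j-k) * A(j,j-k)   */
                  a[ij] = s;
              }
              if (a[dj] <= 0.0)
                  return 1;                          /* not positive definite */
              double t = sqrt(a[dj]);
              for (int i = j; i <= n; i++)
                  a[i * (i - 1) / 2 + j - 1] /= t;   /* scale column j of L   */
          }
          return 0;
      }

The loop structure parallels the FORTRAN of Figure 18.2.1, with the index arithmetic of Equation 18.2.6 written out explicitly at each access.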
The method and algorithm are essentially the same for all three codes. The storage mechanism is a row-wise single vector holding a triangular array. The same equations are used to calculate the elements of L, but Nash and Price compute L row by row, while Healy does it column-wise.

The differences between the codes are in their looping controls and in how they get the storage vector index for a particular element of the matrix.
The Nash code (Nash J C, 1990d, Algorithm 7) uses the formula

      k = i*(i-1)/2 + j                                             (18.2.7)

to give the storage vector index k corresponding to the i,j element of the matrix A, as we have shown above. The Healy (1968) code and the Price code (a personal communication from Kevin Price, a colleague and friend) attempt to avoid the arithmetic involved in Equation 18.2.7 by careful organization of the program.
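The kind of reorganization involved can be sketched in C (again our illustration; the function names and the trivial summing task are ours). The first version evaluates the multiplication and division of Equation 18.2.7 for every element; the second carries a running row offset that is updated only by addition, in the spirit of the I2 and J2 counters of the Price code.

      /* Visit every element of the packed row-wise lower triangle of an
         order-n matrix, here simply accumulating the sum of the elements. */

      double sum_with_formula(const double *a, int n)
      {
          double s = 0.0;
          for (int i = 1; i <= n; i++)
              for (int j = 1; j <= i; j++)
                  s += a[i * (i - 1) / 2 + j - 1];   /* index from Eq. 18.2.7   */
          return s;
      }

      double sum_with_offsets(const double *a, int n)
      {
          double s = 0.0;
          int rowstart = 0;                          /* offset of element (i,1) */
          for (int i = 1; i <= n; i++) {
              for (int j = 1; j <= i; j++)
                  s += a[rowstart + j - 1];          /* same element (i,j)      */
              rowstart += i;                         /* row i+1 starts i places on */
          }
          return s;
      }

Whether the second form is actually faster depends on the compiler and processor at hand, which is precisely why the measurements reported below were made.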
The devices used are discussed in part by Quittner (1977), pages 324-327. In the late 1960s and early 1970s it was common to try to save execution (CPU) time in this way. For the IBM (mainframe) FORTRAN G1 compiler, and to some extent for various other computing environments, it is clear that such efforts may be useful.

Unfortunately, many PCs and language processors do not behave as anticipated by the Price and Healy codes. The straightforward approach used in the Nash variant then has the advantage of being more directly comprehensible to readers and programmers. Sometimes it turns out to be more efficient as well, sometimes remarkably so, while in other instances the differences in efficiency may not warrant the attention to detail needed to avoid the index calculation. Our advice, which will be repeated, is that if timing is critical, measurements must be made for the particular situation in which the computations are important.

Over the years, we have attempted to compare these programs and their variants in other programming languages on a diverse collection of computers.
Table 18.2.1 lists some of those we have looked at.

18.3 Measuring the Differences

Elapsed time for either execution or compilation/linking is the main measure we will be comparing here, with program size a second quantity of interest. Measurement of elapsed time may be non-trivial for several reasons discussed in Section 8.6. Some relevant points follow.

•  We want to measure only times for the program code of interest. Other processes active in our PC may interfere, for example, background tasks, disk activity, or time-shared programs. In these Cholesky timings, all PCs had a screen-blanker program to blank the screen after three minutes of no keyboard activity. We sometimes forgot to disable the fax-modem program watching for incoming facsimile messages in one PC (this is noted where it happened).

•  Some compilers/interpreters do not allow direct measurement of elapsed time within a program, for example, Microsoft Fortran version 3.2 and Turbo Pascal version 3.01a on MS-DOS computers.
Using TIMER (Figure 8.6.1), we timed separate programs for each variant of the Cholesky algorithm and for each matrix order 10, 20, 30, 40 and 50. We also timed the codes without the decomposition to measure the time for program loading, building of the test matrix and reporting of results.

•  To minimize human effort, control of timing is carried out using batch command files in the MS-DOS environments. As there may be delays in reading and executing such files, we store all the codes to be tested, and also the batch files, on RAM-disk to minimize input-output times that might create uncertainty in our timings.

•  Some timings, performed a long time ago on obsolete machines, are included to illustrate that the issues presented have been a continuing concern in matters of program and algorithm efficiency.

•  Near the end of the timing study, we changed versions of the MS-DOS operating system from 3.3 to 5.0 to accommodate the installation of a local area network. There may be subtle effects on the timings due to this change. The network was disabled for the timings.

•  Occasionally there are results that seem to make no sense at all. For example, the 1980 timing for decomposing an order 35 matrix on the Terak machine was 11.11 seconds, while order 30 took 11.88 seconds. We did not see anything this bizarre in the recent timings.

•  The discipline required to record all details for each timing exercise is such that we had to repeat some timings.

For our purposes, we have chosen to find the Cholesky decomposition of the Moler matrix (Nash J C, 1990d, p.
253). This matrix has the elements

      Aij = i              for i = j                                (18.3.1)
      Aij = min(i,j) - 2   otherwise

Table 18.2.1   Computers and programming environments used in Cholesky algorithm timing comparisons.

I) timings pre-1984

   machine                               programming environment and options
   Control Data Cyber 74                 FTN compiler, opt=0,1,2
   IBM System 370/168                    Fortran HX, OPT=2, single precision
                                         Fortran G1, single precision
   Sperry Univac 1108                    FOR
   Hewlett-Packard 3000                  FORTGO
   Terak LSI/11                          UCSD PASCAL
   Data General Eclipse                  Extended BASIC
   Imsai Z80 (courtesy K. Stewart)       14 digit CBASIC
   North Star Horizon                    14 digit North Star Basic
                                         8 digit BASIC
                                         8 digit FPBASIC, proprietary co-processor
                                         14 digit FPBASIC, proprietary co-processor

II) 1992 tests

   Macintosh SE                          True BASIC

   For the MS-DOS PCs listed, most computing environments were employed across the systems except where noted. As there are many options, not all will be listed here.

   machines:
      286 = 12 MHz Intel 286 / 8 MHz Intel 80287 (286)
      NBK = Eurocom Notebook (Intel 80386DX 33 MHz, ULSI 80387)
      V20 = NEC V20 with Intel 8087 (V20)
      386 = 33 MHz Intel 80386DX/80387 with 64K cache (386)

   programming environments:
      Microsoft Basic 6.0
      Borland Turbo BASIC 1.1
      True BASIC 3.04 (PC)
      Microsoft GW Basic 3.23 interpreter
      Microsoft Fortran 3.2
      Lahey Fortran F77L 3.01
      Microsoft Fortran 4.10
      Microsoft Fortran 5.10 ++
      Silicon Valley Systems 2.8.1b ++
      Borland Turbo Pascal 3.01a
      Borland Turbo Pascal 5.0
      Borland Turbo Pascal 5.5
      Borland Turbo Pascal for Windows (1991)
      Microsoft Quick C 2.5
      Turbo C++ 2.0 **

   ** Note: Turbo C++ and Borland C++ are different products.

   compiled by Gordon Sande
Thus the order 5 Moler matrix is

       1  -1  -1  -1  -1
      -1   2   0   0   0
      -1   0   3   1   1
      -1   0   1   4   2
      -1   0   1   2   5

The Cholesky decomposition of this matrix is given by the matrix L having elements that follow the formulas

      Lij =  1   for j = i                                          (18.3.2)
      Lij =  0   for j > i
      Lij = -1   for j < i

Thus, for order 5, we have the triangular matrix

       1
      -1   1
      -1  -1   1
      -1  -1  -1   1
      -1  -1  -1  -1   1

18.4 Program Sizes and Compiler Options

Novice users of PCs are frequently surprised at the very large differences between the sizes of programs that ostensibly perform the same tasks. Table 18.3.1 presents a summary of source and execution program sizes for the Cholesky decomposition timing codes in FORTRAN, BASIC, PASCAL and C.