Let us suppose we have m observations, that is, we have asked m computer programmers how much they earn, and how many years of education and experience they have. The number of observations m may be 10 or 10000; this is immaterial now, but it will become important later.

We rename Earnings as variable y. The i'th observation of y will be y_i. The variables Education and Experience can be renamed. However, instead of separate names, we will let Education be the first column of a matrix X and Experience be its second column.
To complete the matrix, the third column of X will be a column of ones. The model can then be written

(12.1.2)    y_i = Σ_j X_ij b_j + e_i

where r = b(1), s = b(2) and t = b(3), and e_i is the "error" in the i'th observation. The whole ensemble of observations has a matrix product form (using boldface for vectors)

(12.1.3)    y = X b + e

The fitting of the model now has the obvious interpretation that the three coefficients b_j, j = 1, 2, 3 must be chosen to make the elements e_i, i = 1, 2, ..., m small in some way.
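As a concrete illustration, the matrix X and the error vector e of (12.1.2) and (12.1.3) can be set up as in the following minimal sketch (Python with NumPy; the data values are hypothetical and purely illustrative):

```python
import numpy as np

# Hypothetical data for m = 5 programmers (illustrative values only):
education  = np.array([12.0, 16.0, 14.0, 18.0, 12.0])   # years of education
experience = np.array([ 5.0,  2.0, 10.0,  4.0, 15.0])   # years of experience
earnings   = np.array([38.0, 42.0, 55.0, 51.0, 60.0])   # y

# X has Education in column 1, Experience in column 2,
# and a column of ones to carry the constant term.
X = np.column_stack([education, experience, np.ones_like(education)])
y = earnings

# For any trial coefficients b = (r, s, t), the "errors" of (12.1.3)
# are e = y - X b.
b_trial = np.array([2.0, 1.5, 5.0])
e = y - X @ b_trial
print(e)
```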
The most common measure for the size of the "errors" is the sum of squares

(12.1.4)    S(b) = e^T e = Σ_{i=1}^{m} e_i^2

This norm for the vector e produces a least squares estimation of the model. The superscript T denotes matrix transposition.

Having decided that we have a linear least squares problem, our formulation could be considered complete.
However, the problem may arrive in disguise, for example, as a function minimization

(12.1.5)    Minimize S(b) with respect to b

Or we may be asked to solve the linear equations

(12.1.6)    X^T X b = X^T y

for b. These are the normal equations for solving the linear least squares problem. We can classify the tasks as either linear least squares, function minimization or linear equations. Clearly, the different classifications will direct us toward different methods for solution of the problem.

12.2 Algorithm Choice

In our example, we already have formulations as linear equations, function minimization and linear least squares. Once our general approach is settled, we must look at the various features of our problem to choose an appropriate algorithm — a recipe for calculating the desired solution.
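To make the equivalence of these formulations concrete before comparing algorithms, the following minimal sketch (Python with NumPy and SciPy; the data are hypothetical) treats (12.1.5) as a general function minimization and (12.1.6) as a set of linear equations, and checks that the two routes give essentially the same coefficients:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: Education, Experience, and a constant column.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(10, 20, 20),
                     rng.uniform(0, 15, 20),
                     np.ones(20)])
y = X @ np.array([2.0, 1.5, 5.0]) + rng.normal(0.0, 1.0, 20)

def S(b):
    """Sum of squared errors S(b) = e'e of (12.1.4)."""
    e = y - X @ b
    return e @ e

# (12.1.5): minimize S(b) with respect to b, as a function minimization.
b_min = minimize(S, np.zeros(3)).x

# (12.1.6): solve the normal equations X'X b = X'y as linear equations.
b_lin = np.linalg.solve(X.T @ X, X.T @ y)

print(b_min)   # the two answers should agree to several figures
print(b_lin)
```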
Different algorithms may have advantages if we must solve many similar problems. To illustrate the choices, let us consider possibilities for the least squares problem in some detail.

First, traditional treatments form the normal equations (Nash J C, 1990d, p. 22). Letting

(12.2.1)    A = X^T X
(12.2.2)    v = X^T y

we can rewrite the normal equations as

(12.2.3)    A b = v

Many algorithms exist for solving n equations in n unknowns. (In our case n = 3.) As n is small it may even be reasonable to use Cramer's rule (see Dahlquist and Björck, 1974, p. 138) to find the solution. It is more likely that a variation on a general technique such as Gaussian elimination (Nash J C, 1990d, Section 6.2) is appropriate. Since A is symmetric and non-negative definite (Nash J C, 1990d, Chapter 7), we could also use the Cholesky decomposition or a symmetric Gauss-Jordan reduction (Nash J C, 1990d, Chapter 8). We could even use general QR or singular value decomposition algorithms to solve the normal equations.

Note that a formal solution of the normal equations is obtained by inverting the sum of squares and cross-products matrix A and multiplying the inverse times the vector v.
The inversion and matrix multiplication introduce unnecessary work, but there are numerous Gauss-Jordan flavor programs of mongrel parentage for performing this task. Unfortunately, stepwise regression is also very easily programmed using this approach, but risks numerical instability unless care is taken.

The least squares problem can also be solved by direct decomposition of the data matrix X.
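Before turning to those decompositions, here is a minimal sketch of the normal-equations route just described, using a Cholesky factorization of A rather than an explicit inverse (Python with NumPy and SciPy; hypothetical data):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Hypothetical data: Education, Experience, and a constant column.
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(10, 20, 30),
                     rng.uniform(0, 15, 30),
                     np.ones(30)])
y = X @ np.array([2.0, 1.5, 5.0]) + rng.normal(0.0, 1.0, 30)

A = X.T @ X          # (12.2.1) sum of squares and cross-products matrix
v = X.T @ y          # (12.2.2)

# A is symmetric and non-negative definite, so a Cholesky
# decomposition solves A b = v of (12.2.3) without forming an inverse.
factor = cho_factor(A)
b = cho_solve(factor, v)
print(b)
```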
In the QR decomposition, we find the product of an orthogonal matrix Q and an upper triangular matrix R such that

(12.2.4)    X = Q R

where

(12.2.5)    Q^T Q = 1_n

the identity of order n, and

(12.2.6)    R_ij = 0    if i > j

We can show that the simple solution of the triangular equations

(12.2.7)    R b = Q^T y

is a solution to the normal equations, since

(12.2.8)    X^T X b = R^T Q^T Q R b = R^T Q^T y = X^T y

and we know R is non-singular if it has no zero diagonal elements.

There are several good methods for computing a QR decomposition.
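Whatever method is used to build Q and R, the solution step itself is only the short triangular solve of (12.2.7). A minimal sketch with a library QR routine (Python with NumPy and SciPy; hypothetical data):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical data: Education, Experience, and a constant column.
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(10, 20, 30),
                     rng.uniform(0, 15, 30),
                     np.ones(30)])
y = X @ np.array([2.0, 1.5, 5.0]) + rng.normal(0.0, 1.0, 30)

# "Economy size" QR of (12.2.4): Q is m by n with orthonormal columns,
# R is n by n upper triangular, and X = Q R.
Q, R = np.linalg.qr(X)

# Solve the triangular system R b = Q'y of (12.2.7) by back-substitution.
b = solve_triangular(R, Q.T @ y)
print(b)
```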
The Gram-Schmidt orthogonalization is one of the oldest. Rice (1966) showed that a modified form of this method has superior numerical properties to the conventional algorithm. The Gram-Schmidt method builds the R matrix row by row, or column by column, by orthogonalizing the columns of matrix X with respect to each other. By thinking instead of a process that transforms X to an upper triangular matrix via multiplication by elementary orthogonal matrices, one arrives at the Givens and Householder decompositions (Nash J C, 1990d, Chapter 4). Clearly, one has plenty of choice in carrying out the QR decomposition of X.

The singular value decomposition (svd) offers yet another approach to solving the least squares problem. The svd decomposes the matrix X into a product of three matrices U, S, and V in the form

(12.2.9)    X = U S V^T

In the svd, the matrix S is diagonal and its elements are non-negative. That is,

(12.2.10)    S_ij = 0    for i ≠ j
(12.2.11)    S_ii ≥ 0    for i = j

The matrices U and V have the following orthogonality properties (U is m by n, V is n by n):

(12.2.12)    U^T U = 1_n
(12.2.13)    V^T V = V V^T = 1_n

where 1_n is the identity matrix of order n.

There are two main algorithm families for the svd for sequential computers. The first is due to Golub and Reinsch (1971) with variants published by Businger and Golub (1969) and Tony Chan (1982) and is derived from the QR algorithm applied to matrix eigenvalue problems (Stoer and Bulirsch 1980, page 377). The other algorithm family for the svd is based on the Jacobi method for eigenvalue problems (Nash J C, 1975; Nash J C and Lefkovitch, 1976; Nash J C, 1990d, Algorithms 1 & 4; Nash J C and Schlien, 1987; Luk, 1980).
Providing certain simple steps are taken during the calculations, the Jacobi algorithms will order the singular values S_ii so that

(12.2.14)    S_11 ≥ S_22 ≥ . . . ≥ S_nn ≥ 0

Jacobi-type methods have surprisingly few steps, and with the property of ordering the singular values, they are good candidates for use in situations where code length must be short. They have also been of interest for users of computers with unusual architectures (Luk 1980).

With so much information about different algorithms, the user has a considerable analysis to perform to decide which algorithm to implement. The comparisons can be made under three main categories: features, size/complexity and efficiency.

The features of an algorithm include:

•   The classes of problems solved in terms of type (e.g., linear equations, least squares) and size (m and n).

•   The "extras" offered by the algorithm. For example, in linear least squares modelling we may want standard errors for the estimates of the model coefficients. These are easily calculated from information in the matrices that comprise the svd (see the sketch at the end of this section), which also leads to a natural method for performing the principal components regression variant of linear least squares modelling (Nash J C, 1979b).

•   The reliability and accuracy of the algorithm in terms of its ability to compute the correct solution. Certain least squares methods based on the normal equations and the Gauss-Jordan reduction will usually give accurate estimates for the coefficients b_j, j = 1, 2, . . ., n (often more accurate than the svd approach). Unfortunately, one can devise cases where they generate very poor answers. Under similar circumstances the svd remains stable and provides good estimates of the coefficients, with diagnostic information on their lack of precision.

•   The robustness of the algorithm (or program resulting from it) to exceptional cases, for example a least squares problem with only one observation or variable.

Without presenting the subject of computational complexity, we can still discuss and compare algorithms in terms of the number of calculations required (e.g., additions, multiplications), the working storage required in terms of variables and arrays, and the structure of the algorithms such as the number of branches and decision points and number of special cases to be handled.

The last area of comparison concerns speed, effort and cost under the overall heading of efficiency.
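As an illustration of the svd-based "extras" mentioned in the feature list above, the following minimal sketch (Python with NumPy; hypothetical data) solves the least squares problem from the decomposition X = U S V^T of (12.2.9) and then recovers approximate standard errors for the coefficients from the same matrices:

```python
import numpy as np

# Hypothetical data: Education, Experience, and a constant column.
rng = np.random.default_rng(3)
m, n = 30, 3
X = np.column_stack([rng.uniform(10, 20, m),
                     rng.uniform(0, 15, m),
                     np.ones(m)])
y = X @ np.array([2.0, 1.5, 5.0]) + rng.normal(0.0, 1.0, m)

# Economy-size svd of (12.2.9): X = U S V', with the singular values
# returned in decreasing order as in (12.2.14).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Least squares coefficients: b = V S^{-1} U' y.
b = Vt.T @ ((U.T @ y) / s)

# The same matrices give (X'X)^{-1} = V S^{-2} V', from which standard
# errors of the coefficients follow in the usual regression fashion.
e = y - X @ b
sigma2 = (e @ e) / (m - n)                  # residual variance estimate
cov_b = Vt.T @ np.diag(1.0 / s**2) @ Vt * sigma2
se_b = np.sqrt(np.diag(cov_b))
print(b)
print(se_b)
```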