Nash, Compact Numerical Methods for Computers
The second derivative is therefore replaced by the second difference at point j

(y_{j+1} - 2y_j + y_{j-1})/h^2.   (2.11)

The differential equation (2.8) is therefore approximated by a set of linear equations of which the jth is (2.12) or (2.13). Because y_0 = 0 and y_n = 2, this set of simultaneous linear equations is of order (n - 1).
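As a concrete illustration, the discretisation above can be sketched for the simplest model problem y'' = 0 with the stated boundary values y_0 = 0 and y_n = 2 (the right-hand side of equation (2.8) is not reproduced in this extract, so a zero right-hand side is assumed here purely for illustration):

```python
# Build the (n-1) x (n-1) second-difference matrix for interior points
# j = 1, ..., n-1; the boundary values y_0 = 0 and y_n = 2 enter the
# first and last equations through the known terms.
import numpy as np

n = 10
h = 1.0 / n
A = (np.diag(-2.0 * np.ones(n - 1))
     + np.diag(np.ones(n - 2), 1)
     + np.diag(np.ones(n - 2), -1)) / h**2

# Each row has at most three non-zero entries: the matrix is tridiagonal
max_nonzeros = max(np.count_nonzero(row) for row in A)

# Assumed zero right-hand side (y'' = 0); move the known boundary
# value y_n = 2 across to the right-hand side of the last equation
f = np.zeros(n - 1)
f[-1] -= 2.0 / h**2
y_interior = np.linalg.solve(A, f)   # exact solution is linear: y_j = 2j/n
```

For this assumed model problem the computed interior values reproduce the straight line between the boundary values, which confirms the differencing.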
However, each row involves at most three of the values y_j. Thus, if the order of the set of equations is large, the matrix of coefficients is sparse.

2.3. THE LINEAR LEAST-SQUARES PROBLEM

As described above, n linear equations give relationships which permit n parameters to be determined if the equations give rise to linearly independent coefficient vectors. If there are more than n conditions, say m, then it may not be possible to satisfy all of them at once with any set of parameters x. By asking a somewhat different question, however, it is possible to determine solutions x which in some way approximately satisfy the conditions.
That is, we wish to write

Ax ≈ b   (2.14)

where the sense of the ≈ sign is yet to be defined. By defining the residual vector

r = b - Ax   (2.15)

we can express the lack of approximation for a given x by the norm of r, || r ||. This must fulfil the following conditions:

|| r || > 0   for r ≠ 0   (2.16)

|| 0 || = 0   (2.17)

|| cr || = |c| · || r ||   (2.18)

for an arbitrary complex number c, and

|| r + s || ≤ || r || + || s ||   (2.19)

where s is a vector of the same order as r (that is, m). Condition (2.19) is called the triangle inequality since the lengths of the sides of a triangle satisfy this relationship. While there exist many norms, only a few are of widespread utility, and by and large in this work only the Euclidean norm

|| r ||_E = (r^T r)^{1/2}   (2.20)

will be used. The superscript T denotes transposition, so the norm is a scalar.
The square of the Euclidean norm of r,

r^T r   (2.21)

is appropriately called the sum of squares. The least-squares solution x of (2.14) is that set of parameters which minimises this sum of squares. In cases where rank(A) < n this solution is not unique. However, further conditions may be imposed upon the solution to ensure uniqueness. For instance, it may be required that in the non-unique case x shall be that member of the set of vectors minimising r^T r which also has the smallest x^T x. In this case x is the unique minimum-length least-squares solution.

If the minimisation of r^T r with respect to x is attempted directly, then using (2.15) and elementary calculus gives

A^T A x = A^T b   (2.22)

as the set of conditions which x must satisfy. These are simply n simultaneous linear equations in n unknowns x and are called the normal equations.
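As a small self-contained sketch (with made-up data), the normal equations can be formed and solved directly, and the result compared with a library least-squares routine:

```python
# Solve a full-rank least-squares problem via the normal equations
# A^T A x = A^T b, and cross-check against numpy's lstsq routine.
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true                       # consistent system: residual will be zero

x_normal = np.linalg.solve(A.T @ A, A.T @ b)    # normal equations (2.22)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None) # library solution

r = b - A @ x_normal                 # residual vector (2.15)
sum_of_squares = r @ r               # squared Euclidean norm (2.21)
```

For a well-conditioned full-rank A the two routes agree; the danger with the normal equations arises only when A is ill conditioned, as discussed below.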
Solution of the least-squares problem via the normal equations is the most common method by which such problems are solved. Unfortunately, there are several objections to such an approach if it is not carefully executed, since the special structure of A^T A and the numerical instabilities which attend its formation are ignored at the peril of meaningless computed values for the parameters x.

Firstly, any matrix B such that

x^T B x > 0   for all x ≠ 0   (2.23)

is called positive definite. If

x^T B x ≥ 0   (2.24)

for all x, B is non-negative definite or positive semidefinite. The last two terms are synonymous.
The matrix A^T A gives the quadratic form

Q = x^T A^T A x   (2.25)

for any vector x of order n. By setting

y = Ax   (2.26)

we obtain

Q = y^T y ≥ 0   (2.27)

so that A^T A is non-negative definite. In fact, if the columns of A are linearly independent, it is not possible for y to equal the order-m null vector 0 for any x ≠ 0, so that in this case A^T A is positive definite. This is also called the full-rank case.

Secondly, many computer programs for solving the linear least-squares problem ignore the existence of special algorithms for the solution of linear equations having a symmetric, positive definite coefficient matrix.
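Both points lend themselves to a quick numerical check; the sketch below (illustrative data only) verifies Q = y^T y and confirms that A^T A is symmetric with strictly positive eigenvalues in the full-rank case:

```python
# Check (2.25)-(2.27): the quadratic form of A^T A equals (Ax)^T (Ax),
# and for linearly independent columns A^T A is positive definite.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # full column rank with probability 1
B = A.T @ A                       # symmetric by construction

x = rng.standard_normal(3)
Q = x @ B @ x                     # quadratic form (2.25)
y = A @ x                         # y = Ax, equation (2.26)
assert np.isclose(Q, y @ y)       # Q = y^T y, equation (2.27)

eigvals = np.linalg.eigvalsh(B)   # all strictly positive in the full-rank case
```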
Above it has already been established that A^T A is positive definite, and symmetry is proved trivially. The special algorithms have advantages in efficiency and reliability over the methods for the general linear-equation problem.

Thirdly, in chapter 5 it will be shown that the formation of A^T A can lead to loss of information. Techniques exist for the solution of the least-squares problem without recourse to the normal equations. When there is any question as to the true linear independence of the columns of A, these have the advantage that they permit the minimum-length least-squares solution to be computed.

It is worth noting that the linear-equation problem of equation (2.2) can be solved by treating it as a least-squares problem. Then for singular matrices A there is still a least-squares solution x which, if the system of equations is consistent, has a zero sum of squares r^T r.
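The loss of information caused by forming A^T A can be previewed with a contrived ill-conditioned matrix: squaring the condition number pushes nearly dependent columns towards the precision limit of the arithmetic.

```python
# The condition number of A^T A is the square of that of A, so forming
# the normal equations can destroy information about nearly dependent
# columns.  The matrix below is contrived to be ill conditioned.
import numpy as np

eps = 1e-6
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])        # columns nearly linearly dependent

cond_A = np.linalg.cond(A)        # roughly 1.4e6
cond_AtA = np.linalg.cond(A.T @ A)  # roughly cond_A squared, about 2e12
```

With eps near the square root of the machine precision, A is still usable directly, but A^T A is numerically close to singular.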
For small-computer users who do not regularly need solutions to linear equations, or whose equations have coefficient matrices which are near-singular (ill conditioned is another way to say this), it is my opinion that a least-squares solution method which avoids the formation of A^T A is useful as a general approach to the problems in both equations (2.2) and (2.14).

As for linear equations, linear least-squares problems are categorised by whether or not they can be stored in the main memory of the computing device at hand.
Once again, the traditional terms dense and sparse will be used, though some problems having m large and n reasonably small will have very few zero entries in the matrix A.

Example 2.3. Least squares

It is believed that in the United States there exists a linear relationship between farm money income and the agricultural use of nitrogen, phosphate, potash and petroleum.
A model is therefore formulated using, for simplicity, a linear form

(money income) = x_1 + x_2 (nitrogen) + x_3 (phosphate) + x_4 (potash) + x_5 (petroleum).   (2.28)

For this problem the data are supplied as index numbers (1940 = 100) to avoid difficulties associated with the units in which the variables are measured. By collecting the values for the dependent variable (money income) as a vector b and the values for the other variables as the columns of a matrix A, including the constant unity which multiplies x_1, a problem

Ax ≈ b   (2.14)

is obtained.
The data and solutions for this problem are given as table 3.1 and example 3.2.

Example 2.4. Surveying-data fitting

Consider the measurement of height differences by levelling (reading heights off a vertical pole using a levelled telescope). This enables the difference between the heights of two points i and j to be measured as

b_{ij} = h_i - h_j + r_{ij}   (2.29)

where r_{ij} is the error made in taking the measurement. If m height differences are measured, the best set of heights h is obtained as the solution to the least-squares problem

minimise r^T r   (2.30)

where

r = b - Ah

and each row of A has only two non-zero elements, 1 and -1, corresponding to the indices of the two points involved in the height-difference measurement.
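A hedged sketch of this levelling problem (with hypothetical heights and measurement pairs) shows the structure of A and the minimum-length least-squares solution; because adding a constant to every height leaves all differences unchanged, A here has rank 3 rather than 4:

```python
# Levelling as a least-squares problem: each row of A has one +1 and
# one -1.  Heights and measured pairs are hypothetical illustrations.
import numpy as np

h_true = np.array([0.0, 1.5, 2.3, 0.7])           # hypothetical heights
pairs = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]  # measured pairs (i, j)

A = np.zeros((len(pairs), 4))
b = np.zeros(len(pairs))
for row, (i, j) in enumerate(pairs):
    A[row, i], A[row, j] = 1.0, -1.0
    b[row] = h_true[i] - h_true[j]   # exact measurements, for clarity

rank = np.linalg.matrix_rank(A)      # 3, not 4: rank deficient
h = np.linalg.pinv(A) @ b            # minimum-length least-squares solution
```

The computed heights reproduce all the true height differences; the minimum-length solution fixes the free constant by making the heights sum to zero.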
Sometimes the error is defined as the weighted residual

r_{ij} = [b_{ij} - (h_i - h_j)]/d_{ij}

where d_{ij} is the horizontal distance between the two points (that is, the measurement error increases with distance).

A special feature of this particular problem is that the solution is only determined to within a constant, h_0, because no origin for the height scale has been specified. In many instances, only relative heights are important, as in a study of subsidence of land. Nevertheless, the matrix A is rank-deficient, so any method chosen to solve the problem as it has been presented should be capable of finding a least-squares solution despite the singularity (see example 19.2).

2.4. THE INVERSE AND GENERALISED INVERSE OF A MATRIX

An important concept is that of the inverse of a square matrix A. It is defined as that square matrix, labelled A^{-1}, such that

A^{-1} A = A A^{-1} = 1_n   (2.31)

where 1_n is the unit matrix of order n. The inverse exists only if A has full rank. Algorithms exist which compute the inverse of a matrix explicitly, but these are of value only if the matrix inverse itself is useful.
These algorithms should not be used, for instance, to solve equation (2.2) by means of the formal expression

x = A^{-1} b   (2.32)

since this is inefficient. Furthermore, the inverse of a matrix A can be computed by setting the right-hand side b in equation (2.2) to the n successive columns of the unit matrix 1_n. Nonetheless, for positive definite symmetric matrices, this monograph presents a very compact algorithm for the inverse in §8.2. When A is rectangular, the concept of an inverse must be generalised.
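Returning to the square case for a moment, the column-by-column construction just mentioned can be sketched with a small made-up matrix: solving A x = e_k for each column e_k of 1_n assembles A^{-1}.

```python
# Compute an explicit inverse by solving equation (2.2) with b set to
# each column of the unit matrix in turn.  A is a small made-up
# nonsingular matrix used only for illustration.
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
n = A.shape[0]

# One linear-equation solve per column of 1_n
columns = [np.linalg.solve(A, e_k) for e_k in np.eye(n).T]
A_inv = np.column_stack(columns)
```

In practice a single factorisation of A would be reused for all n right-hand sides rather than solving from scratch each time.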
Corresponding to (2.32), consider solving equation (2.14) by means of a matrix A^+, yet to be defined, such that

x = A^+ b.   (2.33)

In other words, we have

A^+ A = 1_n.   (2.34)

When A has only k linearly independent columns, it will be satisfactory if

A^+ A = [ 1_k  0 ]
        [ 0    0 ]   (2.35)

but in this case x is not defined uniquely, since it can contain arbitrary components from the orthogonal complement of the space spanned by the columns of A. That is, we have

x = A^+ b + (1_n - A^+ A) g   (2.36)

where g is any vector of order n.

The normal equations (2.22) must still be satisfied. Thus in the full-rank case, it is straightforward to identify

A^+ = (A^T A)^{-1} A^T.   (2.37)

In the rank-deficient case, the normal equations (2.22) imply by substitution of (2.36) that

A^T A x = A^T A A^+ b + (A^T A - A^T A A^+ A) g = A^T b.   (2.38)

If

A^T A A^+ = A^T   (2.39)

then equation (2.38) is obviously true.
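A numerical sketch (made-up data) can illustrate (2.36) and (2.37): in the full-rank case the Moore-Penrose inverse agrees with (A^T A)^{-1} A^T, while in the rank-deficient case every member of the family (2.36) satisfies the normal equations, with A^+ b the minimum-length member.

```python
# Generalised inverse: full-rank identity (2.37) and the rank-deficient
# solution family (2.36), using numpy's Moore-Penrose pseudoinverse.
import numpy as np

rng = np.random.default_rng(2)

# Full-rank case: A^+ = (A^T A)^{-1} A^T, equation (2.37)
A = rng.standard_normal((6, 3))
assert np.allclose(np.linalg.pinv(A), np.linalg.inv(A.T @ A) @ A.T)

# Rank-deficient case: duplicate a column so rank(A2) = 2 < n = 3
A2 = np.column_stack([A[:, 0], A[:, 0], A[:, 1]])
b = rng.standard_normal(6)
Aplus = np.linalg.pinv(A2)
x_min = Aplus @ b                                 # minimum-length solution
g = rng.standard_normal(3)
x_other = x_min + (np.eye(3) - Aplus @ A2) @ g    # family (2.36)
```

Both members of the family reproduce A^T b through the normal equations, but only x_min has minimum Euclidean length.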