3.1. The least squares solution minimizes the sum of squares of vertical distances between the data points and the curve over all possible quadratic polynomials.

[Figure 3.1: Least squares fit of a quadratic polynomial to the given data.]

3.3 Normal Equations Method

The classical method for solving least squares problems, due to Gauss, can be derived in a variety of ways.
We first show how it can be derived using calculus. In matrix notation, the least squares criterion for data fitting can be expressed as minimizing the squared Euclidean norm

$$\|r\|_2^2 = r^T r$$

of the residual vector

$$r = b - Ax.$$

To minimize

$$\|r\|_2^2 = r^T r = (b - Ax)^T (b - Ax) = b^T b - 2 x^T A^T b + x^T A^T A x,$$

we take the derivative with respect to $x$ and set it to zero:

$$2 A^T A x - 2 A^T b = o,$$

which reduces to an $n \times n$ square linear system

$$A^T A x = A^T b,$$

commonly known as the system of normal equations. The name comes from the fact that the $(i, j)$ entry of the matrix $A^T A$ is the inner product of the $i$th and $j$th columns of $A$; for this reason $A^T A$ is also sometimes called the cross-product matrix of $A$.
Provided $\operatorname{rank}(A) = n$ (i.e., the columns of $A$ are linearly independent), the matrix $A^T A$ is nonsingular, so that the system of normal equations has a unique solution, which is also the unique solution to the original least squares problem.

3.3.1 Orthogonality

A more geometric derivation of the normal equations is based on the concept of orthogonality. Two vectors $y$ and $z$ are said to be orthogonal to each other, which is a synonym for perpendicular or normal, if their inner product is zero, $y^T z = 0$. Since the matrix $A$ has $n$ columns, the space spanned by the columns of $A$ (i.e., the set of all vectors of the form $Ax$), known as the column space or range space of $A$, is of dimension $n$ at most.
In the usual case for least squares, $m > n$, this fact implies that the $m$-vector $b$ generally does not lie in the column space of $A$, and hence there is no exact solution to the equation $Ax = b$. Rather than an exact solution, however, in least squares problems we seek the vector in the column space of $A$ that is closest to $b$ (in the Euclidean norm), which is given by the orthogonal projection of $b$ onto the column space of $A$. For this vector, the residual $r = b - Ax$ is orthogonal to the column space of $A$. Thus, we have

$$o = A^T r = A^T (b - Ax),$$

or

$$A^T A x = A^T b,$$

which is again the system of normal equations. The geometric relationships we have just described are shown in Fig. 3.2.
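To make the orthogonality condition concrete, here is a minimal numerical check (a sketch using NumPy; the small data set is made up purely for illustration) that the least squares residual is orthogonal to every column of $A$:

```python
import numpy as np

# A small overdetermined system: m = 4 equations, n = 2 unknowns.
# The data are hypothetical and serve only to illustrate the geometry.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Least squares solution, computed here with NumPy's built-in solver.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# The residual r = b - Ax should be orthogonal to the column space of A,
# i.e., A^T r = o up to rounding error.
r = b - A @ x
print(A.T @ r)   # entries on the order of machine precision
```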
This interpretation also suggests when the least squares solution will be unique, for the orthogonal projection of $b$ onto the column space of $A$ will have a unique representation of the form $Ax$ if and only if the columns of $A$ are linearly independent.

[Figure 3.2: Geometric depiction of a linear least squares problem, showing $b$, $Ax$, and the residual $r = b - Ax$.]

3.3.2 Normal Equations Method

If $A$ has full column rank, then the matrix $A^T A$ is nonsingular.
Therefore, the $n \times n$ system of normal equations

$$A^T A x = A^T b$$

can be used to obtain the solution $x$ to the linear least squares problem $Ax \approx b$. In fact, in this case $A^T A$ is symmetric and positive definite, so we can compute its Cholesky factorization,

$$A^T A = L L^T,$$

where $L$ is lower triangular. The solution $x$ to the least squares problem can then be computed by solving the triangular systems $Ly = A^T b$ and $L^T x = y$.

The normal equations method is an example of the general strategy noted earlier, where a difficult problem is converted to successively easier ones having the same solution. In this case, the sequence of problem transformations is

Rectangular $\longrightarrow$ square $\longrightarrow$ triangular.

Unfortunately, this method also illustrates another important fact, namely, that a problem transformation that is legitimate theoretically is not always advisable numerically, as we will see shortly.
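As a concrete illustration of this strategy, the following sketch (ours, not taken from the text; the function name `lstsq_normal_equations` is hypothetical, and NumPy and SciPy are assumed available) forms $A^T A$ and $A^T b$, factors $A^T A = L L^T$ by Cholesky, and solves the two triangular systems:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def lstsq_normal_equations(A, b):
    """Solve the linear least squares problem Ax ~= b via the normal equations.

    Assumes A has full column rank, so that A^T A is symmetric positive
    definite and its Cholesky factorization exists.
    """
    AtA = A.T @ A                          # n x n normal equations matrix
    Atb = A.T @ b                          # right-hand-side vector
    L = cholesky(AtA, lower=True)          # A^T A = L L^T, L lower triangular
    y = solve_triangular(L, Atb, lower=True)      # forward substitution: L y = A^T b
    return solve_triangular(L.T, y, lower=False)  # back substitution: L^T x = y
```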
Example 3.3 Normal Equations Method. We illustrate the normal equations method by using it to solve the quadratic polynomial data-fitting problem given in Example 3.2:

$$A^T A = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -1.0 & -0.5 & 0.0 & 0.5 & 1.0 \\ 1.0 & 0.25 & 0.0 & 0.25 & 1.0 \end{bmatrix} \begin{bmatrix} 1 & -1.0 & 1.0 \\ 1 & -0.5 & 0.25 \\ 1 & 0.0 & 0.0 \\ 1 & 0.5 & 0.25 \\ 1 & 1.0 & 1.0 \end{bmatrix} = \begin{bmatrix} 5.0 & 0.0 & 2.5 \\ 0.0 & 2.5 & 0.0 \\ 2.5 & 0.0 & 2.125 \end{bmatrix},$$

$$A^T b = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -1.0 & -0.5 & 0.0 & 0.5 & 1.0 \\ 1.0 & 0.25 & 0.0 & 0.25 & 1.0 \end{bmatrix} \begin{bmatrix} 1.0 \\ 0.5 \\ 0.0 \\ 0.5 \\ 2.0 \end{bmatrix} = \begin{bmatrix} 4.0 \\ 1.0 \\ 3.25 \end{bmatrix}.$$

We previously computed the Cholesky factorization of this symmetric positive definite matrix in Example 2.12:

$$\begin{bmatrix} 5.0 & 0.0 & 2.5 \\ 0.0 & 2.5 & 0.0 \\ 2.5 & 0.0 & 2.125 \end{bmatrix} = \begin{bmatrix} 2.236 & 0 & 0 \\ 0 & 1.581 & 0 \\ 1.118 & 0 & 0.935 \end{bmatrix} \begin{bmatrix} 2.236 & 0 & 1.118 \\ 0 & 1.581 & 0 \\ 0 & 0 & 0.935 \end{bmatrix} = L L^T.$$

Solving the lower triangular system $Ly = A^T b$ by forward substitution, we obtain

$$y = \begin{bmatrix} 1.789 \\ 0.632 \\ 1.336 \end{bmatrix}.$$

Finally, solving the upper triangular system $L^T x = y$ by back-substitution, we obtain

$$x = \begin{bmatrix} 0.086 \\ 0.400 \\ 1.429 \end{bmatrix}.$$
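Applying the sketch above to the data of Example 3.2 (the points $t_i$ and values $y_i$ read off from the matrices displayed here) reproduces these numbers:

```python
import numpy as np

# Data of Example 3.2, inferred from the matrices above: fit a quadratic
# x_1 + x_2 t + x_3 t^2 to five points.
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([ 1.0,  0.5, 0.0, 0.5, 2.0])

# Design matrix with columns 1, t, t^2.
A = np.column_stack([np.ones_like(t), t, t**2])

x = lstsq_normal_equations(A, y)
print(x)   # approximately [0.086, 0.400, 1.429]
```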
In theory the system of normal equations gives the exact solution to a linear least squares problem, but in practice this system can provide disappointingly inaccurate results. Some of the potential difficulties are these (both are illustrated numerically in the sketch following this discussion):

1. Information can be lost in forming the normal equations matrix and right-hand-side vector. For example, take

$$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix},$$

where $\epsilon$ is a positive number smaller than the square root of machine precision, $\sqrt{\epsilon_{\text{mach}}}$, in a given floating-point system. Then

$$A^T A = \begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{bmatrix},$$

so that in floating-point arithmetic

$$\operatorname{fl}(A^T A) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$

which is exactly singular.

2. The sensitivity of the solution is worsened, in that the condition of the normal equations matrix is worse than that of the original matrix $A$.
Specifically, the condition number of the matrix is squared:

$$\operatorname{cond}(A^T A) = [\operatorname{cond}(A)]^2.$$

(We will see in Section 4.5.2 how to assign a condition number to a rectangular matrix. For now, think of it as a measure of the distance to the closest rank-deficient matrix.)

These shortcomings do not make the normal equations method useless, but they are cause for concern and provide motivation for seeking more numerically robust methods for linear least squares problems.
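Both difficulties are easy to observe numerically. The following sketch (assuming IEEE double precision, so $\epsilon_{\text{mach}} \approx 2.2 \times 10^{-16}$; the second test matrix is made up for illustration) forms the normal equations for the $\epsilon$ example above and checks the squaring of the condition number:

```python
import numpy as np

# Difficulty 1: information loss.  With eps below sqrt(eps_mach) ~ 1.5e-8,
# 1 + eps**2 rounds to 1, so the computed A^T A is exactly singular even
# though A itself has full column rank.
eps = 1e-10
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
AtA = A.T @ A
print(AtA)                           # [[1. 1.] [1. 1.]]
print(np.linalg.matrix_rank(AtA))    # 1 (singular), whereas rank(A) = 2

# Difficulty 2: conditioning.  cond(A^T A) equals cond(A) squared.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
print(np.linalg.cond(B)**2)          # these two numbers agree
print(np.linalg.cond(B.T @ B))       # (up to rounding error)
```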
3.3.3 Augmented System Method

The augmented system method is a variant of the normal equations method that can be useful in some situations. Together, the definition of the residual vector $r$ and the requirement that the residual be orthogonal to the columns of $A$ give the system of two equations

$$r + Ax = b, \qquad A^T r = o,$$

which can be written in matrix form as the $(m + n) \times (m + n)$ augmented system

$$\begin{bmatrix} I & A \\ A^T & O \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ o \end{bmatrix},$$

whose solution yields both the desired vector $x$ and the residual vector $r$.

At first glance, this method does not look promising: The augmented system is symmetric but not positive definite, it is larger than the original system, and it requires that we store two copies of $A$. Moreover, if we simply pivot along the diagonal (equivalent to block elimination in the block $2 \times 2$ system), we reproduce the normal equations, whose potential numerical shortcomings we have already observed.
The one advantage we have gained is that other pivoting strategies are now available, which can be beneficial for numerical or other reasons.

The selection of pivots in computing a symmetric indefinite (see Section 2.5.2) or LU factorization of the augmented system matrix will obviously depend on the relative magnitudes of the entries in the upper and lower block rows. Since the relative scales of $r$ and $x$ are arbitrary, we introduce a scaling parameter $\alpha$ for the residual, giving the new system

$$\begin{bmatrix} \alpha I & A \\ A^T & O \end{bmatrix} \begin{bmatrix} r / \alpha \\ x \end{bmatrix} = \begin{bmatrix} b \\ o \end{bmatrix}.$$

The parameter $\alpha$ controls the relative weights of the entries in the two subsystems in choosing pivots from either. A reasonable rule of thumb is to take

$$\alpha = \max_{i,j} |a_{ij}| / 1000,$$

but some experimentation may be required to determine the best value.
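A dense-matrix sketch of the scaled augmented system is shown below; it is illustrative only (the helper name `lstsq_augmented` is ours, a generic LU-based solver stands in for a symmetric indefinite factorization, and the structure exploitation discussed next is deliberately ignored):

```python
import numpy as np

def lstsq_augmented(A, b, alpha=None):
    """Solve the least squares problem Ax ~= b via the scaled augmented system.

    Returns both the solution x and the residual r = b - Ax.
    """
    m, n = A.shape
    if alpha is None:
        alpha = np.max(np.abs(A)) / 1000.0   # rule of thumb from the text

    # Assemble the (m + n) x (m + n) matrix [[alpha*I, A], [A^T, O]].
    K = np.zeros((m + n, m + n))
    K[:m, :m] = alpha * np.eye(m)
    K[:m, m:] = A
    K[m:, :m] = A.T

    rhs = np.concatenate([b, np.zeros(n)])
    sol = np.linalg.solve(K, rhs)   # generic dense solve (LU with pivoting)
    r = alpha * sol[:m]             # first block of the solution is r / alpha
    x = sol[m:]
    return x, r
```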
A straightforward implementation of this method can be prohibitive in cost [proportional to $(m + n)^3$], so the special structure of the augmented matrix must be carefully exploited. For example, the augmented system method is used effectively in MATLAB for large sparse linear least squares problems.

3.4 Orthogonalization Methods

Owing to the potential numerical difficulties with the normal equations system, we need an alternative method that does not require formation of the normal equations matrix and right-hand-side vector. Thus, we seek a more numerically robust transformation that produces a new problem whose solution is the same as that of the original least squares problem but is more easily computed. We will see that, as with square linear systems, triangular form is a suitable target in simplifying least squares problems.