Conte, de Boor - Elementary Numerical Analysis: An Algorithmic Approach
. . . , xn+1]/g[x0, . . . , xn+1]|, with g(x) any function for which g(xi) = (-1)^i, all i, provided x0 < x1 < · · · < xn+1. Then adapt Algorithm 2.3 to carry out the calculation of g[x0, . . . , xn+1] simultaneously with that of f[x0, . . . , xn+1].

6.2 DATA FITTING

We have so far discussed the approximation of a function f(x) by means of interpolation at certain points.
Such a procedure presupposes that the values of f(x) at these points are known. Hence interpolation is of little use (if not outright dangerous) in the following common situation: The function f(x) describes the relationship between two physical quantities x and y = f(x), and, through measurement or other experiment, one has obtained numbers fn which merely approximate the value of f(x) at xn, that is,

    f(xn) = fn + εn        n = 1, . . . , N

where the experimental errors εn are unknown. The problem of data fitting is to recover f(x) from the given (approximate) data fn, n = 1, . . . , N.

Strictly speaking, one never knows that the numbers fn are in error. Rather, on the basis of other information about f(x), or even by mere feeling, one decides that f(x) is not as complicated or as quickly varying a function as the numbers fn would seem to indicate, and therefore believes that the numbers fn must be in error. Consider, for example, the data plotted in Fig. 6.5. Here

    xn = n        n = 1, . . . , 11

If we have reason to believe that f(x) is a straight line, the given data are most certainly in error. If we only know that f(x) is a convex function, we can still conclude that the data are erroneous. Even if we know nothing about f(x), we might still be tempted to conclude from Fig. 6.5 that f(x) is a straight line, although we would now be on shaky ground. But whether or not we know anything about f(x), we can conclude from the plotted data that most of the information about f(x) contained in the data fn can be adequately represented by a straight line.

To summarize, data fitting is based on the belief that the given data fn contain a slowly varying component, the trend of, or the information about, f(x), and a comparatively fast varying component of comparatively small amplitude, the error or noise in the data.
The task is to approximate or fit the data by some function F*(x) in such a way that F*(x) contains or represents most (if not all) of the information about f(x) contained in the data and little (if any) of the error or noise.

[Figure 6.5: Least-squares straight-line approximation to certain data.]

This is accomplished in practice by picking a function

    F(x) = F(x; c1, . . . , ck)        (6.20)

which depends on certain parameters c1, . . . , ck. Normally, one will try to select a function F(x) which depends linearly on the parameters, so that F(x) will have the form

    F(x) = c1φ1(x) + c2φ2(x) + · · · + ckφk(x)        (6.21)

where the {φi} are an a priori selected set of functions and the {ci} are parameters which must be determined. The {φi} may, for example, be the set of monomials {x^(i-1)} or the set of trigonometric functions {sin πix}. Normally, k is small compared with the number N of data points.
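As an illustration of the linear form (6.21), a model F(x; c1, . . . , ck) = Σ ciφi(x) with the monomial basis φi(x) = x^(i-1) might be evaluated as in the following sketch (the particular coefficients and the choice k = 3 are illustrative, not from the text):

```python
def F(x, c, basis):
    """Evaluate F(x; c1, ..., ck) = sum_i c_i * phi_i(x), the linear form (6.21)."""
    return sum(ci * phi(x) for ci, phi in zip(c, basis))

# Monomial basis phi_i(x) = x**(i-1), here with k = 3 (an illustrative choice).
monomials = [lambda x: 1.0, lambda x: x, lambda x: x * x]

# F(x) = 2 - x + 0.5x^2 evaluated at x = 2:  2 - 2 + 2 = 2.0
print(F(2.0, [2.0, -1.0, 0.5], monomials))  # 2.0
```

Keeping the dependence on the ci linear is what makes the equations for the best ci, derived below, linear as well.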
The hope is that k is large enough so that the information about f(x) in the data can be well represented by proper choice of the parameters c1, . . . , ck, while at the same time k is too small to also allow for reproduction of the error or noise.

Once practitioners of the art of data fitting have decided on the right form (6.20) for the approximating function, they have to determine particular values c1*, . . . , ck* for the parameters ci to get a "good" approximation F*(x) = F(x; c1*, . . . , ck*). The general idea is to choose {ci} so that the deviations

    dn = fn - F(xn; c1, . . . , ck)        n = 1, . . . , N

are simultaneously made as small as possible (see Fig. 6.5 for such deviations in a typical example). In the terminology of Chap. 4, one tries to make some norm of the N-vector d = [d1 d2 · · · dN]^T as small as possible; i.e., one attempts to

    Minimize ||d||

as a function of c1, . . . , ck. Popular choices for the norm are

(i) The 1-norm

    ||d||1 = |d1| + |d2| + · · · + |dN|

if one wishes the average deviation to be as small as possible, or

(ii) The ∞-norm

    ||d||∞ = max_{1≤n≤N} |dn|

if one wishes to make all deviations uniformly small.

But, if one attacks these minimization problems in the spirit of Chap. 5 or by some other means, one quickly discovers that they lead to a nonlinear system of equations for the determination of the minimum c1*,
. . . , ck* [see, e.g., the system (6.14) for the related problem of uniform approximation on an interval]. It is therefore customary to choose as the norm to be minimized the 2-norm

    ||d||2 = (d1^2 + d2^2 + · · · + dN^2)^(1/2)

for this leads to a linear system of equations for the determination of the minimum cj*'s. The resulting approximation F(x; c1*, . . . , ck*) is then known as a least-squares approximation to the given data.

We now derive the system of equations for the cj*'s. Since the square root function is monotone, minimizing ||d||2 is the same task as minimizing

    E(c1, . . . , ck) = Σ_{n=1}^N [fn - F(xn; c1, . . . , ck)]^2

For c* = [c1* . . . ck*]^T to be a minimum of the function E, it is, of course, necessary that the gradient of E vanish at c*, i.e.,

    ∂E/∂cj (c*) = 0        j = 1, . . . , k

(see Sec. 5.1). Therefore, since

    ∂F(x; c1, . . . , ck)/∂cj = φj(x)        j = 1, . . . , k

because of (6.21), c* must satisfy the so-called normal equations

    Σ_{n=1}^N [fn - Σ_{j=1}^k cj*φj(xn)] φi(xn) = 0        i = 1, . . . , k        (6.22)

The epithet "normal" is given to these equations since they specify that the error vector e = [e1 e2 · · · eN]^T, with en = fn - F(xn; c*), all n, should be normal, or orthogonal, or perpendicular to each of the k vectors

    φi = [φi(x1) φi(x2) · · · φi(xN)]^T        i = 1, . . . , k

Indeed, in terms of these N-vectors, (6.22) reads

    Σ_{n=1}^N en φi(xn) = 0        i = 1, . . . , k

Since our general approximating function is of the form F(x) = c1φ1(x) + · · · + ckφk(x), this says that the error vector should (in this sense) be perpendicular to all possible approximating functions, i.e.,

    Σ_{n=1}^N en [c1φ1(xn) + · · · + ckφk(xn)] = 0        for all c1, . . . , ck

This identifies the vector F* = [F*(x1) F*(x2) · · · F*(xN)]^T as the orthogonal projection of the data vector f = [f1 f2 · · · fN]^T onto the hyperplane spanned by the vectors φ1, φ2, . . . , φk.

We rewrite the normal equations in the form

    Σ_{j=1}^k [Σ_{n=1}^N φi(xn)φj(xn)] cj* = Σ_{n=1}^N φi(xn)fn        i = 1, . . . , k        (6.23)

to make explicit the fact that they form a system of k linear equations in
. , ck*. As it turns out, this system always has atleast one solution [regardless of what the φi(x) are]; further, any solution of(6.23) minimizes E(c1, . . . , ck).To give an example, we now find the least-squares approximation tothe data plotted in Fig. 6.5 by a straight line.In this example, xn, = n, n = 1, . . . , 11, andF(x; cl, c2 ) = cl + c2xso that k = 2 and φ1(x) = 1, φ2 (x) = x.
The linear system (6.23) takes the form

    11c1* + 66c2* = 41.04
    66c1* + 506c2* = 328.05

which, when solved by Gauss elimination, gives the unique solution

    c1* = -0.7314 · · ·        c2* = 0.7437 · · ·

The resulting straight line is also plotted in Fig. 6.5.

At this point, all would be well if it were not for the unhappy fact that the coefficient matrix of (6.23) is quite often ill-conditioned, enough so that straightforward application of the elimination algorithm 4.2 produces unreliable results. This is illustrated by the following simple example.

Example 6.5 We are given approximate values fn ≈ f(xn), n = 1, . . . , 6, and we have reason to believe that these data can be adequately represented by a parabola. Accordingly, we choose

    φ1(x) = 1        φ2(x) = x        φ3(x) = x^2

For this case, the coefficient matrix A of (6.23) has the entries Σ_{n=1}^6 xn^(i+j-2), i, j = 1, 2, 3. It follows that ||A|| is of order 8·10^4.
On the other hand, the inequality (4.38) yields a lower bound for ||A^(-1)||, and hence the condition number of A is at least of order 10^5.

Actually, the condition number of A is much larger than 10^5, as the following specific results show. We pick

    f(x) = 10 - 2x + 0.1x^2

and use exact data, fn = f(xn), n = 1, . . . , 6. Then, since f(x) is a polynomial of degree 2, F*(x) should be f(x) itself; therefore we should get

    c1* = 10        c2* = -2        c3* = 0.1

Using the elimination algorithm 4.2 to solve (6.23) for this case on the CDC 6500 produces the result

    c1* = 9.9999997437 · · ·
    c2* = -1.9999999511 · · ·
    c3* = 0.0999999976 · · ·

so that 14-decimal-digit floating-point arithmetic for this 3 × 3 system gives only about 8 correct decimal digits. If we round the (3,3) entry of A to 73,393.6 and repeat the calculation, the computed answer turns out to be an astonishing

    c1* = 6.035 · · ·        c2* = -1.243 · · ·        c3* = 0.0639 · · ·

Similarly, if all calculations are carried out in seven-decimal-digit floating-point arithmetic, the results are

    c1* = 8.492 · · ·        c2* = -1.712 · · ·        c3* = 0.0863 · · ·

This example should make clear that it can be dangerous to rush into solving the normal equations without some preliminary work.
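The danger can be reproduced in a few lines. Since the exact xn of Example 6.5 are not given above, the points xn = 10.0, 10.1, . . . , 10.5 below are an assumed stand-in of the same character (six points clustered near x = 10); the condition-number estimate is a sketch, not the book's computation:

```python
def gram_matrix(xs):
    """Coefficient matrix of (6.23) for phi_i(x) = x**(i-1), i = 1, 2, 3:
    A[i][j] = sum_n x_n**(i+j) with 0-based i, j."""
    return [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]

def inverse3(a):
    """Invert a 3x3 matrix via the adjugate (adequate for a sketch)."""
    def cof(r, c):
        m = [[a[i][j] for j in range(3) if j != c] for i in range(3) if i != r]
        return (-1) ** (r + c) * (m[0][0] * m[1][1] - m[0][1] * m[1][0])
    det = sum(a[0][c] * cof(0, c) for c in range(3))
    return [[cof(c, r) / det for c in range(3)] for r in range(3)]

def norm_inf(a):
    """Max-row-sum (infinity) norm of a matrix."""
    return max(sum(abs(v) for v in row) for row in a)

xs = [10.0 + 0.1 * n for n in range(6)]   # hypothetical data sites near x = 10
A = gram_matrix(xs)
cond = norm_inf(A) * norm_inf(inverse3(A))
print(cond > 1e5)   # True: far beyond the 10**5 bound cited in the text
```

A standard remedy is to choose the basis functions φi orthogonal over the data points, or to solve the least-squares problem by orthogonal factorization rather than forming (6.23) at all; either counts as the "preliminary work" the text calls for.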