15.5 Nonlinear Models

We now consider fitting when the model depends nonlinearly on the set of M unknown parameters a_k, k = 1, 2, ..., M. We use the same approach as in previous sections, namely to define a χ² merit function and determine best-fit parameters by its minimization. With nonlinear dependences, however, the minimization must proceed iteratively. Given trial values for the parameters, we develop a procedure that improves the trial solution. The procedure is then repeated until χ² stops (or effectively stops) decreasing.

How is this problem different from the general nonlinear function minimization problem already dealt with in Chapter 10? Superficially, not at all: Sufficiently close to the minimum, we expect the χ² function to be well approximated by a quadratic form, which we can write as

$$ \chi^2(\mathbf{a}) \approx \gamma - \mathbf{d} \cdot \mathbf{a} + \frac{1}{2}\,\mathbf{a} \cdot \mathbf{D} \cdot \mathbf{a} \qquad (15.5.1) $$

where d is an M-vector and D is an M × M matrix. (Compare equation 10.6.1.) If the approximation is a good one, we know how to jump from the current trial parameters a_cur to the minimizing ones a_min in a single leap, namely

$$ \mathbf{a}_{\min} = \mathbf{a}_{\mathrm{cur}} + \mathbf{D}^{-1} \cdot \left[ -\nabla\chi^2(\mathbf{a}_{\mathrm{cur}}) \right] \qquad (15.5.2) $$

(Compare equation 10.7.4.)
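To make the single-leap property concrete, here is a minimal numerical sketch (not from the book): for an exactly quadratic χ² in two parameters, with made-up illustrative values of D and d, one step of (15.5.2) lands on the minimizing parameters from any starting point.

#include <stdio.h>

int main(void)
{
    /* An illustrative quadratic chi^2(a) = gamma - d.a + a.D.a/2 in two
       parameters; D and d are made-up values, D symmetric positive definite. */
    double D[2][2] = {{4.0, 1.0}, {1.0, 3.0}};
    double d[2] = {1.0, 2.0};
    double acur[2] = {10.0, -7.0};     /* arbitrary trial parameters */
    double grad[2], step[2], amin[2];
    double det = D[0][0]*D[1][1] - D[0][1]*D[1][0];
    int k;

    /* gradient of the quadratic form at acur: grad = D.acur - d */
    for (k = 0; k < 2; k++)
        grad[k] = D[k][0]*acur[0] + D[k][1]*acur[1] - d[k];

    /* the leap (15.5.2): step = D^{-1}.(-grad), via the explicit 2x2 inverse */
    step[0] = (-D[1][1]*grad[0] + D[0][1]*grad[1])/det;
    step[1] = ( D[1][0]*grad[0] - D[0][0]*grad[1])/det;
    for (k = 0; k < 2; k++) amin[k] = acur[k] + step[k];

    /* amin satisfies D.amin = d, so the residual gradient prints as zero */
    printf("amin = (%g, %g)\n", amin[0], amin[1]);
    printf("residual gradient = (%g, %g)\n",
           D[0][0]*amin[0] + D[0][1]*amin[1] - d[0],
           D[1][0]*amin[0] + D[1][1]*amin[1] - d[1]);
    return 0;
}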
On the other hand, (15.5.1) might be a poor local approximation to the shape of the function that we are trying to minimize at a_cur. In that case, about all we can do is take a step down the gradient, as in the steepest descent method (§10.6). In other words,

$$ \mathbf{a}_{\mathrm{next}} = \mathbf{a}_{\mathrm{cur}} - \mathrm{constant} \times \nabla\chi^2(\mathbf{a}_{\mathrm{cur}}) \qquad (15.5.3) $$

where the constant is small enough not to exhaust the downhill direction.

To use (15.5.2) or (15.5.3), we must be able to compute the gradient of the χ² function at any set of parameters a.
To use (15.5.2) we also need the matrix D, which is the second derivative matrix (Hessian matrix) of the χ² merit function, at any a. Now, this is the crucial difference from Chapter 10: There, we had no way of directly evaluating the Hessian matrix. We were given only the ability to evaluate the function to be minimized and (in some cases) its gradient. Therefore, we had to resort to iterative methods not just because our function was nonlinear, but also in order to build up information about the Hessian matrix.
Sections 10.7 and 10.6 concerned themselves with two different techniques for building up this information.

Here, life is much simpler. We know exactly the form of χ², since it is based on a model function that we ourselves have specified. Therefore the Hessian matrix is known to us. Thus we are free to use (15.5.2) whenever we care to do so. The only reason to use (15.5.3) will be failure of (15.5.2) to improve the fit, signaling failure of (15.5.1) as a good local approximation.

Calculation of the Gradient and Hessian

The model to be fitted is

$$ y = y(x; \mathbf{a}) \qquad (15.5.4) $$

and the χ² merit function is

$$ \chi^2(\mathbf{a}) = \sum_{i=1}^{N} \left[ \frac{y_i - y(x_i; \mathbf{a})}{\sigma_i} \right]^2 \qquad (15.5.5) $$

The gradient of χ² with respect to the parameters a, which will be zero at the χ² minimum, has components

$$ \frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=1}^{N} \frac{y_i - y(x_i; \mathbf{a})}{\sigma_i^2} \, \frac{\partial y(x_i; \mathbf{a})}{\partial a_k}, \qquad k = 1, 2, \ldots, M \qquad (15.5.6) $$
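As a concrete illustration (a minimal sketch, not the book's routines), the following fragment evaluates (15.5.5) and (15.5.6) for an assumed two-parameter exponential model with analytic derivatives; all names are illustrative.

#include <math.h>
#include <stdio.h>

#define MA 2    /* number of parameters M in this example */

/* Illustrative model y = a[0]*exp(-a[1]*x); fills dyda[k] = dy/da_k. */
static double model(double x, const double a[MA], double dyda[MA])
{
    double e = exp(-a[1]*x);
    dyda[0] = e;
    dyda[1] = -a[0]*x*e;
    return a[0]*e;
}

/* Return chi^2 per (15.5.5) and fill grad[k] = d(chi^2)/da_k per (15.5.6). */
static double chisq_and_grad(const double x[], const double y[],
                             const double sig[], int ndata,
                             const double a[MA], double grad[MA])
{
    int i, k;
    double chisq = 0.0, dyda[MA];

    for (k = 0; k < MA; k++) grad[k] = 0.0;
    for (i = 0; i < ndata; i++) {
        double dy = y[i] - model(x[i], a, dyda);
        double w = 1.0/(sig[i]*sig[i]);
        chisq += dy*dy*w;
        for (k = 0; k < MA; k++)
            grad[k] -= 2.0*dy*w*dyda[k];     /* equation (15.5.6) */
    }
    return chisq;
}

int main(void)
{
    double x[3] = {0.0, 1.0, 2.0}, y[3] = {2.0, 0.7, 0.3};
    double sig[3] = {0.1, 0.1, 0.1};
    double a[MA] = {2.0, 1.0}, grad[MA];
    double c = chisq_and_grad(x, y, sig, 3, a, grad);
    printf("chi2 = %g, grad = (%g, %g)\n", c, grad[0], grad[1]);
    return 0;
}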
Taking an additional partial derivative of (15.5.6) gives

$$ \frac{\partial^2 \chi^2}{\partial a_k \, \partial a_l} = 2 \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left[ \frac{\partial y(x_i; \mathbf{a})}{\partial a_k} \, \frac{\partial y(x_i; \mathbf{a})}{\partial a_l} - \left[ y_i - y(x_i; \mathbf{a}) \right] \frac{\partial^2 y(x_i; \mathbf{a})}{\partial a_l \, \partial a_k} \right] \qquad (15.5.7) $$

It is conventional to remove the factors of 2 by defining

$$ \beta_k \equiv -\frac{1}{2} \frac{\partial \chi^2}{\partial a_k} \qquad \alpha_{kl} \equiv \frac{1}{2} \frac{\partial^2 \chi^2}{\partial a_k \, \partial a_l} \qquad (15.5.8) $$

making [α] = ½D in equation (15.5.2), in terms of which that equation can be rewritten as the set of linear equations

$$ \sum_{l=1}^{M} \alpha_{kl} \, \delta a_l = \beta_k \qquad (15.5.9) $$

This set is solved for the increments δa_l that, added to the current approximation, give the next approximation. In the context of least-squares, the matrix [α], equal to one-half times the Hessian matrix, is usually called the curvature matrix.

Equation (15.5.3), the steepest descent formula, translates to

$$ \delta a_l = \mathrm{constant} \times \beta_l \qquad (15.5.10) $$
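Equation (15.5.9) is just an M × M linear system for the increments δa_l. The book's own routines solve it with the Gauss-Jordan method of §2.1; the fragment below is a stripped-down illustrative alternative, Gaussian elimination with partial pivoting, adequate for a sketch.

#include <math.h>
#include <stdio.h>

#define M 3     /* illustrative number of parameters */

/* Solve alpha . da = beta for da, destroying alpha and beta in the process. */
static void solve_increments(double alpha[M][M], double beta[M], double da[M])
{
    int i, j, k, piv;

    for (k = 0; k < M; k++) {
        piv = k;                            /* partial pivoting in column k */
        for (i = k+1; i < M; i++)
            if (fabs(alpha[i][k]) > fabs(alpha[piv][k])) piv = i;
        for (j = 0; j < M; j++) {           /* swap rows k and piv */
            double t = alpha[k][j]; alpha[k][j] = alpha[piv][j]; alpha[piv][j] = t;
        }
        { double t = beta[k]; beta[k] = beta[piv]; beta[piv] = t; }
        for (i = k+1; i < M; i++) {         /* eliminate below the pivot */
            double f = alpha[i][k]/alpha[k][k];
            for (j = k; j < M; j++) alpha[i][j] -= f*alpha[k][j];
            beta[i] -= f*beta[k];
        }
    }
    for (k = M-1; k >= 0; k--) {            /* back substitution */
        double s = beta[k];
        for (j = k+1; j < M; j++) s -= alpha[k][j]*da[j];
        da[k] = s/alpha[k][k];
    }
}

int main(void)
{
    double alpha[M][M] = {{4,1,0},{1,3,1},{0,1,2}};  /* symmetric, like [alpha] */
    double beta[M] = {1,2,3};
    double da[M];
    solve_increments(alpha, beta, da);
    printf("da = (%g, %g, %g)\n", da[0], da[1], da[2]);
    return 0;
}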
Note that the components α_kl of the Hessian matrix (15.5.7) depend both on the first derivatives and on the second derivatives of the basis functions with respect to their parameters. Some treatments proceed to ignore the second derivative without comment. We will ignore it also, but only after a few comments.

Second derivatives occur because the gradient (15.5.6) already has a dependence on ∂y/∂a_k, so the next derivative simply must contain terms involving ∂²y/∂a_l∂a_k. The second derivative term can be dismissed when it is zero (as in the linear case of equation 15.4.8), or small enough to be negligible when compared to the term involving the first derivative. Dropping it, we use as the definition of α_kl the formula

$$ \alpha_{kl} = \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \, \frac{\partial y(x_i; \mathbf{a})}{\partial a_k} \, \frac{\partial y(x_i; \mathbf{a})}{\partial a_l} \qquad (15.5.11) $$

This expression more closely resembles its linear cousin (15.4.8). You should understand that minor (or even major) fiddling with [α] has no effect at all on what final set of parameters a is reached, but affects only the iterative route that is taken in getting there. The condition at the χ² minimum, that β_k = 0 for all k, is independent of how [α] is defined.
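In code, β_k from (15.5.8) and the first-derivative-only α_kl of (15.5.11) are naturally accumulated in a single pass over the data. The sketch below (illustrative, not the book's routine) reuses model() and MA from the earlier χ²-and-gradient fragment.

/* Accumulate beta_k (15.5.8) and the first-derivative-only alpha_kl of
   (15.5.11) in one pass over the data.  model() and MA are the
   illustrative definitions from the chi^2/gradient sketch above. */
static void alpha_beta(const double x[], const double y[], const double sig[],
                       int ndata, const double a[MA],
                       double alpha[MA][MA], double beta[MA])
{
    int i, k, l;
    double dyda[MA];

    for (k = 0; k < MA; k++) {
        beta[k] = 0.0;
        for (l = 0; l < MA; l++) alpha[k][l] = 0.0;
    }
    for (i = 0; i < ndata; i++) {
        double dy = y[i] - model(x[i], a, dyda);
        double w = 1.0/(sig[i]*sig[i]);
        for (k = 0; k < MA; k++) {
            beta[k] += dy*w*dyda[k];             /* beta_k = -(1/2) dchi^2/da_k */
            for (l = 0; l <= k; l++)             /* (15.5.11), lower triangle */
                alpha[k][l] += w*dyda[k]*dyda[l];
        }
    }
    for (k = 0; k < MA; k++)                     /* mirror: alpha is symmetric */
        for (l = k+1; l < MA; l++) alpha[k][l] = alpha[l][k];
}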
Levenberg-Marquardt Method

Marquardt [1] has put forth an elegant method, related to an earlier suggestion of Levenberg, for varying smoothly between the extremes of the inverse-Hessian method (15.5.9) and the steepest descent method (15.5.10). The latter method is used far from the minimum, switching continuously to the former as the minimum is approached. This Levenberg-Marquardt method (also called Marquardt method) works very well in practice and has become the standard of nonlinear least-squares routines.

The method is based on two elementary, but important, insights. Consider the "constant" in equation (15.5.10). What should it be, even in order of magnitude? What sets its scale? There is no information about the answer in the gradient. That tells only the slope, not how far that slope extends. Marquardt's first insight is that the components of the Hessian matrix, even if they are not usable in any precise fashion, give some information about the order-of-magnitude scale of the problem.

The quantity χ² is nondimensional, i.e., is a pure number; this is evident from its definition (15.5.5). On the other hand, β_k has the dimensions of 1/a_k, which may well be dimensional, i.e., have units like cm⁻¹, or kilowatt-hours, or whatever. (In fact, each component of β_k can have different dimensions!) The constant of proportionality between β_k and δa_k must therefore have the dimensions of a_k². Scanning the components of [α], the only obvious quantity with these dimensions is 1/α_kk, the reciprocal of the diagonal element.
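These insights lead to an iteration in which the diagonal of [α] is inflated by a factor (1 + λ): a large λ produces a short, steepest-descent-like step, while λ → 0 recovers the inverse-Hessian step (15.5.9). The following sketch shows one common formulation of that loop (illustrative, not the book's mrqmin routine, whose precise recipe is developed in the remainder of the section). alpha_beta() and chisq_and_grad() are the fragments above; solve2() is a hypothetical 2 × 2 helper, sufficient for the MA = 2 example model.

/* Hypothetical 2x2 Cramer solve, enough for the MA = 2 example model. */
static void solve2(double A[MA][MA], const double b[MA], double da[MA])
{
    double det = A[0][0]*A[1][1] - A[0][1]*A[1][0];
    da[0] = ( A[1][1]*b[0] - A[0][1]*b[1])/det;
    da[1] = (-A[1][0]*b[0] + A[0][0]*b[1])/det;
}

/* One common formulation of the Levenberg-Marquardt loop (a sketch):
   inflate the diagonal of [alpha] by (1 + lambda), solve (15.5.9) for a
   trial step, then raise or lower lambda as the step fails or succeeds. */
static void marquardt_fit(const double x[], const double y[],
                          const double sig[], int ndata,
                          double a[MA], int maxiter)
{
    double alpha[MA][MA], alphap[MA][MA], beta[MA], da[MA];
    double grad[MA], atry[MA];
    double lambda = 0.001;                 /* conventional modest start */
    double chi2 = chisq_and_grad(x, y, sig, ndata, a, grad);
    int it, k, l;

    for (it = 0; it < maxiter; it++) {
        alpha_beta(x, y, sig, ndata, a, alpha, beta);
        for (k = 0; k < MA; k++) {         /* augmented curvature matrix */
            for (l = 0; l < MA; l++) alphap[k][l] = alpha[k][l];
            alphap[k][k] *= 1.0 + lambda;
        }
        solve2(alphap, beta, da);          /* trial step from (15.5.9) */
        for (k = 0; k < MA; k++) atry[k] = a[k] + da[k];
        double chi2try = chisq_and_grad(x, y, sig, ndata, atry, grad);
        if (chi2try < chi2) {              /* success: accept, soften lambda */
            for (k = 0; k < MA; k++) a[k] = atry[k];
            chi2 = chi2try;
            lambda *= 0.1;
        } else {                           /* failure: keep a, stiffen lambda */
            lambda *= 10.0;
        }
    }
}

A real driver would also stop when χ² decreases by a negligible amount on a successful step, rather than running a fixed number of iterations.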