In addition to the solution x, the output typically includes a status flag indicating any warnings or errors. A preliminary plot of the functions involved can help greatly in determining a suitable starting guess.

Table 6.2 is a list of some of the software available for solving nonlinear least squares problems, linear programming problems, and general nonlinear constrained optimization problems.
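As an illustration of this typical calling pattern, the following is a minimal sketch using the least_squares routine from SciPy, a package not listed in Table 6.2; the model function, data, and starting guess are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical exponential-decay model fitted to synthetic data.
t = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(-1.3 * t) + 0.01 * np.random.default_rng(0).standard_normal(t.size)

def residual(x):
    # Residual vector r(x) whose sum of squares is to be minimized.
    return x[0] * np.exp(x[1] * t) - y

x0 = np.array([1.0, -1.0])            # starting guess (a preliminary plot of the data helps choose this)
result = least_squares(residual, x0)

print("solution x:", result.x)        # approximate least squares solution
print("status flag:", result.status)  # indicates why the iteration stopped (convergence, limit reached, etc.)
print("message:", result.message)
```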
Good software is also available from a number of sources for solving many other types of optimization problems, including quadratic programming, linear or simple bounds constraints, network flow problems, etc. There is an optimization toolbox for MATLAB in which some of the software listed in the tables can be found, along with numerous additional routines for various other optimization problems.
For the nonlinear analogue of total least squares, called orthogonal distance regression, odrpack (#676) is available from TOMS. A comprehensive survey of optimization software can be found in [184].

Table 6.2: Software for nonlinear least squares and constrained optimization

Source     Nonlinear least squares    Linear programming    Nonlinear programming
HSL        ns13/va07/vb01/vb03        la01                  vf01/vf04/vf13
IMSL       unlsf                      dlprs                 nconf/ncong
MATLAB     leastsq                    lp                    constr
MINPACK    lmdif1
NAG        e04fdf                     e04mbf                e04vdf
netlib     varpro/dqed
NR         mrqmin                     simplx
NUMAL      gssnewton/marquardt
PORT       n2f/n2g/nsf/nsg
SLATEC     snls1                      splp
SOL                                   minos                 npsol
TOMS       nl2sol(#573)

6.7 Historical Notes and Further Reading

As with nonlinear equations in one dimension, the one-dimensional optimization methods based on Newton's method or interpolation are classical. A theory of optimal one-dimensional search methods using only function value comparisons was initiated in the 1950s by Kiefer, who showed that Fibonacci search, in which successive evaluation points are determined by ratios of Fibonacci numbers, is optimal in the sense that it produces the minimum interval of uncertainty for a given number of function evaluations.
What we usually want, however, is to fix the error tolerance rather than the number of function evaluations, so golden section search, which can be viewed as a limiting case of Fibonacci search, turned out to be more practical. See [272] for a detailed discussion of these methods. As with nonlinear equations, hybrid safeguarded methods for one-dimensional optimization were popularized by Brent [23].
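The following is a minimal sketch of golden section search as just described, maintaining an interval of uncertainty that shrinks by the factor tau = (sqrt(5) - 1)/2 at each step; the unimodal test function and tolerance are chosen arbitrarily for illustration.

```python
import math

def golden_section_search(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0   # golden ratio factor, about 0.618
    x1 = a + (1.0 - tau) * (b - a)
    x2 = a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                      # minimum cannot lie in [a, x1]
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)
        else:                            # minimum cannot lie in [x2, b]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1.0 - tau) * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)

# Hypothetical example: minimize a unimodal function on [0, 3].
print(golden_section_search(lambda x: (x - 1.234) ** 2, 0.0, 3.0))
```

Note that only one new function evaluation is needed per iteration, since one of the two interior points is reused.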
For multidimensional optimization, most of the basic direct search methods were proposed in the 1960s. The method of Nelder and Mead is based on an earlier method of Spendley, Hext, and Himsworth. Another popular direct search method is that of Hooke and Jeeves. For a survey of these methods, see [252].

Steepest descent and Newton's method for multidimensional optimization were analyzed as practical algorithms by Cauchy. Secant updating methods were originated by Davidon (who used the term variable metric method) in 1959. In 1963, Fletcher and Powell published an improved implementation, which came to be known as the DFP method. Continuing this trend of initialisms, the BFGS method was developed independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970. Many other secant updates have been proposed, but these two have been the most successful, with BFGS having a slight edge.
The conjugate gradient method was originally developed by Hestenes and Stiefel in the early 1950s to solve symmetric linear systems by minimizing a quadratic function. It was later adapted to minimize general nonlinear functions by Fletcher and Reeves in 1964.

The Levenberg-Marquardt method for nonlinear least squares was originally developed by Levenberg in 1944 and improved by Marquardt in 1963. A definitive modern implementation of this method, due to Moré [181], can be found in MINPACK [182].
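For reference, the equivalence exploited by Hestenes and Stiefel (with A symmetric positive definite) and the usual form of the Levenberg-Marquardt step can be written as follows; these are standard formulas stated here for context rather than quoted from the chapter, with J denoting the Jacobian of the residual vector r and mu_k >= 0 the Levenberg-Marquardt parameter.

```latex
% Minimizing the quadratic \phi recovers the symmetric linear system A x = b:
\phi(x) = \tfrac{1}{2}\, x^T A x - x^T b, \qquad
\nabla \phi(x) = A x - b = 0 \;\Longleftrightarrow\; A x = b.

% Levenberg-Marquardt step s_k (Gauss-Newton equations with regularization parameter \mu_k):
\bigl( J^T(x_k)\, J(x_k) + \mu_k I \bigr)\, s_k = -\, J^T(x_k)\, r(x_k).
```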
The simplex method for linear programming, which is still the workhorse for such problems, was originated by Dantzig in the late 1940s. The first polynomial-time algorithm for linear programming, the ellipsoid algorithm published by Khachiyan in 1979, was based on earlier work in the 1970s by Shor and by Judin and Nemirovskii (Khachiyan's main contribution was to show that the algorithm indeed has polynomial complexity). A much more practical polynomial-time algorithm is the interior point method of Karmarkar, published in 1984, which is related to earlier barrier methods popularized by Fiacco and McCormick [78].

Good general references on optimization, with an emphasis on numerical algorithms, are [40, 80, 95, 167, 189].
Algorithms for unconstrained optimization are covered in [57] and the more recent surveys [98, 192]. The theory and convergence analysis of Newton's method and quasi-Newton methods are summarized in [183] and [56], respectively. For a detailed discussion of nonlinear least squares, see [14].
The classic account of the simplex method for linear programming is [48]. More recent treatments of the simplex method can be found in [96, 167, 189]. For an overview of linear programming that includes polynomial-time algorithms, see [99]. For a review of interior point methods in constrained optimization, see [278].

Review Questions

6.1 True or false: Points that minimize a nonlinear function are inherently less accurately determined than points for which a nonlinear function has a zero value.

6.2 True or false: If a function is unimodal on a closed interval, then it has exactly one minimum on the interval.

6.3 True or false: In minimizing a unimodal function of one variable by golden section search, the point discarded at each iteration is always the point having the largest function value.

6.4 True or false: For minimizing a real-valued function of several variables, the steepest descent method is usually more rapidly convergent than Newton's method.

6.5 True or false: The solution to a linear programming problem must occur at one of the vertices of the feasible region.

6.6 True or false: The approximate solution produced at each step of the simplex method for linear programming is a feasible point.

6.7 Suppose that the real-valued function f is unimodal on the interval [a, b].
Let x1 and x2 be two points in the interval, with a < x1 < x2 < b. If f(x1) = 1.232 and f(x2) = 3.576, then which of the following statements is valid?
1. The minimum of f must lie in the subinterval [x1, b].
2. The minimum of f must lie in the subinterval [a, x2].
3. You can't tell which of these two subintervals the minimum must lie in without knowing the values of f(a) and f(b).

6.8 (a) In minimizing a unimodal function of one variable on the interval [0, 1] by golden section search, at what two points in the interval is the function initially evaluated?
(b) Why are those particular points chosen?

6.9 If the real-valued function f is monotonic on the interval [a, b], will golden section search to find a minimum of f still converge? If not, why, and if so, to what point?

6.10 Suppose that the real-valued function f is unimodal on the interval [a, b], and x1 and x2 are points in the interval such that x1 < x2 and f(x1) < f(x2).
(a) What is the shortest interval in which you know that the minimum of f must lie?
(b) How would your answer change if we happened to have f(x1) = f(x2)?

6.11 List one advantage and one disadvantage of golden section search compared with successive parabolic interpolation for minimizing a function of one variable.

6.12 (a) Why is linear interpolation of a function f : R → R not useful for finding a minimum of f?
(b) In using quadratic interpolation for one-dimensional problems, why would one use inverse quadratic interpolation for finding a zero but regular quadratic interpolation for finding a minimum?

6.13 For minimizing a function f : R → R, successive parabolic interpolation and Newton's method both fit a quadratic polynomial to the function f and then take its minimum as the next approximate solution.
(a) How do these two methods differ in choosing the quadratic polynomials they use?
(b) What difference does this make in their respective convergence rates?

6.14 Explain why Newton's method minimizes a quadratic function in one iteration but does not solve a quadratic equation in one iteration.

6.15 Suppose you want to minimize a function of one variable, f : R → R.
For each convergence rate given, name a method that normally has that convergence rate for this problem:
(a) Linear but not superlinear
(b) Superlinear but not quadratic
(c) Quadratic

6.16 Suppose you want to minimize a function of several variables, f : Rn → R. For each convergence rate given, name a method that normally has that convergence rate for this problem:
(a) Linear but not superlinear
(b) Superlinear but not quadratic
(c) Quadratic

6.17 Which of the following iterative methods have a superlinear convergence rate under normal circumstances?
(a) Successive parabolic interpolation for minimizing a function
(b) Golden section search for minimizing a function
(c) Interval bisection for finding a zero of a function
(d) Secant updating methods for minimizing a function of n variables
(e) Steepest descent method for minimizing a function of n variables

6.18 (a) For minimizing a real-valued function f of n variables, what is the initial search direction in the conjugate gradient method?
(b) Under what condition will the BFGS method for minimization use this same initial search direction?

6.19 For minimizing a quadratic function of n variables, what is the maximum number of iterations required to converge to the exact solution (assuming exact arithmetic) from an arbitrary starting point for each of the following algorithms?
(a) Conjugate gradient method
(b) Newton's method
(c) BFGS secant updating method with exact line search

6.20 (a) What is meant by a critical point (or stationary point) of a smooth nonlinear function f : Rn → R?
(b) Is a critical point always a minimum or maximum of the function?
(c) How can you test a given critical point to determine which type it is?

6.21 Let f : R2 → R be a real-valued function of two variables.
What is the geometrical interpretation of the gradient vector ∇f(x) = [∂f(x)/∂x1, ∂f(x)/∂x2]^T? Specifically, explain the meaning of the direction and magnitude of ∇f(x).

6.22 (a) If f : Rn → R, what do we call the Jacobian matrix of the gradient ∇f(x)?
(b) What special property does this matrix have, assuming f is twice continuously differentiable?
(c) What additional special property does this matrix have near a local minimum of f?

6.23 The steepest descent method for minimizing a function of several variables is usually slow but reliable.

(c) Gauss-Newton method for solving a nonlinear least squares problem

6.26 Let f : Rn → Rn be a nonlinear function. Since ‖f(x)‖ = 0 if and only if f(x) = o, does this relation mean that searching for a minimum of ‖f(x)‖ is equivalent to solving the nonlinear system f(x) = o? Why?

6.27 (a) Why is a line search parameter always used in the steepest descent method for minimizing a general function of several variables?
(b) Why might one use a line search parameter in Newton's method for minimizing a function of several variables?

6.28 What is a good way to test a symmetric matrix to determine whether it is positive definite?