One of the simplest and cheapest is to try to compute its Cholesky factorization: the Cholesky algorithm will succeed if and only if the matrix is positive definite (of course, this suggestion assumes that one has a Cholesky routine that fails gracefully when given a nonpositive definite matrix as input). Another good method is to compute the inertia of the matrix (see Section 4.3.10) using a symmetric factorization of the form LDL^T, as in Section 2.5.2. A much more expensive approach is to compute the eigenvalues of the matrix and check whether they are all positive.
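As an illustration of the cheap Cholesky-based test (and of the more expensive eigenvalue check), here is a minimal sketch in Python using NumPy; the function names are ours, not from the text, and a production code would reuse whatever factorization routine it already has:

```python
import numpy as np

def is_positive_definite(A):
    """Test a symmetric matrix for positive definiteness by attempting a
    Cholesky factorization, which succeeds if and only if A is positive
    definite; np.linalg.cholesky raises LinAlgError when it breaks down."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

def is_positive_definite_eig(A):
    """The much more expensive alternative: compute all eigenvalues of the
    symmetric matrix A and check that they are all positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))
```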
6.1.3 Accuracy of Solutions

Consider the Taylor series expansion

    f(x + h) = f(x) + f′(x)h + (f″(x)/2)h² + O(h³),

where f : R → R. If f(x∗) = 0 and f′(x∗) ≠ 0, as is usually the case in solving a nonlinear equation, then the foregoing expansion indicates that for small values of h, f(x∗ + h) ≈ ch, where c = f′(x∗). This expression implies that small changes in x∗ cause proportionally small changes in f(x∗), and hence the solution can be computed about as accurately as the function values can be evaluated, which is often at the level of machine precision.

In a minimization problem, however, we usually have f′(x∗) = 0 and f″(x∗) ≠ 0, so that for small values of h, f(x∗ + h) ≈ f(x∗) + ch², where c = f″(x∗)/2.
This means that a small change of order h in x∗ causes a change of order h² in f(x∗), and hence one cannot expect the accuracy of the solution to be less than the square root of the error in the function values. Geometrically, a minimum is analogous to a multiple root of a nonlinear equation: in either case a horizontal tangent implies that the function is locally approximately parallel to the x axis, and hence the solution is relatively poorly conditioned. Although simple zeros of a function can often be found to an accuracy of nearly full machine precision, minimizers of a function can be found to an accuracy of only about half precision (i.e., √ε_mach). This fact should be kept in mind when selecting an error tolerance for an optimization problem: an unrealistically tight tolerance may drive up the cost of computing a solution without producing a concomitant gain in accuracy.
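The following small Python experiment (a sketch; the quadratic g and the perturbation sizes are chosen purely for illustration, assuming IEEE double precision) shows why perturbations of the minimizer smaller than about √ε_mach cannot be detected from function values alone:

```python
import numpy as np

eps = np.finfo(float).eps              # machine epsilon, about 2.2e-16
g = lambda x: 1.0 + (x - 1.0) ** 2     # smooth function with minimizer x* = 1

# A perturbation of x* well below sqrt(eps) is invisible in the function value,
# so no method based only on values of g can locate x* more accurately than that.
print(g(1.0 + 0.5 * np.sqrt(eps)) == g(1.0))   # True: the change in g rounds away
print(g(1.0 + 10.0 * np.sqrt(eps)) == g(1.0))  # False: the change in g is visible
```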
6.2 One-Dimensional Optimization

We begin our study of methods for optimization with problems in one dimension. The one-dimensional case is simpler than multidimensional optimization yet illustrates many of the ideas and issues that arise in higher dimensions.

First, we need a way of bracketing a minimum in an interval, analogous to the way we used a sign change for bracketing solutions to nonlinear equations in one dimension. A real-valued function f is unimodal on an interval if there is a unique value x∗ in the interval such that f(x∗) is the minimum of f on the interval, and f is strictly decreasing for x ≤ x∗ and strictly increasing for x∗ ≤ x.
The significance of this property is that it enables us to refine an interval containing a solution by computing sample values of the function within the interval and discarding portions of the interval according to the function values obtained, analogous to bisection for solving nonlinear equations.

6.2.1 Golden Section Search

Suppose f is a real-valued function that is unimodal on the interval [a, b]. Let x1 and x2 be two points within the interval, with x1 < x2. Comparing the function values f(x1) and f(x2) and using the unimodality property will enable us to discard a subinterval, either (x2, b] or [a, x1), and know that the minimum of the function lies within the remaining subinterval.
In particular, if f(x1) < f(x2), then the minimum cannot lie in the interval (x2, b], and if f(x1) > f(x2), then the minimum cannot lie in the interval [a, x1). Thus, we are left with a shorter interval, either [a, x2] or [x1, b], within which we have already computed one function value, either f(x1) or f(x2), respectively. Hence, we will need to compute only one new function evaluation to repeat this process.

To make consistent progress in reducing the length of the interval containing the minimum, we would like for each new pair of points to have the same relationship with respect to the new interval that the previous pair had with respect to the previous interval.
Such an arrangement will enable us to reduce the length of the interval by a fixed fraction at each iteration, much as we reduced the length by half at each iteration of the bisection method for computing zeros of functions.

To accomplish this objective, we choose the relative positions of the two points as τ and 1 − τ, where τ² = 1 − τ, so that τ = (√5 − 1)/2 ≈ 0.618 and 1 − τ ≈ 0.382. With this choice, no matter which subinterval is retained, its length will be τ relative to the previous interval, and the interior point retained will be at position either τ or 1 − τ relative to the new interval. Thus, we need to compute only one new function value, at the complementary point, to continue the iteration. This choice of sample points is called golden section search.
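To verify these claims, suppose the subinterval [a, x2] is retained (the case [x1, b] is symmetric); the following short calculation, written out here for completeness, shows why the condition τ² = 1 − τ is exactly what is needed:

```latex
\[
  x_2 - a = \tau\,(b - a),
  \qquad
  \frac{x_1 - a}{x_2 - a} = \frac{(1-\tau)(b-a)}{\tau\,(b-a)} = \frac{1-\tau}{\tau},
\]
so the new interval has length $\tau$ relative to the old one, and requiring the
retained point $x_1$ to lie at the canonical relative position $\tau$ of the new
interval gives
\[
  \frac{1-\tau}{\tau} = \tau
  \quad\Longleftrightarrow\quad
  \tau^2 + \tau - 1 = 0
  \quad\Longrightarrow\quad
  \tau = \frac{\sqrt{5}-1}{2} \approx 0.618 .
\]
```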
The complete algorithm is as follows:

Initial input: a function f, an interval [a, b] on which f is unimodal, and an error tolerance tol.

    τ = (√5 − 1)/2
    x1 = a + (1 − τ)(b − a); f1 = f(x1)
    x2 = a + τ(b − a); f2 = f(x2)
    while ((b − a) > tol) do
        if (f1 > f2) then
            a = x1
            x1 = x2
            f1 = f2
            x2 = a + τ(b − a)
            f2 = f(x2)
        else
            b = x2
            x2 = x1
            f2 = f1
            x1 = a + (1 − τ)(b − a)
            f1 = f(x1)
        end
    end
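A direct transcription of this pseudocode into Python might look as follows (the function name golden_section, the default tolerance, and the choice to return the midpoint of the final interval are ours, not prescribed by the algorithm):

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Minimize a function f assumed unimodal on [a, b] by golden section
    search; return the midpoint of the final bracketing interval."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0        # ~0.618
    x1 = a + (1.0 - tau) * (b - a); f1 = f(x1)
    x2 = a + tau * (b - a);         f2 = f(x2)
    while b - a > tol:
        if f1 > f2:                  # minimum cannot lie in [a, x1)
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)
        else:                        # minimum cannot lie in (x2, b]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1.0 - tau) * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)

# Applied to the function of Example 6.2 (which follows) on [0, 2]:
xmin = golden_section(lambda x: 0.5 - x * math.exp(-x * x), 0.0, 2.0)
print(xmin)   # ≈ 0.7071 (the true minimizer is 1/√2)
```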
Golden section search is safe but slowly convergent. Specifically, it is linearly convergent, with r = 1 and C ≈ 0.618.

Example 6.2 Golden Section Search. We illustrate golden section search by using it to minimize the function

    f(x) = 0.5 − x e^(−x²).

Starting with the initial interval [0, 2], we evaluate the function at points x1 = 0.764 and x2 = 1.236, obtaining f1 = 0.074 and f2 = 0.232. Since f1 < f2, we know that the minimum must lie in the interval [a, x2], and thus we may replace b by x2 and repeat the process. The first iteration is depicted in Fig. 6.2, and the full sequence of iterations is given next.

      x1       f1       x2       f2
    0.764    0.074    1.236    0.232
    0.472    0.122    0.764    0.074
    0.764    0.074    0.944    0.113
    0.652    0.074    0.764    0.074
    0.584    0.085    0.652    0.074
    0.652    0.074    0.695    0.071
    0.695    0.071    0.721    0.071
    0.679    0.072    0.695    0.071
    0.695    0.071    0.705    0.071
    0.705    0.071    0.711    0.071
[Figure 6.2: First iteration of golden section search for example problem.]

Although unimodality plays a role in optimization similar to that played by a sign change in root finding, there are important practical differences. A sign change brackets a root of an equation regardless of how large the bracketing interval may be.
The same is true of unimodality, but in practice most functions cannot be expected to be unimodal unless the interval is reasonably close to a minimum. Thus, rather more trial and error may be required to find a suitable starting interval for optimization than that typically required for root finding. In practice one might simply look for three points such that the value of the objective function is lower at the inner point than at the two outer points. Although golden section search always converges, it is not guaranteed to find the global minimum, or even a local minimum, unless the objective function is unimodal on the starting interval.
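That informal three-point test is easy to state in code; the sketch below (the name is_bracket is ours) merely checks the condition, leaving the search for such a triple to the caller:

```python
def is_bracket(f, a, c, b):
    """Return True if a < c < b and f is lower at the inner point than at
    either endpoint; for continuous f this guarantees that a local minimum
    lies somewhere in the open interval (a, b)."""
    return a < c < b and f(c) < f(a) and f(c) < f(b)
```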
6.2.2 Successive Parabolic Interpolation

Like bisection for solving nonlinear equations, golden section search makes no use of the numerical function values other than to compare them, so one might conjecture that making greater use of the function values would lead to faster methods. Indeed, as in solving nonlinear equations, faster convergence can be attained by replacing the objective function locally by a simple function that matches its values at some sample points.

An example of this approach is successive parabolic interpolation. Initially, the function is evaluated at three points and a quadratic polynomial is fit to the three resulting values. The minimum of the parabola, if it has one, is taken to be a new estimate for the minimum of the function.
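A minimal sketch of one such step in Python, using numpy.polyfit to fit the quadratic (the helper name parabola_min_step and the particular trial points are illustrative assumptions, not the book's notation):

```python
import numpy as np

def parabola_min_step(f, u, v, w):
    """Fit a quadratic through (u, f(u)), (v, f(v)), (w, f(w)) and return the
    location of its vertex, which is a minimum only if the parabola opens upward."""
    c2, c1, _ = np.polyfit([u, v, w], [f(u), f(v), f(w)], 2)  # f ~ c2*x^2 + c1*x + c0
    if c2 <= 0:
        raise ValueError("interpolating parabola has no minimum")
    return -c1 / (2.0 * c2)                                   # vertex of the parabola

# One step on the function of Example 6.2, starting from three trial points:
f = lambda x: 0.5 - x * np.exp(-x ** 2)
print(parabola_min_step(f, 0.0, 0.6, 1.2))   # new estimate of the minimizer
```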