Let us now do so. Then the variance of the estimator is

\mathrm{Var}\,(\langle f \rangle') = \frac{1}{4}\left[\frac{\mathrm{Var}_a(f)}{N_a} + \frac{\mathrm{Var}_b(f)}{N - N_a}\right]    (7.8.13)

which is minimized (one can easily verify) when

\frac{N_a}{N} = \frac{\sigma_a}{\sigma_a + \sigma_b}    (7.8.14)

Here we have adopted the shorthand notation σ_a ≡ [Var_a(f)]^{1/2}, and correspondingly for b.

If N_a satisfies equation (7.8.14), then equation (7.8.13) reduces to

\mathrm{Var}\,(\langle f \rangle') = \frac{(\sigma_a + \sigma_b)^2}{4N}    (7.8.15)

Equation (7.8.15) reduces to equation (7.8.9) if Var(f) = Var_a(f) = Var_b(f), in which case stratifying the sample makes no difference.
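As a concrete illustration of equations (7.8.13)-(7.8.15) (our own sketch, not one of the book's routines), the following C fragment estimates a one-dimensional integral over [0,1] with two strata: a pilot run estimates σ_a and σ_b, the main-run points are then allocated according to equation (7.8.14), and the two stratum means are averaged. The integrand f1 and the use of the POSIX uniform deviate drand48() are assumptions made for the example.

#include <stdlib.h>
#include <math.h>

/* Example integrand on [0,1]; purely illustrative. */
static double f1(double x) { return exp(-10.0*x); }

/* Stratified estimate of the integral of f1 over [0,1], split at x = 0.5.
   A pilot run of npilot points per half estimates sigma_a and sigma_b;
   the N main-run points are then allocated per equation (7.8.14). */
double stratified2(unsigned long N, unsigned long npilot)
{
    double sum, sum2, var, sigma[2], mean[2];
    unsigned long i, n, Na;
    int s;
    for (s = 0; s < 2; s++) {                 /* pilot run: estimate sigmas */
        sum = sum2 = 0.0;
        for (i = 0; i < npilot; i++) {
            double f = f1(0.5*(s + drand48()));   /* uniform in half s */
            sum += f; sum2 += f*f;
        }
        mean[s] = sum/npilot;
        var = sum2/npilot - mean[s]*mean[s];
        sigma[s] = sqrt(var > 0.0 ? var : 0.0);
    }
    Na = (unsigned long)(N*sigma[0]/(sigma[0] + sigma[1]));  /* eq. (7.8.14) */
    if (Na < 1) Na = 1;
    if (Na > N - 1) Na = N - 1;
    for (s = 0; s < 2; s++) {                 /* main run with optimal split */
        n = (s == 0 ? Na : N - Na);
        sum = 0.0;
        for (i = 0; i < n; i++) sum += f1(0.5*(s + drand48()));
        mean[s] = sum/n;
    }
    return 0.5*(mean[0] + mean[1]);           /* each stratum has volume 1/2 */
}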
A standard way to generalize the above result is to consider the volume V divided into more than two equal subregions. One can readily obtain the result that the optimal allocation of sample points among the regions is to have the number of points in each region j proportional to σ_j (that is, the square root of the variance of the function f in that subregion). In spaces of high dimensionality (say d ≳ 4) this is not in practice very useful, however. Dividing a volume into K segments along each dimension implies K^d subvolumes, typically much too large a number when one contemplates estimating all the corresponding σ_j's.

Mixed Strategies

Importance sampling and stratified sampling seem, at first sight, inconsistent with each other. The former concentrates sample points where the magnitude of the integrand |f| is largest, the latter where the variance of f is largest. How can both be right?

The answer is that (like so much else in life) it all depends on what you know and how well you know it.
Importance sampling depends on already knowing some approximation to your integral, so that you are able to generate random points x_i with the desired probability density p. To the extent that your p is not ideal, you are left with an error that decreases only as N^{-1/2}. Things are particularly bad if your p is far from ideal in a region where the integrand f is changing rapidly, since then the sampled function h = f/p will have a large variance. Importance sampling works by smoothing the values of the sampled function h, and is effective only to the extent that you succeed in this.
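A minimal sketch of the idea (our own example, with an integrand-density pair chosen so the sampling is trivial): points are drawn from the density p(x) = 2x on [0,1] by inverting its cumulative distribution, and the values h = f/p are averaged.

#include <stdlib.h>
#include <math.h>

/* Importance-sampled estimate of the integral of f over [0,1] using the
   density p(x) = 2x. If u is uniform in (0,1], then x = sqrt(u) has
   density 2x (inverse of the CDF x^2). The estimator averages h = f/p,
   whose variance is small exactly when p roughly tracks |f|. */
double importance(double (*f)(double), unsigned long N)
{
    double sum = 0.0;
    unsigned long i;
    for (i = 0; i < N; i++) {
        double x = sqrt(1.0 - drand48());   /* in (0,1]; avoids x = 0 */
        sum += f(x)/(2.0*x);                /* h = f/p */
    }
    return sum/N;
}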
Stratified sampling, by contrast, does not necessarily require that you know anything about f. Stratified sampling works by smoothing out the fluctuations of the number of points in subregions, not by smoothing the values of the points. The simplest stratified strategy, dividing V into N equal subregions and choosing one point randomly in each subregion, already gives a method whose error decreases asymptotically as N^{-1}, much faster than N^{-1/2}; a one-dimensional sketch follows.
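Here is that simplest strategy in one dimension (again our own illustrative fragment, assuming drand48() for uniform deviates):

#include <stdlib.h>

/* Simplest stratified strategy on [0,1]: one uniform random point in
   each of N equal subintervals; the sample mean estimates the integral. */
double stratified_simple(double (*f)(double), unsigned long N)
{
    double sum = 0.0;
    unsigned long j;
    for (j = 0; j < N; j++)
        sum += f((j + drand48())/N);   /* random point in subinterval j */
    return sum/N;
}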
(Note that quasi-random numbers, §7.7, are another way of smoothing fluctuations in the density of points, giving nearly as good a result as the "blind" stratification strategy.)

However, "asymptotically" is an important caveat: For example, if the integrand is negligible in all but a single subregion, then the resulting one-sample integration is all but useless. Information, even very crude, allowing importance sampling to put many points in the active subregion would be much better than blind stratified sampling.

Stratified sampling really comes into its own if you have some way of estimating the variances, so that you can put unequal numbers of points in different subregions, according to (7.8.14) or its generalizations, and if you can find a way of dividing a region into a practical number of subregions (notably not K^d with large dimension d), while yet significantly reducing the variance of the function in each subregion compared to its variance in the full volume.
Doing this requires a lot of knowledge about f, though different knowledge from what is required for importance sampling.

In practice, importance sampling and stratified sampling are not incompatible. In many, if not most, cases of interest, the integrand f is small everywhere in V except for a small fractional volume of "active regions." In these regions the magnitude of |f| and the standard deviation σ = [Var(f)]^{1/2} are comparable in size, so both techniques will give about the same concentration of points.
In more sophisticated implementations, it is also possible to "nest" the two techniques, so that (e.g.) importance sampling on a crude grid is followed by stratification within each grid cell.

Adaptive Monte Carlo: VEGAS

The VEGAS algorithm, invented by Peter Lepage [1,2], is widely used for multidimensional integrals that occur in elementary particle physics. VEGAS is primarily based on importance sampling, but it also does some stratified sampling if the dimension d is small enough to avoid K^d explosion (specifically, if (K/2)^d < N/2, with N the number of sample points).
The basic technique for importance sampling in VEGAS is to construct, adaptively, a multidimensional weight function g that is separable,

p \propto g(x, y, z, \ldots) = g_x(x)\, g_y(y)\, g_z(z) \ldots    (7.8.16)

Such a function avoids the K^d explosion in two ways: (i) It can be stored in the computer as d separate one-dimensional functions, each defined by K tabulated values, say, so that K × d replaces K^d. (ii) It can be sampled as a probability density by consecutively sampling the d one-dimensional functions to obtain coordinate vector components (x, y, z, . . .).

The optimal separable weight function can be shown to be [1]

g_x(x) \propto \left[\int dy \int dz \cdots \, \frac{f^2(x, y, z, \ldots)}{g_y(y)\, g_z(z) \cdots}\right]^{1/2}    (7.8.17)

(and correspondingly for y, z, . . .). Notice that this reduces to g ∝ |f| (7.8.6) in one dimension. Equation (7.8.17) immediately suggests VEGAS' adaptive strategy: Given a set of g-functions (initially all constant, say), one samples the function f, accumulating not only the overall estimator of the integral, but also the K × d estimators (K subdivisions of the independent variable in each of d dimensions) of the right-hand side of equation (7.8.17). These then determine improved g functions for the next iteration.
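To make point (ii) concrete, here is a sketch of sampling such a separable density (our own simplified fragment; VEGAS itself uses variable-width bins carrying equal weight, whereas this sketch uses K equal-width bins per axis with tabulated weights):

#include <stdlib.h>

#define D 3    /* number of dimensions (assumed for the example) */
#define K 50   /* tabulated bins per dimension (assumed) */

/* cg[j][k] is the precomputed cumulative weight of bins 0..k of
   coordinate j (nondecreasing, cg[j][K-1] = total weight). Fills
   x[0..D-1] with one point drawn from the separable density
   proportional to g_0(x_0) g_1(x_1) ... on the unit cube. */
void sample_separable(double cg[D][K], double x[D])
{
    int j, k;
    for (j = 0; j < D; j++) {
        double u = drand48()*cg[j][K-1];  /* uniform in [0, total) */
        for (k = 0; k < K-1; k++)         /* first bin with cg > u */
            if (cg[j][k] > u) break;
        x[j] = (k + drand48())/K;         /* uniform inside bin k */
    }
}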
When the integrand f is concentrated in one, or at most a few, regions in d-space, then the weight function g's quickly become large at coordinate values that are the projections of these regions onto the coordinate axes. The accuracy of the Monte Carlo integration is then enormously enhanced over what simple Monte Carlo would give.

The weakness of VEGAS is the obvious one: To the extent that the projection of the function f onto individual coordinate directions is uniform, VEGAS gives no concentration of sample points in those dimensions. The worst case for VEGAS, e.g., is an integrand that is concentrated close to a body diagonal line, e.g., one from (0, 0, 0, . . .) to (1, 1, 1, . . .). Since this geometry is completely nonseparable, VEGAS can give no advantage at all. More generally, VEGAS may not do well when the integrand is concentrated in one-dimensional (or higher) curved trajectories (or hypersurfaces), unless these happen to be oriented close to the coordinate directions.
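Such a worst-case integrand is easy to write down (our own example; the ridge width 0.03 is arbitrary): a Gaussian ridge about the body diagonal, whose one-dimensional projections are nearly flat and therefore tell a separable g almost nothing.

#include <math.h>

#define D 3   /* number of dimensions (assumed) */

/* Gaussian ridge about the body diagonal x_0 = x_1 = ... of the unit
   cube: strongly concentrated, yet each one-dimensional projection is
   nearly uniform, so VEGAS' separable weight function cannot adapt. */
double diagonal_ridge(double x[D])
{
    double mean = 0.0, r2 = 0.0;
    int j;
    for (j = 0; j < D; j++) mean += x[j];
    mean /= D;
    for (j = 0; j < D; j++) r2 += (x[j] - mean)*(x[j] - mean);
    return exp(-r2/(2.0*0.03*0.03));
}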
The parameter NDMX is what we have called K,the maximum number of increments along each axis; MXDIM is the maximum value of d; someother parameters are explained in the comments.The vegas routine performs m = itmx statistically independent evaluations of thedesired integral, each with N = ncall function evaluations. While statistically independent,these iterations do assist each other, since each one is used to refine the sampling grid forthe next one.
The results of all iterations are combined into a single best answer, and its estimated error, by the relations

I_{\mathrm{best}} = \sum_{i=1}^{m} \frac{I_i}{\sigma_i^2} \bigg/ \sum_{i=1}^{m} \frac{1}{\sigma_i^2} \qquad \sigma_{\mathrm{best}} = \left(\sum_{i=1}^{m} \frac{1}{\sigma_i^2}\right)^{-1/2}    (7.8.18)

Also returned is the quantity

\chi^2/m \equiv \frac{1}{m-1} \sum_{i=1}^{m} \frac{(I_i - I_{\mathrm{best}})^2}{\sigma_i^2}    (7.8.19)

If this is significantly larger than 1, then the results of the iterations are statistically inconsistent, and the answers are suspect.
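In code, the combination of equations (7.8.18) and (7.8.19) looks like the following sketch (our own fragment; vegas itself accumulates the same sums incrementally across iterations):

#include <math.h>

/* Combine m iteration estimates I[i], each with standard deviation
   sig[i], into the inverse-variance-weighted best value, its error,
   and chi^2/m, per equations (7.8.18) and (7.8.19). */
void combine(int m, double I[], double sig[],
             double *Ibest, double *sigbest, double *chi2m)
{
    double sw = 0.0, swI = 0.0, chi2 = 0.0;
    int i;
    for (i = 0; i < m; i++) {
        double w = 1.0/(sig[i]*sig[i]);    /* weight 1/sigma_i^2 */
        sw += w;
        swI += w*I[i];
    }
    *Ibest = swI/sw;                       /* eq. (7.8.18) */
    *sigbest = 1.0/sqrt(sw);
    for (i = 0; i < m; i++)
        chi2 += (I[i] - *Ibest)*(I[i] - *Ibest)/(sig[i]*sig[i]);
    *chi2m = chi2/(m > 1 ? m - 1 : 1);     /* eq. (7.8.19) */
}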
The input flag init can be used to advantage. One might have a call with init=0, ncall=1000, itmx=5 immediately followed by a call with init=1, ncall=100000, itmx=1. The effect would be to develop a sampling grid over 5 iterations of a small number of samples, then to do a single high accuracy integration on the optimized grid.

Note that the user-supplied integrand function, fxn, has an argument wgt in addition to the expected evaluation point x. In most applications you ignore wgt inside the function. Occasionally, however, you may want to integrate some additional function or functions along with the principal function f. The integral of any such function g can be estimated by

I_g = \sum_i w_i \, g(x_i)    (7.8.20)

where the w_i's and x_i's are the arguments wgt and x, respectively. It is straightforward to accumulate this sum inside your function fxn, and to pass the answer back to your main program via global variables. Of course, g(x) had better resemble the principal function f to some degree, since the sampling will be optimized for f.
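A sketch of such an fxn (our own example; the integrands and the global secondary_sum are hypothetical):

#include <math.h>

double secondary_sum = 0.0;   /* accumulates eq. (7.8.20) for g */

/* User-supplied integrand in the form vegas expects: x[1..ndim] is the
   evaluation point and wgt its weight w_i. Returns the principal f and
   accumulates a secondary integrand g on the side. */
float fxn(float x[], float wgt)
{
    float f = (float)exp(-x[1]*x[1]);   /* principal integrand (assumed) */
    float g = x[1]*f;                   /* secondary g, resembling f */
    secondary_sum += wgt*g;             /* I_g = sum of w_i g(x_i) */
    return f;
}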
The integration consistsof itmx iterations, each with approximately ncall calls to the function. After each iterationthe grid is refined; more than 5 or 10 iterations are rarely useful. The input flag init signalswhether this call is a new start, or a subsequent call for additional iterations (see commentsbelow). The input flag nprn (normally 0) controls the amount of diagnostic output. Returnedanswers are tgral (the best estimate of the integral), sd (its standard deviation), and chi2a(χ2 per degree of freedom, an indicator of whether consistent results are being obtained). Seetext for further details.{float ran2(long *idum);7.8 Adaptive and Recursive Monte Carlo Methods321void rebin(float rc, int nd, float r[], float xin[], float xi[]);static int i,it,j,k,mds,nd,ndo,ng,npg,ia[MXDIM+1],kg[MXDIM+1];static float calls,dv2g,dxg,f,f2,f2b,fb,rc,ti,tsi,wgt,xjac,xn,xnd,xo;static float d[NDMX+1][MXDIM+1],di[NDMX+1][MXDIM+1],dt[MXDIM+1],dx[MXDIM+1], r[NDMX+1],x[MXDIM+1],xi[MXDIM+1][NDMX+1],xin[NDMX+1];static double schi,si,swgt;Best make everything static, allowing restarts.if (init <= 0) {Normal entry.