Nash - Scientific Computing with PCs (523165), страница 44
Текст из файла (страница 44)
The graph is truncated when offscale.126Copyright © 1984, 1994 J C & M M NashNash Information Services Inc., 1975 Bel Air Drive, Ottawa, ON K2C 0X1 CanadaSCIENTIFIC COMPUTING WITH PCsCopy for:Dr. Dobb’s JournalFigure 14.5.4Graph of the log of the sum of squares of the deviations between the cash flowcoefficients and their reconstructions from the computed polynomial roots for the problemdescribed in Figure 14.2.1b.Figure 14.5.5a) Solution of the internal rate of return problem 14.2.1b by the SOLVE function ofDERIVE.c:=[-64000,275200,-299520,-189696,419136,41376,-218528,-23056,47190,11979]ply:=SUM(ELEMENT(c,i)*(1+r)^(1-i),i,1,10)(275200*r^8+1902080*r^7+5419264*r^6+8402240*r^5+8072416*r^4~+5272416*r^3+2331216*r^2+573462*r+64081)/(r+1)^9-64000r=1/10r=1/2r=-3/2Figure 14.5.5b) Solution of the internal rate of return problem 14.2.1a by a Newton iteration definedin DERIVE.
Approximate arithmetic is used.c:=[-600,300,250,600,-2000,0,455,666,777]ply:=SUM(ELEMENT(c,i)*(1+r)^(1-i),i,1,9)NEWTON(u,x,x0,n):=ITERATES(x-u/DIF(u,x),x,x0,n)NEWTON(ply,r,0)[0,0.0545808,0.0757515,0.078245,0.0782755,0.0782755]14: EXAMPLE: INTERNAL RATE OF RETURN PROBLEM12714.6 AssessmentThe mathematically obvious solution of the IRR problem in terms of polynomial roots led us to annoyingdetail both in assessing the validity and admissibility of roots and providing appropriate machineconstants for our compiler.
If one wants to know all the roots, MATLAB or a similar tool seems a betterchoice. DERIVE seems to win the contest on simplicity, though it will not always compute all the roots.Of course, when we draw a graph of NPV versus discount rate, our eyes only pick out distinct roots. Wecan confirm these roots by computing the NPV directly in a spreadsheet or with DERIVE. DERIVE alsohas the Newton iteration if we want very precise roots.If IRR values must be supplied automatically, a rootfinder that computes a single root in a given intervalis our choice.
We would not recommend programming this in the macro language of a spreadsheet.Having tried this, we believe it is too easy to make small but damaging errors placing data in thespreadsheet so the macros can work on it. Instead, we would choose a FORTRAN version of Nash J C(1990d) Algorithm 18 and graphs drawn with VG (Kahaner and Anderson, 1990) so only a reasonablysmall driver program must be coded.In solving the example problems here, we brought to light some deficiencies of built-in functions inspreadsheets.
Users should run check calculations of functions in any package — errors can and do creepin, so we must protect ourselves as best we can.Previous Home Next128Copyright © 1984, 1994 J C & M M NashSCIENTIFIC COMPUTING WITH PCsNash Information Services Inc., 1975 Bel Air Drive, Ottawa, ON K2C 0X1 CanadaCopy for:Dr. Dobb’s JournalChapter 15Data Fitting and Modelling15.115.215.315.415.515.6Problem StatementFormulationsMethods and AlgorithmsPrograms or PackagesSome solutionsAssessmentThis case study concerns fitting or modelling of data by single equation models. This topic encompassesa large class of practical scientific and statistical calculations.
For example, we use weed infestation toillustrate growth function modelling, but the same techniques are used for market penetration, bacterialinfection or similar data modelling problems.15.1 Problem StatementWe consider several data fitting problems of different types. They result in similar formulations but nosingle solution method is appropriate to all.
We looked briefly at modelling data with a linear model inSection 12.1 under a least squares criterion of fit. We now allow more general forms of model that canapproximate data for which "straight line" functions are inappropriate. Let us consider several examples:•An attempt to relate farm incomes to the use of phosphate fertilizer and petroleum products (NashJ C, 1990d, p. 46);•The refinement of land survey height differences between different surface locations on a piece of land(Nash J C, 1990d, p.
240);•The approximation of the number of weeds per square meter measured each year for 12 years by agrowth curve function.These general statements have to be cast into mathematical form. Data is given in Table 15.1.1.Table 15.1.1Data for three data fitting problems.a) Data for relating farm income with phosphate and petroleum useb) Height differences between benchmarksc) Growth data recorded at yearly intervalsa) Indices for phosphate use,petroleum use and farm incomephosphate petroleum income262221305291222342294221331302218339320217354350218369386218378401225368446228405492230438510237438534235451559236485b) Measured height differencesin meters between 4 benchmarklocations.HeightHeightHeightHeightHeight12312-HeightHeightHeightHeightHeight23434= -99.99= -21.03= 24.98=-121.02=3.99c) Hobbs data for the logisticgrowth function problem(weeds/sq.m)5.3087.249.63812.86617.06923.19231.44338.55850.15662.94875.99591.97215: DATA FITTING AND MODELLING12915.2 FormulationsWe write our set of data(15.2.1){yi, Xi1, Xi2, .
. . Xik }fori = 1, 2, . . . , mwhere y is the vector of data we wish to "explain" or fit and the matrix X holds in its columns data forthe variables that we anticipate will help to predict y. Then suppose we wish to approximate y by afunction g of the data X and parameters b to give a residual vector r having elements (note that theseare functions of b as well as the data y, X)(15.2.2)ri = g(b,xiT ) - yiwhere(15.2.3)xiT = ( Xi1, Xi2, .
. . , Xik )That is, we have a matrix X of data holding k variables in its columns. Thus X is m by k. The i’th rowof X will be called xiT. The function g() is our model. To adjust the parameters b to fit the model to thedata, we need a criterion of fit or loss function that allows us to decide if one set of parameters is betterthan another. The least squares fit is found by finding the parameters bj , j = 1, 2, . . .
,n that minimizethe loss function(15.2.4)S(b) = S(b1, b2, . . . , bn ) =ri (b)2Here we use functions ri(b), i = 1, 2, . . . ,m, that all have the same algebraic form, that of a residual,though we have used a "model - data" form that is the negative of common statistical practice. We do thisso the derivatives of r with respect to b are the same as those of g.
In general, we need not have the samedefinitions for all the functions. They could even be algorithmic expressions.There are other choices than least squares for the measure of how good the "fit" is between the modellingfunction and the data. Weighted least squares minimizes the sum(15.2.5)S(b) =wi ri (b)2where the vector w defines the weights. These weights are larger for data values yi we are sure of andsmaller for those we consider uncertain.
Maximum likelihood estimation depends on our knowing theprobability distribution or density function for the residuals. If the probability we observe a given residualis P(ri(b)), then the probability or likelihood we observe all the residuals simultaneously is(15.2.6)L(b) =Our loss function is thenP(ri (b))-L, or more usually -log(L), the negative of the log likelihood.Some workers have reasoned that occasional very large residuals are generally the result of outliers.
Weshould not give these observations the same weight as the rest. Such ideas produce special loss functionsthat look like least squares for small residuals but use a constant penalty for all "large" residuals (Goodall,1983). The results are considered robust to the presence of outliers.The notation y, X and b corresponds to that most commonly used by statisticians.
However, it shouldbe mentioned that it is common to set all elements of one of the columns of X to 1 (the constant) and toconsider the (k - 1) = p other columns as the variables. Different notations are used in other branches130Copyright © 1984, 1994 J C & M M NashSCIENTIFIC COMPUTING WITH PCsNash Information Services Inc., 1975 Bel Air Drive, Ottawa, ON K2C 0X1 CanadaCopy for:Dr.
Dobb’s Journalof science. While "trivial" in the mathematical sense, they cause users much annoyance. In particular, forlinear models, the p variables X.1, X.2, . . . X.p give rise to (p+1) parameters to be determined.The farm income problem is like this, but the height difference data is not, as we shall see below.
Modelswithout a constant (intercept) term are uncommon in statistical applications. In both cases we can write,(15.2.7)g(b) = X bUsing a matrix notation and writing the residual vector in its usual form(15.2.8)r = y-Xbthe sum of squares loss function can be written compactly as(15.2.9)S = S(b) = rT rNote that this is equivalent to substituting (15.2.7) in (15.2.2).For the height difference problem, we have a matrix(15.2.10)10010-110010-11-10X00-10-1Note that this has not constant column, and that one combination of benchmarks is missing. We couldalso have replicated measurements. More important for finding an appropriate method, however, is therealization that we can set an arbitrary level as our "zero" or "sea-level" of height.
That is, any constantcan be added to all the bi and not change the height differences. This indeterminacy in the solutions willshow up as singularity of the matrix X and may cause software to fail in mid-calculation. Because oneof the heights is arbitrary, we can set one of them to zero. For example, setting b4 zero lets us use thetriangular structure of the first 3 rows of X to get initial estimates of the heights. There are then two otherrows of information to be reconciled with these estimates. It is this reconciliation that is the core of thisproblem.The model suggested for the weed growth data is called a logistic growth function, an S shaped curve.This variant uses three parameters b1, b2, and b3 to approximate the weeds per square meter in year i as(15.2.11)g(i) = b1 /(1 + b2 exp(-b3 i))For our proposed logistic growth function, we put the time (years) in column 1 of matrixcorresponding weed densities in y, writing the residual for the logistic function as(15.2.12)X and theri = b1 /(1 + b2 exp(-b3 Xi1 )) - yiThus, function (15.2.11) could be rewritten(15.2.13)gi (b,X) = b1 /(1 + b2 exp(-b3 Xi1 ))Since our problems are stated as a minimization, they can be solved in this way.