The parameter a is then estimated using (3) and the estimate of b. Confidence intervals for a are calculated using (3) and the values from (2).

2.3.2 Classical Least Squares

The maximum likelihood technique solves directly for the optimal parameter values. The least squares method solves for parameter values by picking the values that best fit a curve to the data.
This technique is generally considered to be the best for small to medium sample sizes. The theory of least squares curve-fitting (see [Mood,74]) is that we want to find parameter values that minimize the "difference" between the data and the function fitting the data, where the difference is defined as the sum of the squared errors. The classical least squares technique involves log likelihood functions and is described in [Musa,87,Section 12.3]. From [Musa,87,Equation 12.141], the expression to be minimized for the G-O model is:

(4)    \sum_{i=1}^{w} \left[ \ln\!\left( \frac{f_i - f_{i-1}}{t_i - t_{i-1}} \right) - \ln(b) - \ln(a - f_i) \right]^2

where, as in Section 2.3.1,
w = current number of weeks of QA test
t_i = cumulative test time at the end of the ith week
f_i = cumulative number of failures at the end of the ith week.
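For readers who prefer a programmable route to the spreadsheet Solver discussed later, the minimization in Equation (4) can be handed directly to a numerical optimizer. The sketch below is a minimal illustration in Python using scipy; the weekly data and starting values are hypothetical, not taken from the paper's releases.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical weekly data: cumulative test hours t_i and cumulative
# failures f_i, with t_0 = f_0 = 0 so the first interval is well defined.
t = np.array([0.0, 100, 250, 400, 600, 800, 1000, 1300])
f = np.array([0.0, 10, 22, 33, 41, 47, 52, 55])

def classical_ls(params):
    """Sum of squared errors from Equation (4) for the G-O model."""
    a, b = params
    if b <= 0 or a <= f.max():               # ln(a - f_i) requires a > f_i
        return np.inf
    obs = np.log(np.diff(f) / np.diff(t))    # observed log failure intensity
    pred = np.log(b) + np.log(a - f[1:])     # model log failure intensity
    return np.sum((obs - pred) ** 2)

# Starting values matter for these non-linear fits; see Section 2.3.4.
result = minimize(classical_ls, x0=[80.0, 0.001], method="Nelder-Mead")
a_hat, b_hat = result.x
print(f"total defects a = {a_hat:.1f}, rate b = {b_hat:.5f}")
```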
Confidence intervals are given by [Musa, p. 358] as:

(5)    a \pm t_{w-2,\alpha/2} \left( \mathrm{Var}(a) \right)^{0.5}

where
t_{w-2,\alpha/2} is the upper \alpha/2 percentage point of the t distribution with w-2 degrees of freedom
Var(a) is the variance of a; calculation of the variance is described in Appendix 1.

The confidence interval for b is the same as the above equation with b replacing a. These confidence intervals are derived by assuming that a and b are normally distributed. Note that the confidence intervals are symmetric, in contrast to the asymmetric confidence intervals provided by maximum likelihood.
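The interval in Equation (5) is simple to compute once Var(a) is available from the Appendix 1 calculation. A minimal sketch, with the estimate and variance assumed purely for illustration:

```python
from scipy import stats

def param_ci(estimate, variance, w, alpha=0.10):
    """Symmetric confidence interval from Equation (5); the t distribution
    has w - 2 degrees of freedom because two parameters are estimated."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=w - 2)
    half_width = t_crit * variance ** 0.5
    return estimate - half_width, estimate + half_width

# Assumed values: a = 95 total defects, Var(a) = 225, after 8 weeks of test.
low, high = param_ci(95.0, 225.0, w=8, alpha=0.10)
print(f"90% confidence interval for a: ({low:.1f}, {high:.1f})")
```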
2.3.3 Alternative Least Squares

An alternative approach to least squares is to directly minimize the difference between the observed number of failures and the predicted number of failures. For this approach the quantity to be minimized is:

(6)    \sum_{i=1}^{w} \left[ f_i - \mu(t_i) \right]^2

where, as in Section 2.3.1,
w = current number of weeks of QA test
\mu(t_i) = the cumulative expected number of defects at time t_i
t_i = cumulative test time at the end of the ith week
f_i = cumulative number of failures at the end of the ith week.

This technique is easy to use for any software reliability growth model since the minimization can be done by an optimization package such as the Solver in Microsoft® Excel. It is not normally described in textbooks because it does not lead to a set of equations that can be solved, but with the increased availability of optimization packages, the minimization can be solved directly instead of reducing it to a set of equations.
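Outside of Excel, the same minimization can be given to any general-purpose optimizer. A minimal sketch in Python using scipy; the data are hypothetical, and μ(t) here is the G-O mean value function from Table 2-1, but any other μ(t) could be substituted:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical weekly data: cumulative test hours and cumulative failures.
t = np.array([100.0, 250, 400, 600, 800, 1000, 1300])
f = np.array([10.0, 22, 33, 41, 47, 52, 55])

def mu_go(t, a, b):
    """G-O model mean value function: mu(t) = a(1 - e^(-bt))."""
    return a * (1.0 - np.exp(-b * t))

def alternative_ls(params, mu=mu_go):
    """Sum of squared errors from Equation (6); swapping in a different
    mu() applies the same fit to any model from Table 2-1."""
    a, b = params
    return np.sum((f - mu(t, a, b)) ** 2)

result = minimize(alternative_ls, x0=[80.0, 0.001], method="Nelder-Mead")
print("estimated (a, b):", result.x)
```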
Note that any of the software reliability growth models from Table 2-1 can be used in this equation by using the appropriate \mu(t). For the G-O model, Equation (6) becomes:

(7)    \sum_{i=1}^{w} \left[ f_i - a\left( 1 - e^{-b t_i} \right) \right]^2

Confidence intervals for the parameters in Equation (7) are the same as for classical least squares and are given by Equation (5). The calculations required for these confidence intervals are described in Appendix 1.
Note that these confidence intervals are symmetric.

2.3.4 Solution Techniques and Hints

The Solver in Microsoft® Excel was used to solve the minimizations defined in the preceding sections. However, since these are non-linear equations, the solution found may not be appropriate (a local optimum rather than a global optimum) or it may not be possible to determine a solution in a reasonable amount of time.
To help avoid this problem, it is useful to define parameter values that are close to the final values. This may require some experimentation prior to running the optimization. If a solution has been obtained using the previous week's data, those parameter values are usually a good starting point. If this is the first attempt to solve the minimization, parameter values should be selected that provide a reasonable match to the existing data. This is easy to do in a spreadsheet with one column of data and a second column of predicted values based on a given function and the chosen parameter values, as in the sketch below.
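The same eyeball check is easy to reproduce outside a spreadsheet. A minimal sketch, again with hypothetical data and a hypothetical candidate guess (a0, b0) for the G-O model:

```python
import numpy as np

# Hypothetical cumulative data and a candidate starting guess to sanity-check.
t = np.array([100.0, 250, 400, 600, 800, 1000, 1300])
f = np.array([10.0, 22, 33, 41, 47, 52, 55])
a0, b0 = 80.0, 0.001

predicted = a0 * (1.0 - np.exp(-b0 * t))   # G-O model mu(t) at the guess
for week, (obs, pred) in enumerate(zip(f, predicted), start=1):
    print(f"week {week}: observed {obs:5.1f}   predicted {pred:5.1f}")
# If the two columns track each other roughly, (a0, b0) is a usable
# starting point for the optimizer.
```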
Transforming the test hour data should not affect the total number of defects parameter. However, before the parameters become stable, transforming the test hour data may help numerical stability. For example, we used per cent of planned test hours completed rather than actual test hours completed in week 7 for Release 3 (one week before we began to get parameter stability). Per cent test hours predicted 400 total defects while actual test hour data predicted 5000 total defects. Neither answer is close to the right value of about 100, but using per cent test hours was closer, and the solution was reached much more quickly.

2.3.5 Theoretical Comparison of Techniques

This section compares the three parameter estimation techniques from a theoretical perspective. We focus on their ease of use, confidence interval shape, and parameter scalability. The more important comparison of model stability and predictive ability on actual data is contained in Section 3.6.

Since optimization packages are readily available, Equations (1) - maximum likelihood, (4) - classical least squares, and (6) - alternative least squares are all straightforward to solve. However, Equation (1) only applies to the G-O model, and a new maximum likelihood equation must be derived for each software reliability growth model.
These equations canbe difficult to derive, especially for the more complex models. Equation (4) applies to theexponential family of models that includes the G-O model. It is fairly easy to modify thisequation for similar models. Equation (6) is the easiest to use since it applies to anysoftware reliability growth model, so the alternative least squares method is the easiest toapply.Confidence intervals for all of the estimation techniques are based on assuming thatestimation errors are normally distributed. For the maximum likelihood technique, thisassumption is good for large sample sizes because of the asymptotically normal propertiesof this estimator. However, it is not as good for the smaller samples that we typically have.Nevertheless, the maximum likelihood technique provides the best confidence intervalsbecause it requires less normality assumptions and because it provides asymmetricconfidence intervals for the total defect parameter.
The lower confidence limit is larger than the number of experienced defects, and the upper confidence limit is farther from the point estimate than the lower confidence limit to represent the possibility that there could be many defects that have gone undetected by testing. Conversely, for the least squares techniques, the lower confidence limit can be less than the number of experienced defects (which is obviously impossible), and the confidence interval is symmetric. Also, additional assumptions pertaining to the normality of the parameters are necessary to derive confidence intervals for the least squares techniques.

The transformation technique consists of multiplying the test time by an arbitrary (but convenient) constant and multiplying the number of defects observed by a different arbitrary constant.
For this technique to work, the predicted number of total defects must be unaffected by the test time scaling and must scale by the same amount as the defect data. For example, we may experience 50 total defects during test and want to scale that to 100 for confidentiality or ease of reporting. To do that transformation, the number of defects reported each week must be multiplied by 2. If 75 total defects were predicted by a model based on the unscaled data, then the total defects predicted from the scaled data should be 150. Fortunately, all three of the parameter estimation techniques provide this linear scaling property, as shown in Appendix 2. In addition, the least squares confidence intervals scale linearly as shown in Appendix 2, but the maximum likelihood confidence intervals do not.
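The scaling property is easy to confirm numerically for the alternative least squares fit. A minimal sketch under the same hypothetical data as earlier; the starting values for the scaled fit are rescaled accordingly, per the hints in Section 2.3.4:

```python
import numpy as np
from scipy.optimize import minimize

t = np.array([100.0, 250, 400, 600, 800, 1000, 1300])   # hypothetical hours
f = np.array([10.0, 22, 33, 41, 47, 52, 55])            # hypothetical failures

def fit(t, f, x0):
    """Alternative least squares fit (Equation (6)) of the G-O model."""
    sse = lambda p: np.sum((f - p[0] * (1.0 - np.exp(-p[1] * t))) ** 2)
    return minimize(sse, x0=x0, method="Nelder-Mead").x

a_raw, b_raw = fit(t, f, x0=[80.0, 0.001])
# Scale test time by 1/10 and defects by 2; b rescales with the time units,
# and the total-defect estimate should double: a_scaled ~= 2 * a_raw.
a_scaled, b_scaled = fit(t / 10.0, 2.0 * f, x0=[160.0, 0.01])
print(f"a_raw = {a_raw:.1f}, a_scaled = {a_scaled:.1f}")
```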
2.4 Definition of a Useful Model

Since none of the models will match any company's software development and test process exactly, what makes the model useful? The answer to this question relates to what we want the model to do. During the test, we would like the model to predict the additional test effort required to achieve a quality level (as measured by number of remaining defects) that we deem suitable for customer use. At the end of the test, we would like the model to predict the number of remaining defects that will be experienced (as failures) in field usage. This leads to two criteria for a useful model: (1) the model must become stable during the test period, i.e., the predicted number of total defects should not vary significantly from week to week, and (2) the model must provide a reasonably accurate prediction of the number of defects that will be discovered in field use.

(1) The model must become stable during the test period and remain stable until the end of the test (assuming the test process remains stable).

If the model says that there are 50 remaining defects one week and 200 the next, no one is going to believe either prediction.
For a model to be accepted by management, the predicted number of total defects should not vary significantly from week to week. Stability is subjective, but in our experience a good rule of thumb is that weekly predictions from the model should vary by no more than 10%. Also, the confidence intervals around the total defect parameter should be shrinking. It would be nice if the model were immediately stable, but parameter estimation requires a reasonable amount of data.
In particular, the data must begin to show concave behavior, since the speed at which the failure rate decreases is critical to estimating the total number of defects in the code. The literature (e.g., [Musa,87,p.194,p.311] and [Ehrlich,90,p.63]) and our experience indicate that the model parameters do not become stable until about 60% of the way through the test. This is sufficient since management will not be closely monitoring the model until near the end of expected test completion.
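The 10% rule of thumb from the stability discussion above is straightforward to check mechanically each week. A minimal sketch, where the weekly total-defect estimates are hypothetical:

```python
def is_stable(weekly_predictions, tolerance=0.10):
    """Rule of thumb from the text: successive weekly estimates of the
    total-defect parameter should differ by no more than about 10%."""
    return all(
        abs(curr - prev) / prev <= tolerance
        for prev, curr in zip(weekly_predictions, weekly_predictions[1:])
    )

# Hypothetical weekly estimates of the total defects a:
print(is_stable([210, 150, 120, 108, 103, 101]))   # False: early weeks jump
print(is_stable([108, 103, 101, 100]))             # True: within 10% weekly
```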
(2) The model must provide a reasonably accurate prediction of the number of defects that will be discovered in field use.

Since field use is very different from a test environment, no model derived from the test environment can be expected to be perfectly accurate. However, if the number of defects is within the 90% confidence levels developed from the model, the model is reasonably accurate. Unfortunately, the range defined by the 90% confidence levels may be much larger than software development managers would like. In our experience, 90% confidence intervals are often larger than twice the predicted residual defects.

3.0 Model Applications

Over the past few years, we have collected test data from a subset of products for four software releases.