This occurs because many simple automated tests that do not take much execution time are run early in the test phase. Again, the prediction was unstable and did not match the field results. Similar results with the other releases indicate that execution time is a much better measure of the amount of testing than the number of test cases in our environment.

Test  Execution  Percent Exec.  No. of      Percent     No. of   Predicted Total No. of  Predicted Total No. of
Week  Hours      Hours          Test Cases  Test Cases  Defects  Defects-Execution Time  Defects-Test Cases
1     162        3%             671         7%          6        -                       -
2     499        10%            1,920       19%         9        -                       -
3     715        14%            2,150       22%         13       -                       -
4     1,137      23%            3,112       31%         20       -                       -
5     1,799      36%            3,802       38%         28       -                       -
6     2,438      48%            5,009       50%         40       -                       -
7     2,818      56%            6,443       64%         48       -                       -
8     3,574      71%            7,630       76%         54       163                     No Prediction
9     4,234      84%            9,263       93%         57       107                     204
10    4,680      93%            9,690       97%         59       93                      152
11    4,955      98%            9,934       99%         60       87                      137
12    5,053      100%           10,000      100%        61       84                      132

Table 3-8. Release 3 Results for Number of Test Cases
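The comparison in Table 3-8 can be approximated with a generic nonlinear least squares fit of the G-O model m(t) = a(1 - exp(-bt)), using either cumulative execution hours or cumulative test cases as the measure of testing time t. The sketch below uses scipy's curve_fit rather than the paper's alternative least squares technique (Section 2.3.3), and the starting values p0 are our own guesses, so the fitted totals will not reproduce the table exactly:

import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    # Expected cumulative defects after t units of testing.
    return a * (1.0 - np.exp(-b * t))

# Release 3 cumulative data from Table 3-8.
exec_hours = np.array([162, 499, 715, 1137, 1799, 2438, 2818, 3574, 4234, 4680, 4955, 5053], float)
test_cases = np.array([671, 1920, 2150, 3112, 3802, 5009, 6443, 7630, 9263, 9690, 9934, 10000], float)
defects    = np.array([6, 9, 13, 20, 28, 40, 48, 54, 57, 59, 60, 61], float)

for label, t in (("execution hours", exec_hours), ("test cases", test_cases)):
    # Rough starting guesses: total defects somewhat above the observed 61,
    # and b scaled so that b*t is of order 1 at the end of test.
    (a, b), _ = curve_fit(goel_okumoto, t, defects, p0=(70.0, 1.0 / t[-1]), maxfev=10000)
    print(f"{label:>15}: predicted total defects = {a:.0f}")

This is ordinary least squares on the cumulative defect curve, which is in the spirit of, but not identical to, the alternative least squares derivation the paper uses for its predictions.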
3.4 Results From Modeling Problem Reports

Our results from using problem reports instead of defects showed that problem reports are an excellent surrogate for defects. These results are shown in Tables 3-9 and 3-10 for Releases 2 and 3.

Test  No. of  Predicted Total  No. of   Predicted Total  Predicted Total No. of
Week  TPRs    No. of TPRs      Defects  No. of Defects   Defects Based on TPRs
1     19      -                13       -                -
2     31      -                18       -                -
3     49      -                26       -                -
4     62      -                34       -                -
5     71      -                40       -                -
6     84      -                48       -                -
7     101     -                61       -                -
8     123     315              75       -                193
9     142     289              84       -                177
10    151     288              89       203              177
11    159     284              95       192              174
12    169     278              100      179              170
13    175     279              104      178              171
14    183     285              110      184              175
15    185     285              112      184              175
16    188     282              114      183              173
17    191     278              117      182              171
18    193     278              118      183              170
19    195     278              120      184              171

Table 3-9. Release 2 Results for TPRs

Test  No. of  Predicted Total  No. of   Predicted Total  Predicted Total No. of
Week  TPRs    No. of TPRs      Defects  No. of Defects   Defects Based on TPRs
1     8       -                6        -                -
2     12      -                9        -                -
3     22      -                13       -                -
4     35      -                20       -                -
5     47      -                28       -                -
6     62      -                40       -                -
7     75      -                48       -                -
8     84      213              54       163              127
9     89      159              57       107              95
10    94      143              59       93               86
11    100     145              60       87               87
12    101     147              61       84               88

Table 3-10. Release 3 Results for TPRs

The predictions based on TPRs become stable earlier than predictions based on defects because there is more data. We used the ratio of defects to TPRs to predict a total number of defects from the TPR model. This ratio is usually about 60% (recall that the other 40% of TPRs are mainly rediscoveries caused by parallel usage of the products under test). As an example, for Release 2 the ratio of defects to TPRs was 120/195 = 62%. The predicted number of TPRs is 278 at Week 19 of Release 2 testing.
Taking 62% of 278 yields 171, which is reasonably close to the final prediction of 184 from the defect model. During system test, we would use the results from a few weeks preceding the current test week to predict a ratio of defects to TPRs and then use this ratio and the current test week TPR prediction to predict expected defects. The results for Release 3 show that, despite a change in the defect-to-TPR ratio from 64% in Week 9 to 60% in Week 12, this technique still provides a reasonable prediction of residual defects.
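As a minimal sketch of this ratio technique (the function, its name, and the three-week window are our illustration, not the paper's):

def predict_total_defects(defects, tprs, predicted_total_tprs, window=3):
    """Scale the TPR model's predicted total number of TPRs by a
    defect-to-TPR ratio estimated from the last `window` test weeks.
    `defects` and `tprs` are cumulative weekly counts."""
    ratios = [d / t for d, t in zip(defects[-window:], tprs[-window:])]
    return (sum(ratios) / len(ratios)) * predicted_total_tprs

# Release 2, weeks 17-19 of Table 3-9: the ratio is about 61-62%, and
# scaling the predicted 278 total TPRs gives roughly 170, close to the
# table's TPR-based prediction of 171.
print(predict_total_defects([117, 118, 120], [191, 193, 195], 278))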
3.5 Results for Different Models

We fit all the different software reliability growth models described in Section 2.3 to the data shown in Table 3-1. The results for Release 1 are shown in Table 3-11. The numbers in the table show the predicted number of total defects for each model at various times in the test process. Note that most of the models become reasonably stable at about the same time as the G-O model, but their predictions of the total number of defects are significantly different from the G-O model's. The S-shaped models (G-O S-Shaped, Gompertz, Yamada Raleigh) all tended to underpredict the total defects. This is expected since the data has the shape of a concave model rather than an S-shaped model. The other concave models (Pareto, Yamada Exponential) all tended to overpredict the number of total defects. The models that are variants of the G-O model (Hossain-Dahiya/G-O and Weibull) both predicted exactly the same parameters as the G-O model. The Log-Poisson model is an infinite failure model and does not have a parameter that predicts the total number of defects. To estimate the total number of defects from this model, we estimated the time at which the G-O model would have found 90% of the residual defects and then determined the number of failures that the Log-Poisson model would have predicted at that point in time.
The relatively good performance of the Log-Poisson model may be the result of this artificial total defect estimation technique. Our conclusion from these results is that the G-O model was significantly better for our data than the other models.

                     Total Defects Predicted Several Weeks After RQA
Model Name           10 Weeks   12 Weeks   14 Weeks   17 Weeks   20 Weeks
Goel-Okumoto (G-O)   98         116        129        139        133
G-O S-Shaped         99         71         82         91         102
Gompertz             107        112        96         110        114
Yamada Raleigh       107        111        77         89         98
Pareto               757        833        735        631        462
Yamada Exponential   152        181        204        220        213
Hossain-Dahiya/G-O   All results same as G-O model
Weibull              All results same as G-O model
Log Poisson          140        153        166        160        161

There were 134 total defects found for Release 1: 100 in QA test, 34 after QA test.

Table 3-11. Release 1 Results for Various Models

3.6 Different Correlation Techniques

Throughout this paper we have presented results obtained using the alternative least squares technique described in Section 2.3.3. Table 3-12 shows the results obtained with the other two statistical techniques for all the releases. For Release 1, the alternative least squares technique is more stable than the other two techniques. The standard least squares technique requires that the number of defects change each week because the weekly change is used as the denominator of an equation, so we were unable to solve for the parameters using this technique in weeks 19 and 20. For Releases 2 and 4, the alternative least squares technique appears to be slightly more stable than the maximum likelihood technique.
For Release 3, the maximum likelihood technique appears to be slightly more stable than the alternative least squares technique. However, the differences between these two techniques do not appear to be significant. The standard least squares technique appears to be very unstable in some cases, e.g., weeks 16-18 of Release 1, week 19 of Release 2, and week 18 of Release 4. Since the alternative least squares technique is the easiest to use, is slightly more stable, and correlates slightly better to the results from field data, it is our preferred technique.

Test  Release 1               Release 2               Release 3               Release 4
Week  Defects ML   LS   LS*   Defects ML   LS   LS*   Defects ML   LS   LS*   Defects ML   LS   LS*
1     16      -    -    -     13      -    -    -     6       -    -    -     1       -    -    -
2     24      -    -    -     18      -    -    -     9       -    -    -     3       -    -    -
3     27      -    -    -     26      -    -    -     13      -    -    -     8       -    -    -
4     33      -    -    -     34      -    -    -     20      -    -    -     9       -    -    -
5     41      -    -    -     40      -    -    -     28      -    -    -     11      -    -    -
6     49      -    -    -     48      -    -    -     40      -    -    -     16      -    -    -
7     54      -    -    -     61      -    -    -     48      -    -    -     19      -    -    -
8     58      -    -    -     75      -    -    -     54      124  163  142   25      -    -    -
9     69      -    -    -     84      -    -    -     57      86   107  87    27      -    -    -
10    75      113  98   76    89      158  203  163   59      85   93   79    29      -    -    -
11    81      122  107  149   95      162  192  164   60      76   87   72    32      43   49   53
12    86      134  116  169   100     153  179  152   61      78   84   78    32      38   44   **
13    90      144  123  188   104     166  178  170   -       -    -    -     36      46   45   48
14    93      146  129  183   110     191  184  206   -       -    -    -     38      51   46   57
15    96      148  129  181   112     179  184  176   -       -    -    -     39      54   48   61
16    98      150  134  182   114     172  183  165   -       -    -    -     39      55   48   52
17    99      142  139  148   117     174  182  168   -       -    -    -     41      57   50   66
18    100     133  138  126   118     179  183  188   -       -    -    -     42      63   51   325
19    100     126  135  **    120     188  184  217   -       -    -    -     42      62   52   **
20    100     122  133  **

* Classical LS technique
** Couldn't solve because the number of defects didn't change from previous week

Table 3-12. Statistical Technique Comparison
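For comparison with the ML columns of Table 3-12, a maximum likelihood fit of the G-O model can be sketched as follows. We assume the standard grouped-data Poisson likelihood for weekly defect counts; the paper's exact estimation equations are in Section 2.3 and may differ in detail:

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, t, cum_defects):
    a, b = params
    if a <= 0.0 or b <= 0.0:
        return np.inf
    mu = a * (1.0 - np.exp(-b * t))                    # expected cumulative defects
    dmu = np.diff(np.concatenate(([0.0], mu)))         # expected defects per week
    n = np.diff(np.concatenate(([0.0], cum_defects)))  # observed defects per week
    if np.any(dmu <= 0.0):
        return np.inf
    # Grouped Poisson log likelihood, dropping the constant log(n!) terms.
    return -np.sum(n * np.log(dmu) - dmu)

# Release 3 execution hours and cumulative defects (Table 3-8).
t = np.array([162, 499, 715, 1137, 1799, 2438, 2818, 3574, 4234, 4680, 4955, 5053], float)
cum = np.array([6, 9, 13, 20, 28, 40, 48, 54, 57, 59, 60, 61], float)

res = minimize(neg_log_likelihood, x0=(70.0, 1.0 / t[-1]), args=(t, cum), method="Nelder-Mead")
print("ML estimate of total defects:", round(res.x[0]))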
Table 3-13 shows the confidence interval results for Release 4. As discussed in Section 2.3.5, the maximum likelihood confidence intervals are asymmetric while the least squares confidence intervals are symmetric. Unfortunately, the maximum likelihood confidence intervals are very wide. The confidence interval range is more than three times larger than the predicted residual defects (the range at week 19 is 106-51=65 and the predicted residual defects are 62-42=20). The classical least squares lower confidence limit is less than the defects experienced, which obviously cannot be true. The confidence intervals derived from the alternative least squares technique are very small - the confidence intervals for weeks 12-14 do not even include the final total defect point estimate. Since these did not seem credible, a second technique based on the Poisson distribution (described in Appendix 1) was used to determine confidence intervals. These confidence intervals seem a little more reasonable but have the same problem of the lower confidence limit being less than the defects experienced. None of these confidence intervals seems very satisfactory, although the maximum likelihood confidence intervals are the most credible.

               ML Results              Classical LS Results    Alternative LS Results**
Test  No. of   Total  Lower  Upper     Total  Lower   Upper    Total  Lower  Upper   Lower  Upper
Week  Defects  Defects 5% CL 95% CL    Defects 5% CL  95% CL   Defects 5% CL 95% CL  5% CL  95% CL
11    32       43      36    68        53      41     65       49      46    -       36     60
12    32       38      31    50        *       *      *        44      42    46      33     55
13    36       46      40    65        48      26     70       45      40    50      34     56
14    38       51      44    78        57      20     94       46      42    51      35     57
15    39       54      46    82        61      21     100      48      43    52      36     59
16    39       55      45    78        52      32     74       48      44    53      37     60
17    41       57      48    88        66      22     110      50      46    54      38     61
18    42       63      51    111       325     -3,196 3,846    51      47    55      39     63
19    42       62      51    106       *       *      *        52      48    56      40     64

* Couldn't solve because the number of defects didn't change from previous week
** First set of confidence limits, based on t distribution, from Equation (5) in Section 2.3.2; second set based on the Poisson distribution (Appendix 1)
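The symmetry of the least squares limits in Table 3-13 follows from their construction as estimate plus or minus t times a standard error. A minimal sketch, assuming the standard error of the total-defect parameter a is taken from the fit covariance (the paper's exact form is Equation (5) in Section 2.3.2, and Release 4's raw data is not tabulated in this section, so Release 3's data is used here):

import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    return a * (1.0 - np.exp(-b * t))

# Release 3 execution hours and cumulative defects (Table 3-8).
t = np.array([162, 499, 715, 1137, 1799, 2438, 2818, 3574, 4234, 4680, 4955, 5053], float)
cum = np.array([6, 9, 13, 20, 28, 40, 48, 54, 57, 59, 60, 61], float)

(a, b), pcov = curve_fit(goel_okumoto, t, cum, p0=(70.0, 1.0 / t[-1]), maxfev=10000)
se_a = np.sqrt(pcov[0, 0])              # standard error of the total-defect estimate
tcrit = stats.t.ppf(0.95, df=len(t) - 2)  # 5% and 95% confidence limits
print(f"total defects: {a:.1f}, CLs: {a - tcrit * se_a:.1f} to {a + tcrit * se_a:.1f}")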