A. Wood - Software Reliability Growth Models (page 3)
2.1.2 Defect Data

At Tandem, potential defect discoveries are recorded as Tandem Problem Reports (TPRs). TPRs are analyzed by software developers to determine whether a new defect has been detected. Not every TPR is a defect: a TPR may describe a non-defect, and multiple people or groups may discover the same defect. Non-defect TPRs can represent confusion about how to use a set of software or about what the software is supposed to produce. These TPRs are not counted as defects and are usually closed with a notation that a question has been answered or that the software performed as expected.
Defects that have been found by multiple people or groups are usually called duplicates or rediscoveries. Rediscoveries are not included in the defect counts, since the original defect report is already counted. TPRs that do not represent new defects (non-defects and rediscoveries) are called "smoke" by software developers. The amount of smoke generated during QA test varies over time and by release, but 30-40% of all TPRs is a good estimate of the proportion of smoke TPRs.
The large percentage of smoke TPRs is caused by significant parallel usage of the products under test, which results in duplicate TPRs. The amount of smoke after the software is shipped to customers is usually higher, since many different customers may encounter the same failure.

TPRs are classified by the severity of the defect. The severities range from 3 (most severe) to 0 (least severe).
The severity levels are assigned depending on how urgently the customer needs a solution, as shown below.

    Severity   Customer Impact
    0          No Impact: Can tolerate the situation indefinitely.
    1          Minor: Can tolerate the situation, but expect a solution eventually.
    2          Major: Can tolerate the situation, but not for long. Solution needed.
    3          Critical: Intolerable situation. Solution urgently needed.

Only severity 2 and 3 defect data are used for the software reliability growth models. This is because the models are based on QA testing, and test personnel usually submit only severity 2 and 3 TPRs, since severity 0 and 1 TPRs do not usually impact their testing. It would therefore not be possible to predict severity 0 and 1 defect rates from the test data.

Defect-only TPRs (no smoke) represent the number of unique defects in the code and are thus the appropriate data to use in software reliability growth models.
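The filtering just described can be sketched in a few lines. The record fields and example reports below are hypothetical, invented purely to illustrate the counting rule:

```python
# Hypothetical sketch of TPR filtering: only new severity-2/3 defects are
# counted; non-defects and rediscoveries ("smoke") are excluded.

def count_model_defects(tprs):
    """Count TPRs that represent new severity 2 or 3 defects."""
    return sum(
        1
        for t in tprs
        if t["severity"] >= 2       # severity 0/1 TPRs are not modeled
        and not t["non_defect"]     # e.g. a usage question, not a defect
        and not t["duplicate"]      # rediscovery of an already-counted defect
    )

tprs = [
    {"severity": 3, "non_defect": False, "duplicate": False},  # counted
    {"severity": 2, "non_defect": False, "duplicate": True},   # smoke: rediscovery
    {"severity": 2, "non_defect": True,  "duplicate": False},  # smoke: non-defect
    {"severity": 1, "non_defect": False, "duplicate": False},  # below severity 2
    {"severity": 2, "non_defect": False, "duplicate": False},  # counted
]
print(count_model_defects(tprs))  # -> 2
```

Of the five example reports, only the two new severity-2/3 defects survive the filter; the other three are the kinds of TPRs excluded above.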
However, it is useful to model total TPRs as a surrogate for defects, because it takes time to analyze a TPR and determine whether it is a new defect or smoke. Our estimates are that 50% of TPRs are analyzed within 1 week and 90% are analyzed within 2 weeks. Reliable defect data therefore lags total TPR data by about 2 weeks. If we are trying to make a decision about shipping the software, we want to use the most current data, not data that is 2 weeks old, so a model based on TPRs is valuable if it provides a reasonable prediction of residual defects. Most of this report describes models that were developed using software defect data. In Section 3.4, however, we describe the results of modeling TPRs instead of unique defects.

2.1.3 Grouped Data

The best possible data would be a list of the failure occurrence times, where time may mean calendar time, execution time, or test case number.
Unfortunately, we are only able to gather weekly or "grouped" data; that is, we know the number of failures and the amount of test time that occurred during a week. TPRs have a time stamp that indicates when they are filed, but QA personnel sometimes batch their work and may not submit the TPRs found during a week of testing until the end of the week. The TPR time stamps are therefore unreliable on a daily basis but reliable on a weekly basis.
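A minimal sketch of what such grouped data looks like, using hypothetical weekly counts accumulated into the cumulative totals that growth models are fitted against:

```python
# "Grouped" data sketch: hypothetical weekly defect counts and test hours,
# accumulated into cumulative totals (the form used for model fitting).

weekly_defects = [12, 9, 7, 5, 4, 2]       # defects found in each test week
weekly_hours = [40, 45, 42, 40, 38, 41]    # test hours spent in each week

cum_defects, cum_hours = [], []
d = h = 0
for n, t in zip(weekly_defects, weekly_hours):
    d += n
    h += t
    cum_defects.append(d)
    cum_hours.append(h)

print(cum_defects)  # -> [12, 21, 28, 33, 37, 39]
print(cum_hours)    # -> [40, 85, 127, 167, 205, 246]
```

The declining weekly counts in this invented example are typical of a concave growth curve: the cumulative total flattens as testing proceeds.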
We have done experiments by randomizing our weekly data to create exact failure occurrence times, and it appears that the model results are the same for either grouped data or exact failure occurrence times (see Section 3.7).

2.2 Software Reliability Growth Model Types

Software reliability growth models have been grouped into two classes of models: concave[1] and S-shaped.
These two model types are shown in Figure 2-2. The most important thing about both models is that they have the same asymptotic behavior: the defect detection rate decreases as the number of defects detected (and repaired) increases, and the total number of defects detected asymptotically approaches a finite value. The theory for this asymptotic behavior is that:

(1) A finite amount of code should have a finite number of defects. Repair and new functionality may introduce new defects, which increases the original finite number of defects. Some models explicitly account for new defect introduction during test, while others assume it is negligible or handled by the statistical fit of the software reliability growth model to the data.

(2) It is assumed that the defect detection rate is proportional to the number of defects in the code.
Each time a defect is repaired, there are fewer total defects in the code, so the defect detection rate decreases as the number of defects detected (and repaired) increases. The concave model strictly follows this pattern. The S-shaped model assumes that early testing is not as efficient as later testing, so there is a ramp-up period during which the defect detection rate increases. This could be a good assumption if the first QA tests simply repeat tests that developers have already run, or if early QA tests uncover defects in other products that prevent QA from finding defects in the product being tested. For example, an application test may uncover OS defects that need to be corrected before the application can be run. Application test hours are accumulated, but defect data is minimal, because OS defects don't count as part of the application test data. After the OS defects are corrected, the remainder of the application test data (after the inflection point in the S-shaped curve) looks like the concave model.

Figure 2-2. Concave and S-Shaped Models (cumulative number of defects versus test time for each curve shape)

There are many different representations of software reliability models.
In this paper we use the model representation shown in Figure 2-2. This representation shows the expected number of defects at time t and is denoted μ(t), where t can be calendar time, execution time, or number of tests executed, as described in Section 2.1. An example equation for μ(t) is the Goel-Okumoto (G-O) model:

    μ(t) = a(1 - e^(-bt)), where

    a = expected total number of defects in the code
    b = shape factor = the rate at which the failure rate decreases, i.e., the rate at which we approach the total number of defects

The Goel-Okumoto model is a concave model, and the parameter "a" would be plotted as the total number of defects in Figure 2-2.

[1] The word concave is used for this class of models because they are all concave functions, i.e., continually bending downward. Functions that bend upward are called convex functions. S-shaped functions are first convex and then concave.
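The two curve shapes in Figure 2-2 can be illustrated by evaluating the G-O mean value function alongside a standard S-shaped variant, μ(t) = a(1 - (1 + bt)e^(-bt)). The parameter values below are hypothetical, chosen only to show the shapes:

```python
import math

# Sketch of the two curve shapes in Figure 2-2, with hypothetical
# parameters: the concave G-O model mu(t) = a(1 - e^(-bt)) and the
# S-shaped variant mu(t) = a(1 - (1 + bt)e^(-bt)).

a, b = 100.0, 0.05  # a = expected total defects, b = shape factor

def go_concave(t):
    return a * (1.0 - math.exp(-b * t))

def go_s_shaped(t):
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))

for t in (10, 50, 200):
    print(t, go_concave(t), go_s_shaped(t))

# Both curves approach a = 100 asymptotically, but the S-shaped curve
# ramps up slowly at first: (1 + bt)e^(-bt) decays more slowly than e^(-bt).
```

At small t the S-shaped curve lies well below the concave one (the ramp-up period), while for large t both converge to the same total, a.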
The Goel-Okumoto model has 2 parameters; other models can have 3 or more parameters. For most models, μ(t) = aF(t), where a is the expected total number of defects in the code and F(t) is a cumulative distribution function. Note that F(0) = 0, so no defects are discovered before the test starts, and F(∞) = 1, so μ(∞) = a and a is the total number of defects discovered after an infinite amount of testing. Table 2-1 provides a list of the models that were evaluated as part of this effort. A derivation of the properties of most of these models can be found in [Musa,87].

    Model Name           Model Type        μ(t)                                 Reference      Comments
    Goel-Okumoto (G-O)   Concave           a(1 - e^(-bt)),                      Goel,79        Also called Musa model or
                                           a ≥ 0, b > 0                                        exponential model
    G-O S-Shaped         S-Shaped          a(1 - (1 + bt)e^(-bt)),              Yamada,83      Modification of G-O model to make
                                           a ≥ 0, b > 0                                        it S-shaped (Gamma function
                                                                                               instead of exponential)
    Hossain-Dahiya/G-O   Concave           a(1 - e^(-bt))/(1 + ce^(-bt)),       Hossain,93     Solves a technical condition with
                                           a ≥ 0, b > 0, c > 0                                 the G-O model. Becomes same as
                                                                                               G-O as c approaches 0.
    Gompertz             S-Shaped          a(b^(c^t)),                          Kececioglu,91  Used by Fujitsu, Numazu Works
                                           a ≥ 0, 0 ≤ b ≤ 1, 0 < c < 1
    Pareto               Concave           a(1 - (1 + t/β)^(1-α)),              Littlewood,81  Assumes failures have different
                                           a ≥ 0, β > 0, 0 ≤ α ≤ 1                             failure rates and failures with
                                                                                               highest rates removed first
    Weibull              Concave           a(1 - e^(-bt^c)),                    Musa,87        Same as G-O for c = 1
                                           a ≥ 0, b > 0, c > 0
    Yamada Exponential   Concave           a(1 - e^(-rα(1 - e^(-βt)))),         Yamada,86      Attempts to account for testing
                                           a ≥ 0, rα > 0, β > 0                                effort
    Yamada Raleigh       S-Shaped          a(1 - e^(-rα(1 - e^(-βt²/2)))),      Yamada,86      Attempts to account for testing
                                           a ≥ 0, rα > 0, β > 0                                effort
    Log Poisson          Infinite Failure  (1/c)ln(cat + 1),                    Musa,87        Failure rate decreases but does
                                           c > 0, a > 0                                        not approach 0

Table 2-1. Software Reliability Growth Model Examples

The Log Poisson model is a different type of model. This model assumes that the code has an infinite number of failures. Although this is not theoretically true, it may be essentially true in practice, since all the defects are never found before the code is rewritten, and the model may provide a good fit for the useful life of the product.

The models all make assumptions about testing and defect repair. Some of these assumptions seem very reasonable, but some are questionable.
Table 2-2 contains a list and discussion of these assumptions.

    Assumption                                            Reality
    Defects are repaired immediately when they            Defects are not repaired immediately, but this can be
    are discovered                                        partially accommodated by not counting duplicates.
    Defect repair is perfect
    No new code is introduced during QA test
    Defects are only reported by the product
    testing group
    Each unit of time (calendar, execution,
    number of test cases) is equivalent
    Tests represent operational profile
    Failures are independent
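Given grouped data of the kind described in Section 2.1.3, the parameters a and b of a model from Table 2-1 can be estimated by least squares. The sketch below uses hypothetical weekly counts and a deliberately coarse grid search in place of a proper nonlinear optimizer:

```python
import math

# Hedged sketch: estimating the G-O parameters a and b from cumulative
# weekly defect counts by least squares. The data are invented, and a
# coarse grid search stands in for a real optimizer.

weeks = [1, 2, 3, 4, 5, 6, 7, 8]
cum_defects = [12, 21, 28, 33, 37, 39, 41, 42]

def sse(a, b):
    """Sum of squared errors between the data and mu(t) = a(1 - e^(-bt))."""
    return sum((c - a * (1.0 - math.exp(-b * t))) ** 2
               for t, c in zip(weeks, cum_defects))

# Search a over 40..80 defects and b over 0.05..0.99 per week.
best = min(
    ((a, b) for a in range(40, 81) for b in [i / 100.0 for i in range(5, 100)]),
    key=lambda p: sse(*p),
)
print("a =", best[0], "b =", best[1])
```

The fitted a is the model's prediction of the total number of defects, so a minus the defects already found estimates the residual defects, which is the quantity used in the ship decisions discussed earlier.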