9. Integration. Part of the test plan is a definition of how the program will be pieced together (for example, incremental top-down testing). A system containing major subsystems or programs might be pieced together incrementally, using the top-down or bottom-up approach, for instance, but where the building blocks are programs or subsystems, rather than modules. If this is the case, a system integration plan is necessary. The system integration plan defines the order of integration, the functional capability of each version of the system, and responsibilities for producing "scaffolding," code that simulates the function of nonexistent components (a sketch of such a stub appears after this list).
10. Tracking procedures. Means must be identified to track various aspects of the testing progress, including the location of error-prone modules and estimation of progress with respect to the schedule, resources, and completion criteria.

11. Debugging procedures. Mechanisms must be defined for reporting detected errors, tracking the progress of corrections, and adding the corrections to the system. Schedules, responsibilities, tools, and computer time/resources also must be part of the debugging plan.

12. Regression testing. Regression testing is performed after making a functional improvement or repair to the program. Its purpose is to determine whether the change has regressed other aspects of the program. It usually is performed by rerunning some subset of the program's test cases. Regression testing is important because changes and error corrections tend to be much more error prone than the original program code (in much the same way that most typographical errors in newspapers are the result of last-minute editorial changes, rather than changes in the original copy). A plan for regression testing (who, how, and when) also is necessary.
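To make the idea of rerunning "some subset of the program's test cases" concrete, here is a minimal sketch in Python, with an invented toy function and invented test cases, of how a regression subset might be tagged and rerun after a fix. It illustrates the idea only; it is not a prescription for organizing a real test suite.

    # Hypothetical sketch: tag the cases that belong to the regression
    # subset and rerun only those after a change is made.
    def absolute(x):
        # toy function standing in for the program under test
        return -x if x < 0 else x

    # (input, expected result, include in the regression subset?)
    test_cases = [
        (-5, 5, True),   # kept in the regression subset
        (0, 0, True),    # boundary value, kept in the regression subset
        (7, 7, False),   # routine case, not rerun after every change
    ]

    def run(regression_only=False):
        for value, expected, in_regression in test_cases:
            if regression_only and not in_regression:
                continue
            assert absolute(value) == expected, f"failed for input {value}"

    run(regression_only=True)   # rerun just the regression subset after a fix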
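The "scaffolding" mentioned under integration (item 9) can be pictured as a stub that stands in for a component that does not yet exist. The sketch below uses invented names (PricingServiceStub, order_total) purely for illustration: the stub returns canned answers so that the calling code can be integrated and tested before the real component is written.

    # Hypothetical scaffolding: a stub simulating a not-yet-written pricing
    # component so the ordering code that depends on it can be exercised.
    class PricingServiceStub:
        """Simulates the real pricing component with a canned answer."""
        def quote(self, item_id):
            return 999   # fixed price in cents instead of a real lookup

    def order_total(item_ids, pricing_service):
        # code under test: relies on the (still nonexistent) pricing component
        return sum(pricing_service.quote(i) for i in item_ids)

    # drive the code under test against the scaffolding
    assert order_total(["a", "b"], PricingServiceStub()) == 1998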
Test Completion Criteria

One of the most difficult questions to answer when testing a program is determining when to stop, since there is no way of knowing if the error just detected is the last remaining error. In fact, in anything but a small program, it is unreasonable to expect that all errors will eventually be detected. Given this dilemma, and given the fact that economics dictate that testing must eventually terminate, you might wonder if the question has to be answered in a purely arbitrary way, or if there are some useful stopping criteria.

The completion criteria typically used in practice are both meaningless and counterproductive. The two most common criteria are these:
1. Stop when the scheduled time for testing expires.
2. Stop when all the test cases execute without detecting errors; that is, stop when the test cases are unsuccessful.

The first criterion is useless because you can satisfy it by doing absolutely nothing. It does not measure the quality of the testing. The second criterion is equally useless because it also is independent of the quality of the test cases. Furthermore, it is counterproductive because it subconsciously encourages you to write test cases that have a low probability of detecting errors. As discussed in Chapter 2, humans are highly goal oriented.
If you are told that you have finished a task when the test cases are unsuccessful, you will subconsciously write test cases that lead to this goal, avoiding the useful, high-yield, destructive test cases.

There are three categories of more useful criteria. The first category, but not the best, is to base completion on the use of specific test-case-design methodologies. For instance, you might define the completion of module testing as the following:

The test cases are derived from (1) satisfying the multicondition-coverage criterion, and (2) a boundary-value analysis of the module interface specification, and all resultant test cases are eventually unsuccessful.

You might define the function test as being complete when the following conditions are satisfied:

The test cases are derived from (1) cause-effect graphing, (2) boundary-value analysis, and (3) error guessing, and all resultant test cases are eventually unsuccessful.

Although this type of criterion is superior to the two mentioned earlier, it has three problems.
First, it is not helpful in a test phase in which specific methodologies are not available, such as the system test phase. Second, it is a subjective measurement, since there is no way to guarantee that a person has used a particular methodology, such as boundary-value analysis, properly and rigorously. Third, rather than setting a goal and then letting the tester choose the best way of achieving it, it does the opposite; test-case-design methodologies are dictated, but no goal is given.
Hence, this type of criterion is useful sometimes for some testing phases, but it should be applied only when the tester has proven his or her abilities in the past in applying the test-case-design methodologies successfully.

The second category of criteria, perhaps the most valuable one, is to state the completion requirements in positive terms. Since the goal of testing is to find errors, why not make the completion criterion the detection of some predefined number of errors? For instance, you might state that a module test of a particular module is not complete until three errors are discovered.
Perhaps the completion criterion for a system test should be defined as the detection and repair of 70 errors or an elapsed time of three months, whichever comes later.

Notice that, although this type of criterion reinforces the definition of testing, it does have two problems, both of which are surmountable. One problem is determining how to obtain the number of errors to be detected. Obtaining this number requires the following three estimates:

1. An estimate of the total number of errors in the program.
2. An estimate of what percentage of these errors can feasibly be found through testing.
3. An estimate of what fraction of the errors originated in particular design processes, and during what testing phases these errors are likely to be detected.

You can get a rough estimate of the total number of errors in several ways. One method is to obtain them through experience with previous programs.
Also, a variety of predictive models exist. Some of these require you to test the program for some period of time, record the elapsed times between the detection of successive errors, and insert these times into parameters in a formula. Other models involve the seeding of known, but unpublicized, errors into the program, testing the program for a while, and then examining the ratio of detected seeded errors to detected unseeded errors. Another model employs two independent test teams who test for a while, examine the errors found by each and the errors detected in common by both teams, and use these parameters to estimate the total number of errors.
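As a rough illustration of the last two models, the arithmetic usually runs along the following lines. The numbers below are invented for the sketch, and real models dress up these ratios with statistical refinements not covered here.

    # Invented numbers, for illustration only.

    # Error-seeding model: seed s known errors; if testing finds s_found of
    # them along with n_found real (unseeded) errors, the detection rate is
    # taken as s_found / s, so the real-error total is estimated as
    # n_found / (s_found / s).
    seeded, seeded_found, real_found = 50, 30, 120
    estimated_real_errors = real_found * seeded / seeded_found   # 200.0

    # Two-team model: team 1 finds e1 errors, team 2 finds e2, and e_common
    # are found by both; the total is estimated as e1 * e2 / e_common.
    e1, e2, e_common = 60, 50, 20
    estimated_total_errors = e1 * e2 / e_common                  # 150.0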
Another gross method to obtain this estimate is to use industry-wide averages. For instance, the number of errors that exist in typical programs at the time that coding is completed (before a code walkthrough or inspection is employed) is approximately four to eight errors per 100 program statements.

The second estimate from the preceding list (the percentage of errors that can be feasibly found through testing) involves a somewhat arbitrary guess, taking into consideration the nature of the program and the consequences of undetected errors.

Given the current paucity of information about how and when errors are made, the third estimate is the most difficult.
The data that exist indicate that, in large programs, approximately 40 percent of the errors are coding and logic-design mistakes, and the remainder are generated in the earlier design processes.

To use this criterion, you must develop your own estimates that are pertinent to the program at hand.
A simple example is presented here. Assume we are about to begin testing a 10,000-statement program, the number of errors remaining after code inspections are performed is estimated at 5 per 100 statements, and we establish, as an objective, the detection of 98 percent of the coding and logic-design errors and 95 percent of the design errors. The total number of errors is thus estimated at 500. Of the 500 errors, we assume that 200 are coding and logic-design errors, and 300 are design flaws. Hence, the goal is to find 196 coding and logic-design errors and 285 design errors.
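Restating the arithmetic of this example step by step (the only assumption added here is rounding to whole errors):

    statements = 10_000
    errors_per_100_statements = 5                        # after inspections
    total_errors = statements // 100 * errors_per_100_statements   # 500

    coding_logic_errors = 200                            # split assumed in the text
    design_errors = total_errors - coding_logic_errors   # 300

    coding_logic_goal = round(0.98 * coding_logic_errors)   # 196
    design_goal = round(0.95 * design_errors)               # 285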
A plausible estimate of when the errors are likely to be detected is shown in Table 6.2. If we have scheduled four months for function testing and three months for system testing, the following three completion criteria might be established:

1. Module testing is complete when 130 errors are found and corrected (65 percent of the estimated 200 coding and logic-design errors).
2. Function testing is complete when 240 errors (30 percent of 200 plus 60 percent of 300) are found and corrected, or when four months of function testing have been completed, whichever occurs later.