Tuesday, July 28, 2009

Measuring Value of Automation Tests

Value and purpose of test automation

The value of test automation is often described in terms of the cost savings from reducing manual testing effort (and the resources it requires), and in terms of the fast feedback automated tests provide. However, this rests on a key assumption: that the automated tests are serving their primary purpose, which is to repeatedly, consistently, and quickly validate that the application is within the threshold of acceptable defects.

Since most of the defects in an application cannot be known without using it over a period of time (either by a manual testing team or by users in production), we need statistical concepts and models to help us design the automated tests and confirm that they are indeed serving their primary purpose.

Definitions

  

|  | Manual confirmation: Is a defect | Manual confirmation: Is not a defect |  |
| --- | --- | --- | --- |
| Automation test result: Failure / Positive | Defective code correctly identified as defective: Caught Defects (CD) | Good code wrongly identified as defective: Not A Defect (NAD) (aka Type I Error / False Positive) | Positive Predictive Value = CD / (CD + NAD) |
| Automation test result: Pass / Negative | Defective code wrongly identified as good: Missed Defects (MD) (aka Type II Error / False Negative) | Good code correctly identified as good: Eureka! (E) |  |
|  | Sensitivity = CD / (CD + MD) |  |  |

The sensitivity of a test is the probability that it will identify a defect when used on a defective component. A sensitivity of 100% means that the tests recognize all defects as such. Thus, with a high-sensitivity test, a pass result is used to rule out defects.

The positive predictive value of a test is the probability that a component is indeed defective when the test fails. Predictive values are inherently dependent upon the prevalence of defects.
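
As a minimal sketch of how these two measures are computed from the confusion-matrix counts (the function names below are illustrative, not part of the original post):

```python
def sensitivity(cd, md):
    """Probability that the suite fails when the code is defective:
    Caught Defects / (Caught Defects + Missed Defects)."""
    return cd / (cd + md)


def positive_predictive_value(cd, nad):
    """Probability that a failure really points at a defect:
    Caught Defects / (Caught Defects + Not-A-Defect failures)."""
    return cd / (cd + nad)
```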

The threshold for the acceptable number of defects has to be traded off against the cost of achieving it: test development costs, test maintenance costs, longer test run times, etc.

Tests also involve a trade-off between the acceptable number of Missed Defects (false negatives) and the acceptable number of "Not a Defect" failures (false positives).

For example, to prevent hijacking, airport security has to screen all baggage for weapons being carried onto the airplane. This can be done by manually checking all the cabin baggage, as was briefly done for domestic flights in India. However, this approach is prone to human error, increasing the probability of Missed Defects / false negatives (while NAD / false positives would be low in this case). How would this change if the manual check were replaced with metal detectors?

Hypothesis

The efficacy of automated tests should be measured by their sensitivity and the probability of Missed Defects / false negatives when the application is subjected to these tests.

Data from a project

  

  

|  | Manual confirmation: Is a defect | Manual confirmation: Is not a defect |  |
| --- | --- | --- | --- |
| Automation test result: Failure / Positive | Caught Defects (CD) = 58 | Not A Defect (NAD) = 20 | Positive Predictive Value = 74% |
| Automation test result: Pass / Negative | Missed Defects (MD) = 113 | Good code correctly identified as good: Eureka! (E) |  |
|  | Sensitivity = 34% |  |  |
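
Plugging the table's counts into the formulas above confirms the two percentages (the Eureka count is not given, but neither measure needs it); this is just a sketch of the arithmetic:

```python
cd, nad, md = 58, 20, 113   # Caught Defects, Not A Defect, Missed Defects from the table

ppv = cd / (cd + nad)       # 58 / 78  = 0.74...
sens = cd / (cd + md)       # 58 / 171 = 0.34...

print(f"Positive predictive value: {ppv:.0%}")  # 74%
print(f"Sensitivity:               {sens:.0%}")  # 34%
```

In other words, the suite caught only about a third of the defective changes, while roughly a quarter of its failures turned out not to be defects at all.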

  

  
