Value and purpose of test automation
The value of test automation is often described in terms of the cost savings from reduced manual testing effort (and the resources it requires) and of the fast feedback that automated tests provide. However, this rests on a key assumption: that the automated tests are serving their primary purpose, which is to repeatedly, consistently, and quickly validate that the application stays within the threshold of acceptable defects.
Since it is impossible to discover most of the defects in an application without using it over a period of time (either by a manual testing team or by users in production), we need statistical concepts and models to help us design the automated tests and confirm that they are indeed serving their primary purpose.
Definitions

| Automation test result | Is a defect (manual confirmation) | Is not a defect (manual confirmation) | Positive Predictive Value |
|---|---|---|---|
| Failure / Positive | Defective code correctly identified as defective: Caught Defects (CD) | Good code wrongly identified as defective: Not A Defect (NAD), aka Type I error / false positive | CD/(CD+NAD) |
| Pass / Negative | Defective code wrongly identified as good: Missed Defects (MD), aka Type II error / false negative | Good code correctly identified as good: Eureka! (E) | |
| Sensitivity | CD/(CD+MD) | | |
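The two ratios defined above can be computed directly from the four counts. This is a minimal Python sketch, assuming the counts are already known from manual confirmation; the class name, field names, and example counts are hypothetical. Note that E is carried along but neither ratio uses it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing automated test results with manual confirmation (hypothetical names)."""

    cd: int   # Caught Defects: test failed, defect confirmed
    nad: int  # Not A Defect: test failed, but the code was good (false positive)
    md: int   # Missed Defects: test passed, but the code was defective (false negative)
    e: int    # Eureka!: test passed and the code was good

    @property
    def sensitivity(self) -> float:
        # Share of real defects that the tests caught: CD / (CD + MD)
        return self.cd / (self.cd + self.md)

    @property
    def positive_predictive_value(self) -> float:
        # Share of test failures that are real defects: CD / (CD + NAD)
        return self.cd / (self.cd + self.nad)


# Hypothetical counts, for illustration only
counts = ConfusionCounts(cd=9, nad=3, md=1, e=87)
print(counts.sensitivity)                # 0.9
print(counts.positive_predictive_value)  # 0.75
```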
 
The sensitivity of a test is the probability that it will identify a defect when run against a defective component. A sensitivity of 100% means that the tests recognize every defect as such. Thus, with a high-sensitivity test, a pass result can be used to rule out defects.
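The positive predictive value of a test is the probability that a component is indeed defective when the test fails. Predictive values inherently depend on the prevalence of defects: the same tests will have a lower positive predictive value on code with fewer defects, because false positives then make up a larger share of the failures.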
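The dependence on prevalence follows from Bayes' rule. A minimal sketch, assuming a test suite whose sensitivity and specificity stay fixed. Specificity, the share of good components that pass (E/(E+NAD)), is not defined in this text and is introduced here as an assumption; the function name and the rates 0.90 and 0.95 are hypothetical.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule (hypothetical helper).

    sensitivity: P(test fails | component is defective) = CD / (CD + MD)
    specificity: P(test passes | component is good)     = E / (E + NAD), an assumed extra metric
    prevalence:  P(component is defective)
    """
    true_failures = sensitivity * prevalence               # expected share of CD
    false_failures = (1 - specificity) * (1 - prevalence)  # expected share of NAD
    return true_failures / (true_failures + false_failures)


# The same hypothetical test suite applied to code with different defect prevalence
for prevalence in (0.10, 0.01):
    ppv = ppv_from_rates(sensitivity=0.90, specificity=0.95, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
# prevalence 10%: PPV 66.7%
# prevalence 1%: PPV 15.4%
```

With these hypothetical rates, the share of failures that are real defects drops from about 67% to about 15% when defects become ten times rarer, even though the tests themselves have not changed.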
The threshold for the acceptable number of defects has to be traded off against the cost of achieving it: test development costs, test maintenance costs, longer test run times, etc.
Tests also involve a trade-off between the acceptable number of missed defects (false negatives) and the acceptable number of "Not A Defect" results (false positives).
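A toy illustration of this trade-off, assuming each test compares a measured score against a tolerance threshold and fails when the score exceeds it. The scores, thresholds, and function name are all hypothetical.

```python
# Hypothetical "deviation scores" a test might measure (e.g. how far an output
# drifts from its expected value). Higher scores are more suspicious.
DEFECTIVE_SCORES = [0.9, 0.7, 0.6, 0.4]  # components manually confirmed as defective
GOOD_SCORES = [0.5, 0.3, 0.2, 0.1]       # components manually confirmed as good


def count_errors(threshold: float) -> tuple[int, int]:
    """Return (missed defects, not-a-defect) when the test fails for score > threshold."""
    missed = sum(1 for s in DEFECTIVE_SCORES if s <= threshold)  # MD: defective but passed
    nad = sum(1 for s in GOOD_SCORES if s > threshold)           # NAD: good but failed
    return missed, nad


for threshold in (0.15, 0.35, 0.55, 0.75):
    md, nad = count_errors(threshold)
    print(f"threshold {threshold:.2f}: MD={md}, NAD={nad}")
# threshold 0.15: MD=0, NAD=3
# threshold 0.35: MD=0, NAD=1
# threshold 0.55: MD=1, NAD=0
# threshold 0.75: MD=3, NAD=0
```

Tightening the threshold drives MD down and NAD up; loosening it does the reverse. No threshold in this toy data gets both to zero, because one good component (score 0.5) scores higher than one defective component (score 0.4).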
For example, to prevent hijacking, airport security has to screen all baggage for weapons being carried onto the airplane. One way is to manually check all cabin baggage, as was briefly done for domestic flights in India. However, manual checks are prone to human error, which increases the probability of Missed Defects (false negatives). Note that NAD (false positives) would be low in this case. How would this change if the manual check were replaced with metal detectors?
Hypothesis
The efficacy of automated tests should be measured by their sensitivity and the probability of Missed Defects / false negatives when the application is subjected to these tests.
Data from a project
| Automation test result | Is a defect (manual confirmation) | Is not a defect (manual confirmation) | Positive Predictive Value |
|---|---|---|---|
| Failure / Positive | 58 (CD) | 20 (NAD) | 74% |
| Pass / Negative | 113 (MD) | Eureka! (E), count not given | |
| Sensitivity | 34% | | |
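As a sanity check, the percentages in the project table can be recomputed from its counts. The Eureka! (E) count is not given, so only ratios that do not need it are computed. The miss rate MD/(CD+MD) is added here as one reading of the "probability of Missed Defects" named in the hypothesis; it equals 1 - sensitivity.

```python
# Counts taken from the project data table. E is not given, so it is not used.
caught_defects = 58   # CD: test failed, defect confirmed
not_a_defect = 20     # NAD: test failed, code was good
missed_defects = 113  # MD: test passed, code was defective

ppv = caught_defects / (caught_defects + not_a_defect)
sensitivity = caught_defects / (caught_defects + missed_defects)
miss_rate = missed_defects / (caught_defects + missed_defects)  # equals 1 - sensitivity

print(f"PPV: {ppv:.0%}")                   # PPV: 74%
print(f"Sensitivity: {sensitivity:.0%}")   # Sensitivity: 34%
print(f"Missed defects: {miss_rate:.0%}")  # Missed defects: 66%
```

In other words, of the 171 defects confirmed manually, the automated tests caught 58 and missed 113, roughly two out of every three.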
 

