Tuesday, August 25, 2009

States don't run businesses

If we think of software development as an economy, the team members would be the businesses and the project manager the state.

Let me explain the analogy.
1. Businesses demand laissez faire - they don't want state intervention in the running of the economy. Much the same as software development teams.
2. Over the past years, we have seen how businesses, left to themselves, can run the system aground. It's not that businesses like to run systems aground. It's just that businesses tend to focus much more on their profits than on the welfare of the system. Same with software development teams.
3. Enter macroeconomics. There are times when the state has to intervene. This could be reactive, when faced with a recession. Or preventive. In either case, it is necessary that the state monitors the economy and businesses and intervenes in a timely and appropriate manner not to suppress enterprise (which is key to a free market economy), but to foster economic stability and growth. Can't state the responsibility of project managers better than that.
4. As somebody said - the state should not do more (or less) of what businesses are supposed to do, but should do what businesses don't do. States aren't expected to run businesses. States, however, are expected to understand the overall economy. Project managers don't need to write narratives or code. They should, however, have a sound understanding of the overall software development ecosystem.

Note - I am not criticizing software development teams as much as enunciating the need for continuous monitoring and timely, appropriate intervention by project managers.

Tuesday, July 28, 2009

Measuring Value of Automation Tests

Value and purpose of test automation

The value of test automation is often described in terms of the cost benefits from reducing manual testing effort (and the resources it needs) and the ability of automated tests to give fast feedback. However, this is based on a key assumption: that the automated tests are serving their primary purpose – to repeatedly, consistently, and quickly validate that the application is within the threshold of acceptable defects.

Since it is impossible to know most of the defects in an application without using it over a period of time (either by a manual testing team or by users in production), we will need statistical concepts and models to help us design and confirm that the automated tests are indeed serving their primary purpose.

Definitions

  

Automation test result vs. manual confirmation of defects:

                        Is a defect             Is not a defect
Failure / Positive      Caught Defects (CD)     Not A Defect (NAD)
Pass / Negative         Missed Defects (MD)     Eureka! (E)

Caught Defects (CD): defective code correctly identified as defective
Not A Defect (NAD): good code wrongly identified as defective (aka Type I Error / False Positive)
Missed Defects (MD): defective code wrongly identified as good (aka Type II Error / False Negative)
Eureka! (E): good code correctly identified as good

Positive Predictive Value = CD / (CD + NAD)
Sensitivity = CD / (CD + MD)

The sensitivity of a test is the probability that it will identify a defect when run against a defective component. A sensitivity of 100% means that the tests recognize all defects as such. Thus, with a high-sensitivity test, a pass result can be used to rule out defects.

The positive predictive value of a test is the probability that a component is indeed defective when the test fails. Predictive values are inherently dependent upon the prevalence of defects.
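Here is a minimal sketch, in Python, of how these two measures fall out of the four buckets above. The counts are made up purely for illustration.

```python
def sensitivity(cd, md):
    """Probability that the suite flags a genuinely defective component: CD / (CD + MD)."""
    return cd / (cd + md)

def positive_predictive_value(cd, nad):
    """Probability that a failing test really points at a defect: CD / (CD + NAD)."""
    return cd / (cd + nad)

# Illustrative counts only, not from a real project
cd, nad, md = 50, 10, 25
print(f"Sensitivity: {sensitivity(cd, md):.0%}")                               # 67%
print(f"Positive Predictive Value: {positive_predictive_value(cd, nad):.0%}")  # 83%
```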

The threshold of acceptable defects has to be traded off against the cost of achieving that threshold - test development costs, test maintenance costs, longer test run times, etc.

Tests will also involve a trade-off between the acceptable number of defects missed (false negatives) and the acceptable number of "Not a Defect" results (false positives).

E.g. in order to prevent hijacking, airport security has to screen all baggage for arms being carried onto the airplane. This can be done by manually checking all the cabin baggage, which was briefly done for domestic flights in India. However, manual checking is prone to human error, increasing the probability of Missed Defects / false negatives. Note that NAD / false positives would be low in this case. How would this change if the manual check were replaced with metal detectors?

Hypothesis

The efficacy of automated tests should be measured by their sensitivity and the probability of Missed Defects / false negatives when the application is subjected to these tests.

Data from a project

  

  

Automation test result vs. manual confirmation of defects:

                        Is a defect               Is not a defect
Failure / Positive      58 (Caught Defects)       20 (Not A Defect)
Pass / Negative         113 (Missed Defects)      Eureka! (E)

Positive Predictive Value = 74%
Sensitivity = 34%
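Working these out from the counts above: Positive Predictive Value = CD / (CD + NAD) = 58 / (58 + 20) ≈ 74%, and Sensitivity = CD / (CD + MD) = 58 / (58 + 113) ≈ 34%. Note that neither measure needs the Eureka count.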

  

  

Monday, July 13, 2009

Re-defining Agile concepts in a non-agile context

The metrics I suggested for use in an agile project will be equally valuable for a non-agile project. The terms / concepts used therein have to be re-defined, though.

0. Story - A work component; Could be a use case, a functional requirement, etc.
1. Value estimates - Value of the work component (story / use case, etc.) towards enhancing the product. If this is not defined for the work components, it could be temporarily substituted with their effort estimates
2. Complexity estimates - Relative estimate of the complexity of the work component, relates to the effort needed for delivering the work component. This could be the effort estimates for the work component
3. Iteration - Time between 2 successive status reports (in projects that have a fortnightly status report, iteration will be a fortnight)
4. Status - status of the story. E.g. Analysis complete, Coding Complete, Testing Complete, etc.
5. Done Status for stories - This is the last tracked status in the life cycle of the work component. In agile projects, this is often "Showcase Complete" / "Customer Accepted".
6. Velocity - Sum of Value / Complexity estimates of all "Done" stories in an iteration
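
To make the mapping concrete, here is a minimal sketch, in Python, of how these re-defined terms could hang together. The Story fields and the choice of "Testing Complete" as the done status are illustrative assumptions, not a prescription.

```python
from dataclasses import dataclass

@dataclass
class Story:
    name: str
    value_estimate: float   # value (or, failing that, effort / complexity) estimate of the work component
    iteration: int          # reporting period in which the story reached its current status
    status: str             # e.g. "Analysis Complete", "Coding Complete", "Testing Complete"

# The last tracked status in this (assumed) life cycle
DONE_STATUS = "Testing Complete"

def velocity(stories, iteration):
    """Sum of value / complexity estimates of all 'Done' stories in the given iteration."""
    return sum(s.value_estimate for s in stories
               if s.iteration == iteration and s.status == DONE_STATUS)
```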

Thursday, July 9, 2009

Metrics for an Agile project

Q. How are we doing on delivering agreed scope of the current release?
A. Burn-up chart by iteration for the release. Below is a burn-up chart Manju created for reporting status on one of our large programs.



Among other things, this graph shows:
1. Scope changes (demonstrated by fluctuations on the "Total Scope" line)
2. The gaps between succeeding status lines reflect in-process / waiting stories. Larger-than-normal gaps indicate bottlenecks. E.g. Dev is a bottleneck, given the huge gap between Analysis Complete and Dev Complete
3. Inventory of stories that are ready to go live (demonstrated by the "Showcase Passed" line)
4. Actual completion status (demonstrated by the "Showcase passed" status line)

Q. How are we doing on throughput? How much value are we delivering? What is the trend - running faster, slowing down?
A. Velocity graph by iteration for the project. Only "Done" stories are considered for velocity calculations. Below is a velocity graph Manju created for tracking velocity on one of our large programs. The 3 iteration average was first brought to my notice by Santosh, who was using it in one of his projects. I find it extremely valuable, as it smooths the ups and downs into something like a trend line.
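
As a rough sketch (in Python, with made-up velocities), this is all the 3 iteration average does - average each iteration's velocity with the two iterations before it:

```python
def three_iteration_average(velocities):
    """Average each iteration's velocity with the (up to) two iterations before it,
    smoothing the ups and downs into something like a trend line."""
    averages = []
    for i in range(len(velocities)):
        window = velocities[max(0, i - 2): i + 1]
        averages.append(sum(window) / len(window))
    return averages

# Illustrative velocities per iteration, not real project data
print(three_iteration_average([18, 25, 14, 30, 22, 27]))
# -> 18.0, 21.5, 19.0, 23.0, 22.0, 26.33 (approx.)
```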

Why iterative development?

"Until you have seen some of the rest, you can't make sense of any part" - Marvin Minsky.

Minsky says this in the context of describing complex systems. It applies as much to software systems as to intelligence. How can we help users describe a complex system? Wouldn't building some of the rest help them make sense of the parts?

Monday, July 6, 2009

Bottleneck - Cont.

Below is some data from my previous project:

Wait stages:
Ready for Dev: 78
Ready for BA Acceptance: 25
Ready for QA: 14
Ready for Showcase: 12
Ready for SAT: 50

In-process stages:
In Analysis: 71
In Dev: 92
In QA: 10

It's clear that Development is the bottleneck. Development takes the longest among all the stages; things just don't move as fast here, yet we keep pushing more work into this stage. That is the reason for the high in-process count. More work means multi-tasking for the developers and, consequently, diluted focus, which further adds to the time stories take to move out of this stage. And of course, you just can't push enough through the Development stage, so the inventory piles up. This could be the situation in most software development projects. The symptoms of this bottleneck sometimes showed up as high inventory in other stages, but in most cases they could be traced back to the Development stage.
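
As a rough sketch of that reasoning (in Python): pair each stage's wait queue with its in-process count and flag the biggest pile as the bottleneck candidate. The pairing is my interpretation, and stages with no in-process figure in the data above are assumed to be zero.

```python
# Counts taken from the wait / in-process lists above; zeros are assumed where
# no in-process figure was recorded.
inventory = {
    "Analysis":      {"waiting": 0,  "in_process": 71},
    "Dev":           {"waiting": 78, "in_process": 92},
    "BA Acceptance": {"waiting": 25, "in_process": 0},
    "QA":            {"waiting": 14, "in_process": 10},
    "Showcase":      {"waiting": 12, "in_process": 0},
    "SAT":           {"waiting": 50, "in_process": 0},
}

# The stage with the most waiting + in-process stories is the bottleneck candidate
bottleneck = max(inventory, key=lambda stage: sum(inventory[stage].values()))
print(bottleneck)  # Dev
```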

I wonder why we didn't look into the Development stage itself and see what was happening WITHIN the stage. That could have helped us understand how to speed up the Development process.

Bottlenecks

My last project had a huge bottleneck at the system acceptance test (SAT) stage - the SAT team was not able to sign off stories at the same pace as the dev teams delivered them. Though I don't readily have the data, I am certain that the in-process time at the SAT stage was not high. Between the dev team completing a story and the SAT team picking it up there were 2 steps - deployment onto the SAT servers and showcasing these stories to the SMEs and the SAT team. Though we were deploying into SAT on a weekly basis, the showcases were done only once in 2 weeks.

There are 2 questions that bother me:
1. Is SAT the bottleneck? Is a bottleneck identified purely by the inventory piled up before that stage?
2. How can we ensure we manage the SAT process better?

My thoughts:
1. SAT is not the bottleneck here. If we define the bottleneck by the number of stories a stage can "process" when working full-time, SAT was not the bottleneck - its ability to sign off stories was high (demonstrated during the later iterations of the release). Even if we consider inventory as the measure, given that the showcase is a mandatory step before SAT and showcases are done only once in 2 weeks, the showcase could be the bottleneck. SAT is like final assembly: what consumes time is not the assembly itself but the wait for all the parts to come through before it can start.
2. I reckon one thing that can be done is: reduce the batch size. Do more showcases. Once a week maybe.

Wednesday, July 1, 2009

Prioritizing stories based on relative affordability

We use the value delivered by a story as the primary measure of its priority. Would it be more meaningful to consider Relative Value / Cost (relative affordability) as the formula for determining the priority of stories? Cost can be a relative measure of the complexity or effort of implementing and testing the story, determined by the development team (BAs and QAs) - in other words, the relative size of the story.

This relative affordability of a story gives us a measure of the worthiness of investing in a story. Judging purely based on value is inappropriate because the cost may be prohibitive.
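
A minimal sketch, in Python with made-up stories, of what prioritising by relative affordability (Value / Cost) would look like:

```python
# Hypothetical stories with relative value and relative cost (size) estimates
stories = [
    {"name": "Checkout redesign", "value": 8, "cost": 5},
    {"name": "Audit logging",     "value": 3, "cost": 1},
    {"name": "Search filters",    "value": 5, "cost": 8},
]

# Highest relative affordability first
for story in sorted(stories, key=lambda s: s["value"] / s["cost"], reverse=True):
    print(f'{story["name"]}: {story["value"] / story["cost"]:.2f}')
# Audit logging: 3.00
# Checkout redesign: 1.60
# Search filters: 0.62
```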

Estimating value of technical stories

How can we determine the value of a "technical story"?

As discussed below in the case for relative value: The value of a story derives from making the system more attractive for end customers. And that could either be developing a new feature or making the system scale better.

For technical stories that fall under non-functional requirements (requirements being the key word here), it should be possible to establish their value based on the above definition. An extreme position would be to de-prioritize any NFR / story that cannot establish how it would make the system more attractive for end customers. There is a possibility that inadequate articulation of the value may lower the perceived value of a tech story. But hey, I would ask the tech guys to articulate the value in a different and better way rather than let it in through the back door.

Note: Value need not necessarily be limited to returns in the short term. Long term benefits should also be considered to be of value, in some cases, more valuable than short term gains.

The concept of Relative Value

One of the interesting things that Chaman brought to my notice was the Theory of Constraints (TOC) idea that Throughput should be measured as sales (Rs / dollars).

Now, most people see Throughput in terms of the story points that the dev teams estimate for stories. This is natural but inconsistent with the spirit of TOC. Story points, nebulous and relative as they are, are estimated by the people doing the job (devs, QAs, etc.), and that firmly lands them on the cost side. While it is meaningful to get this estimate, for the purpose of measuring throughput in the context of TOC we may find a "value" estimate by the business more appropriate.

Now, given that stories are not independent units but assimilate into a larger system which is then sold to or used by end customers, the value of a story can't be defined with precision. The value of a story derives from making the system more attractive for end customers. As this is an abstraction of "value", its estimation becomes subject to interpretation and nailing it down with accuracy is difficult. Hence, relative value. My suggestion would be to use the standard sizing / estimation concepts for this, with some changes. Get the business to answer 2 questions:

1. By implementing this story, how much more direct (and indirect) revenue will the system generate? If this cannot be determined quantitatively, then ask the second question:
2. By implementing this story, how qualitatively useful is it making the system for end users / customers?

Use triangulation to ensure the relative accuracy of these numbers. This should enable us to measure relative value being delivered every iteration and hence throughput.

Tuesday, June 30, 2009

Why software development is not same as building bridges

I came across a very old post on itmweb titled: Why is software development not viewed as R&D, but more like manufacturing? (http://www.itmweb.com/ubb/Forum3/HTML/000010.html).

One of the replies to the post argues that software development is similar to engineering projects like building bridges. That view couldn't be more off.

Equating software development with building bridges doesn't make much sense to me. For one, building bridges needs a lot of upfront design because you can't change the bridge once it is built. Software continuously undergoes change; there is very little that can't be changed / rewritten in software. I would also agree with diltondalton that in software the designers and builders of the product are the same people. Another way of saying this is that in software, the worker on the ground has to continuously create, as opposed to the largely mechanical nature of their equivalent's work when building bridges. In fact, you don't want the workers building bridges to think. That's the designer's job.

I would also agree with the spirit of the original post. If you are pedantic, you would argue that it differs from R&D in a manufacturing company. But it most definitely is not the same as manufacturing. As we try and bring in more and more lean manufacturing practices into software development, that becomes all the more clear.

Monday, June 29, 2009

Caution when choosing Lean Manufacturing practices

When we picked up Toyota's lean practices for use in software development, we underestimated the differences between a manufacturing assembly line (hereafter "the line") and software development. I have made an effort to list some of these differences here and to emphasize why we should be cautious when bringing lean manufacturing practices into software development.
1. The line doesn't have any slack. You can't play AoE when the line is moving with units that need you to perform a task. By contrast, software development is asynchronous.
2. The work performed at each stage of the line is very small and always takes the same effort - because work is done on a moving line. By contrast, work performed at different stages of software development runs into hours / days and varies with the nature of the story in question
3. Stories, though independent, are part of the whole (the system). They can't exist independently, nor can they be sold independently of the overall software system, unlike units manufactured on the line.
4. Chaman pointed out that software development is iterative in contrast to manufacturing.
5. The equivalent of software development in manufacturing is probably Product Research & Development (R&D) rather than the line itself.

I am sure there are many more differences. That said, we could as easily build a list of similarities. The key point is to be cognizant of the differences and to be cautious when choosing and implementing lean manufacturing practices in the software development context.