Volume 5, Number 4 - Return-On-Investment from Software Process Improvement
Examining the measures needed to manage Software Process Improvement (SPI) programs, we find several types of information needs:
The third category of needs is often addressed by building a business case to show a return on investment (ROI). Measures used in the business case need to express the ROI in terms of the organization's business concerns. When an organization has quantified business goals, perhaps organized in a balanced scorecard, determining an appropriate measure for the ROI of SPI is relatively straightforward. When the organization does not already have such a quantitative understanding of its business goals, an ROI argument may need to leverage industry data, using a benchmarking approach like the ones described here.
When generating a business case for any project, we consider the reasons for doing the project and try to quantify the benefits expected from it. Benefits from an SPI project might include:
We then collect best estimates of our investments (costs), which for SPI projects tend to be:
Using this information, a classic ROI analysis would be computed as shown in Formula 1 (ROI on capital).
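In its classic form, the calculation relates the net gain from an investment to the size of the investment, typically with projected benefits discounted to present value. A common expression of this calculation, which may differ in detail from Formula 1 as originally published, is:

$$
\mathrm{ROI} = \frac{\text{Discounted benefits} - \text{Investment}}{\text{Investment}} \times 100\%
$$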
In many projects, including most SPI projects, the computation generally used is illustrated in Formula 2 (ROI from SPI).
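SPI business cases commonly use a simpler, undiscounted form. As a sketch of its general shape (again, it may differ in detail from Formula 2 as originally published), the return is expressed as the estimated value of the benefits realized over the analysis period, net of the program costs, relative to those costs:

$$
\mathrm{ROI_{SPI}} = \frac{\text{Value of benefits} - \text{Cost of SPI program}}{\text{Cost of SPI program}} \times 100\%
$$

For example, with purely illustrative numbers, an SPI program costing $150,000 that yields an estimated $400,000 in annual savings would show an ROI of roughly 167% for that year.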
For an organization that can measure its costs and knows its current baseline values for benefits, building the business case and showing an ROI is feasible. However, for an organization without such baseline data, establishing an ROI argument for an SPI program is considerably more difficult, and the organization may try to use the experiences of other organizations to justify its program.
There is a great deal of anecdotal evidence from SPI programs that have succeeded over the last several decades, with information such as:
Deciding which of these values to use when justifying a particular SPI program is problematic, since it is difficult to tell which of them relates well to the type of work being done in a given organization. Which can truly be treated as an industry benchmark value?
To use benchmark measurement data to justify an SPI program, the measures must be applicable across industries and across organizations of different types and sizes. Which of the SPI benefits cited in business cases are reasonable subjects for ROI industry benchmarks? Which of the cost categories are comparable? That is, which can be normalized in a way that allows us to compare current organizational performance to that of other organizations? Table 1 identifies candidate benefits and costs and how well they appear to work as benchmark items.
It appears that improvements to measures of productivity may be a useful candidate for an industry benchmark to justify and to explain the ROI of an SPI program. Others that appear reasonable are measures of product quality and savings in operations costs.
Table 1. Candidate SPI benefits and costs as industry benchmark items

| Benefit | Benchmark candidate? | Comment |
|---|---|---|
| Increased revenue | No | Values vary broadly by market type and product type; revenue per employee is a normalized measure, but it is not easily compared outside a given domain |
| Reduced cycle time | No | Interactions with the processes and tools used, as well as the specific type of product, make this difficult to compare |
| Reduced cost of operations | No/Yes | The basis of costs varies so widely by type of industry and culture that comparison is very difficult; for comparable operations, such as corporate IT spending, usable benchmarks do exist |
| Level of quality | Yes/No | If the quality level is established through a standard test of product performance, a recognized level of quality can be assigned; otherwise, the methods used by different organizations and their different users are unlikely to be comparable |
| Reduced rework | Yes/No | Percent of effort spent on rework can be normalized, although the categories of effort that contribute to a rework figure vary widely across organizations because of their processes and cultures |
| Increased productivity | Yes | Level of productivity (using function points to measure the amount of work) works well |

| Cost | Benchmark candidate? | Comment |
|---|---|---|
| Costs of the program | Yes | Total costs (labor, training, specialty services, tools, travel, etc.) are quite comparable, and they can be normalized by the number of people in the organization benefiting from the SPI program |
Recent work on updates to the COCOMO model also suggests that productivity is a good measure of the impact of SPI [1]. A new scale factor (PMAT) appears in the COCOMO II model, reflecting the effect of process maturity on the estimated effort for a software project. Based on an analysis of 161 data points in the COCOMO II database, Boehm and his team found a statistically significant relationship between increases in process maturity and reductions in software project effort. Table 2 shows how going from level 2 to level 3 in CMM-based process maturity affected the productivity of teams working on different-sized systems. While not a benchmark in a strict sense, this set of data provides a good indicator of the value of improving an organization's processes.
Table 2. Productivity improvement from moving CMM level 2 to level 3, based on the COCOMO II PMAT analysis

| Project Type | Typical Size | Productivity Improvement |
|---|---|---|
| Small | 10 KSLOC* | 4% |
| Medium | 100 KSLOC | 7% |
| Large | 2000 KSLOC | 11% |

*KSLOC = thousands of source lines of code
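The pattern in Table 2, in which larger projects see larger percentage improvements, is consistent with the structure of the COCOMO II effort equation, sketched below in its general form. (This is standard COCOMO II notation rather than data from the study: A and B are calibration constants, the EM_i are effort multipliers, and the SF_j are the five scale factors, of which PMAT is one.)

$$
PM = A \times \text{Size}^{\,E} \times \prod_{i} EM_i, \qquad E = B + 0.01 \sum_{j} SF_j
$$

Because PMAT contributes to the exponent E, a lower PMAT value (that is, higher process maturity) reduces the estimated effort by a percentage that grows with project size.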
Other sources of benchmark data on productivity include the publicly available ISBSG collection of function point data and the work of a number of consulting organizations that provide benchmarking services. (Several of these are described in the September/October 2001 issue of IEEE Software, a focus issue on benchmarking.)
One such industry benchmark is the Application Development (AD) Benchmark conducted by Gartner, Inc. The AD benchmark allows an organization to compare itself to other information technology organizations across the industry or within its own industry segment, looking at data about how it builds and maintains software systems. As of early 2002, the Gartner database included information from
The AD benchmark gathers size and effort (labor) data at the application and/or project level, allowing for analysis of technical and performance data at a low level. Productivity figures for a given organization can be calculated at the application or project level, or productivity data can be aggregated at a higher level.
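As an illustration of how such a roll-up might be computed (a hypothetical sketch, not Gartner's actual method or data; the project records are invented), the snippet below calculates FP/FTE for individual projects and aggregates by dividing total function points by total FTEs, so that larger projects carry proportionally more weight than they would if per-project ratios were simply averaged:

```python
from dataclasses import dataclass

@dataclass
class ProjectRecord:
    """Hypothetical per-project benchmark record (illustration only)."""
    name: str
    function_points: float  # delivered functional size in function points
    fte: float              # full-time-equivalent developers on the project

def project_productivity(p: ProjectRecord) -> float:
    """Productivity at the application/project level, in FP per FTE."""
    return p.function_points / p.fte

def aggregate_productivity(projects: list[ProjectRecord]) -> float:
    """Organization-level productivity: total FP divided by total FTE."""
    total_fp = sum(p.function_points for p in projects)
    total_fte = sum(p.fte for p in projects)
    return total_fp / total_fte

if __name__ == "__main__":
    portfolio = [
        ProjectRecord("billing system rewrite", function_points=1200, fte=10),
        ProjectRecord("small enhancement", function_points=60, fte=1),
    ]
    for p in portfolio:
        print(f"{p.name}: {project_productivity(p):.1f} FP/FTE")
    print(f"aggregate: {aggregate_productivity(portfolio):.1f} FP/FTE")
```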
The benchmark also asks respondents to identify their generic life cycle as either waterfall or prototyping, and to rate their development process rigor as loose, moderate, or rigorous.
The levels of rigor do not map directly to process maturity levels, but they do indicate increasing levels of process discipline. This benchmark evolves with market needs and is being updated to ask more detailed questions about life cycles and to gather CMM-based process maturity information where it exists.
Meanwhile, it is interesting to examine the data in the current database for relationships between productivity and process maturity, to see what support there might be for an ROI for SPI.
Using the data from the last two years, Figure 1 shows productivity in function points developed per full-time-equivalent developer (FP/FTE) for the different types of life cycle, by level of rigor, as well as the aggregate across life cycle types. The aggregate data shows that productivity rises as process rigor increases. The prototyping life cycle supports this trend, with the greatest increase when going from loose to moderate rigor. The waterfall life cycle, however, shows its lowest productivity at the moderate level of rigor. This pattern appears counterintuitive in light of other industry information.
Looking at the waterfall data further and removing outliers, the pattern changes a bit, with the moderate and rigorous levels showing about the same productivity, as shown in Figure 2. While this tempers the apparent anomaly, it still leaves the question of why productivity is so much lower at the moderate and rigorous levels than at the loose level of rigor. (As a reminder, the data used here is not process maturity data but a general characterization of process rigor.)
Another study may help us understand why such a negative trend in productivity appears in the Gartner data. Harter, Krishnan, and Slaughter [2] observed a similar effect. In a study of 30 software products (a COBOL MRP system) built by a large IT firm over a 12-year period (1984-1996), they found that increases in process maturity were associated with increases in development effort (that is, decreases in development productivity). They also found that the increases in process maturity were associated with improvements in product quality. Investigating the interaction of these changes, they found that over the full product life cycle the impact of improved quality outweighed the decreased development productivity, because of its positive effect on long-term maintenance work. Thus, improvements in product quality led to reductions in overall cycle time and effort. The diagrams in Figure 3 show some of the relationships found in their study.
Thus, we conjecture that the data in the Gartner database portrays this same reduction in development productivity as more process rigor is applied. However, the Gartner data does not include product quality data with which we could investigate the impact of improved rigor on product quality, or the long-term life cycle impact of that quality. Obtaining this information will require further modifications to the AD benchmark; that work is underway.
If organizations are to use industry data to make their ROI arguments, we clearly need access to benchmark data that is easily comparable across many types of organizations doing their work in different ways. Today's benchmarks hint that productivity data is useful, but it needs to be considered along with measures of product quality. As indicated in Table 1, quality data will need to be provided in a consistent way to be useful in a benchmark. Research is needed into how to collect consistent product quality data from customers, as well as from the review and testing activities of development organizations.
Are there other measures that could be used, perhaps measures already in common benchmarks? We invite your comments and suggestions, as we continue our search.
Robert Solon is Research Director for Application Development (AD) Measurement with Gartner, Inc. He is responsible for the full range of Gartner's AD measurement services worldwide. Prior to joining Gartner, Bob served in quality assurance and compliance roles with Roche Diagnostics and as a software project manager for Keane, Inc. He also served in several staff and development positions with the software organization of the Defense Finance and Accounting Service.
Mr. Solon holds a BA in computer science and political science from Capital University, and an MBA from Anderson University.
Dr. Joyce Statz is Vice President of Knowledge Management at TeraQuest, where she helps employees and client organizations with process improvement programs. She also coordinates development of TeraQuest's product and service offerings.
She has 15 years of experience in software systems development at Texas Instruments. Prior to that, she taught computer science at Bowling Green State University. She is a founder of the Software Quality Institute (SQI) at The University of Texas at Austin.
Author Contact Information

Robert F. Solon Jr.
Phone: (317) 237-4039

Joyce Statz
Phone: (512) 219-9152