Volume 5, Number 4 - Return-On-Investment from Software Process Improvement
Examining the measures needed to manage Software Process Improvement (SPI) programs, we find several types of information needs:
The third category of needs is often addressed by building a business case to show a return on investment (ROI). Measures used in the business case need to express the ROI in terms of the organization's business concerns. When an organization has quantified business goals, perhaps organized in a balanced scorecard, determining an appropriate measure for the ROI of SPI is relatively straightforward. When the organization does not already have such a quantitative understanding of its business goals, an ROI argument may need to leverage industry data, using a benchmarking approach like the ones described here.
When generating a business case for any project, we consider the reasons for doing the project and try to quantify the benefits expected from it. Benefits from an SPI project might include:
We then collect best estimates of our investments (costs), which for SPI projects tend to be:
Using this information, a classic ROI analysis would be computed as shown in Formula 1 (ROI on capital).
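In its classic form, the calculation relates the net gain from an investment to the size of the investment, typically with projected benefits discounted to present value. A common expression of this calculation, which may differ in detail from Formula 1 as originally published, is:

$$
\mathrm{ROI} = \frac{\text{Discounted benefits} - \text{Investment}}{\text{Investment}} \times 100\%
$$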
In many projects, including most SPI projects, the computation generally used is illustrated in Formula 2 (ROI from SPI).
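SPI business cases commonly use a simpler, undiscounted form. As a sketch of its general shape (again, it may differ in detail from Formula 2 as originally published), the return is expressed as the estimated value of the benefits realized over the analysis period, net of the program costs, relative to those costs:

$$
\mathrm{ROI_{SPI}} = \frac{\text{Value of benefits} - \text{Cost of SPI program}}{\text{Cost of SPI program}} \times 100\%
$$

For example, with purely illustrative numbers, an SPI program costing $150,000 that yields an estimated $400,000 in annual savings would show an ROI of roughly 167% for that year.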
For an organization that can measure its costs and knows its current baseline values for benefits, building the business case and showing an ROI is feasible. However, for an organization without such baseline data, establishing an ROI argument for an SPI program is considerably more difficult, and the organization may try to use the experiences of other organizations to justify its program.
There is a great deal of anecdotal evidence from SPI programs that have succeeded over the last several decades, with information such as:
Deciding which of these values to use when justifying a particular SPI program is problematic, since it is difficult to tell which of them relates well to the type of work being done in a given organization. Which can truly be treated as an industry benchmark value?
To use benchmark measurement data to justify an SPI program, the measures must be applicable across industries and across organizations of different types and sizes. Which of the SPI benefits cited in business cases are reasonable subjects for ROI industry benchmarks? Which of the cost categories are comparable? That is, which can be normalized in a way that allows us to compare current organizational performance to that of other organizations? Table 1 identifies candidate benefits and costs and how well they appear to work as benchmark items.
It appears that improvements to measures of productivity may be a useful candidate for an industry benchmark to justify and to explain the ROI of an SPI program. Others that appear reasonable are measures of product quality and savings in operations costs.
Table 1. Candidate SPI benefits and costs as industry benchmark items

| Benefit | Benchmark candidate? | Comment |
|---|---|---|
| Increased revenue | No | Values vary broadly by market type and product type; revenue per employee is a normalized measure, but it is not easily compared outside a given domain |
| Reduced cycle time | No | Interactions with the processes and tools used, as well as the specific type of product, make this difficult to compare |
| Reduced cost of operations | No/Yes | The basis of costs varies so widely by type of industry and culture that comparison is very difficult; for comparable operations, such as corporate IT spending, usable benchmarks do exist |
| Level of quality | Yes/No | If the quality level is established through a standard test of product performance, a recognized level of quality can be assigned; otherwise, the methods used by different organizations and their different users are unlikely to be comparable |
| Reduced rework | Yes/No | Percent of effort spent on rework can be normalized, although the categories of effort that contribute to a rework figure vary widely across organizations because of their processes and cultures |
| Increased productivity | Yes | Level of productivity (using function points to measure the amount of work) works well |

| Cost | Benchmark candidate? | Comment |
|---|---|---|
| Costs of the program | Yes | Total costs (labor, training, specialty services, tools, travel, etc.) are quite comparable, and they can be normalized by the number of people in the organization benefiting from the SPI program |
Recent work on updates to the COCOMO model also suggests that productivity is a good measure of the impact of SPI [1]. A new scale factor (PMAT) appears in the COCOMO II model, reflecting the effect of process maturity on the estimated effort for a software project. Based on an analysis of 161 data points in the COCOMO II database, Boehm and his team found a statistically significant relationship between increases in process maturity and reductions in software project effort. Table 2 shows how going from level 2 to level 3 in CMM-based process maturity affected the productivity of teams working on different-sized systems. While not a benchmark in a strict sense, this set of data provides a good indicator of the value of improving an organization's processes.
Table 2. Productivity improvement from moving CMM level 2 to level 3, based on the COCOMO II PMAT analysis

| Project Type | Typical Size | Productivity Improvement |
|---|---|---|
| Small | 10 KSLOC* | 4% |
| Medium | 100 KSLOC | 7% |
| Large | 2000 KSLOC | 11% |

*KSLOC = thousands of source lines of code
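The pattern in Table 2, in which larger projects see larger percentage improvements, is consistent with the structure of the COCOMO II effort equation, sketched below in its general form. (This is standard COCOMO II notation rather than data from the study: A and B are calibration constants, the EM_i are effort multipliers, and the SF_j are the five scale factors, of which PMAT is one.)

$$
PM = A \times \text{Size}^{\,E} \times \prod_{i} EM_i, \qquad E = B + 0.01 \sum_{j} SF_j
$$

Because PMAT contributes to the exponent E, a lower PMAT value (that is, higher process maturity) reduces the estimated effort by a percentage that grows with project size.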
Other sources of benchmark data on productivity include the publicly available ISBSG collection of function point data and the work of a number of consulting organizations that provide benchmarking services. (Several of these are described in the September/October 2001 issue of IEEE Software, a focus issue on benchmarking.)
One such industry benchmark is the Application Development (AD) Benchmark conducted by Gartner, Inc. The AD benchmark allows an organization to compare itself to other information technology organizations across the industry or within its own industry segment, looking at data about how it builds and maintains software systems. As of early 2002, the Gartner database included information from
The AD benchmark gathers size and effort (labor) data at the application and/or project level, allowing for analysis of technical and performance data at a low level. Productivity figures for a given organization can be calculated at the application or project level, or productivity data can be aggregated at a higher level.
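As an illustration of how such a roll-up might be computed (a hypothetical sketch, not Gartner's actual method or data; the project records are invented), the snippet below calculates FP/FTE for individual projects and aggregates by dividing total function points by total FTEs, so that larger projects carry proportionally more weight than they would if per-project ratios were simply averaged:

```python
from dataclasses import dataclass

@dataclass
class ProjectRecord:
    """Hypothetical per-project benchmark record (illustration only)."""
    name: str
    function_points: float  # delivered functional size in function points
    fte: float              # full-time-equivalent developers on the project

def project_productivity(p: ProjectRecord) -> float:
    """Productivity at the application/project level, in FP per FTE."""
    return p.function_points / p.fte

def aggregate_productivity(projects: list[ProjectRecord]) -> float:
    """Organization-level productivity: total FP divided by total FTE."""
    total_fp = sum(p.function_points for p in projects)
    total_fte = sum(p.fte for p in projects)
    return total_fp / total_fte

if __name__ == "__main__":
    portfolio = [
        ProjectRecord("billing system rewrite", function_points=1200, fte=10),
        ProjectRecord("small enhancement", function_points=60, fte=1),
    ]
    for p in portfolio:
        print(f"{p.name}: {project_productivity(p):.1f} FP/FTE")
    print(f"aggregate: {aggregate_productivity(portfolio):.1f} FP/FTE")
```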
The benchmark also asks respondents to identify their generic life cycle as either waterfall or prototyping, and to rate their development process rigor as loose, moderate, or rigorous.
The levels of rigor do not map directly to process maturity levels, but they do indicate increasing levels of process discipline. This benchmark evolves with market needs and is being updated to ask more detailed questions about life cycles and to gather CMM-based process maturity information where it exists.
Meanwhile, it is interesting to examine the data in the current database for relationships between productivity and process maturity, to see what support there might be for an ROI for SPI.
Using the data from the last two years, Figure 1 shows productivity in function points developed per full-time-equivalent developer (FP/FTE) for the different types of life cycle, by level of rigor, as well as the aggregate across life cycle types. The aggregate data shows that productivity rises as process rigor increases. The prototyping life cycle supports this trend, with the greatest increase when going from loose to moderate rigor. The waterfall life cycle, however, shows its lowest productivity at the moderate level of rigor. This pattern appears counterintuitive in light of other industry information.
Looking at the waterfall data further and removing outliers, the pattern changes a bit, with the moderate and rigorous levels showing about the same productivity, as shown in Figure 2. While this tempers the apparent anomaly, it still leaves the question of why productivity is so much lower at the moderate and rigorous levels than at the loose level of rigor. (As a reminder, the data used here is not process maturity data but a general characterization of process rigor.)
Another study may help us understand why such a negative trend in productivity appears in the Gartner data. Harter, Krishnan, and Slaughter [2] observed a similar effect. In a study of 30 software products (a COBOL MRP system) built by a large IT firm over a 12-year period (1984-1996), they found that increases in process maturity were associated with increases in development effort (that is, decreases in development productivity). They also found that the increases in process maturity were associated with improvements in product quality. Investigating the interaction of these changes, they found that over the full product life cycle the impact of improved quality outweighed the decreased development productivity, because of its positive effect on long-term maintenance work. Thus, improvements in product quality led to reductions in overall cycle time and effort. The diagrams in Figure 3 show some of the relationships found in their study.
Thus, we conjecture that the data in the Gartner database portrays this same reduction in development productivity as more process rigor is applied. However, the Gartner data does not include product quality data with which we could investigate the impact of improved rigor on product quality, or the long-term life cycle impact of that quality. Obtaining this information will require further modifications to the AD benchmark; that work is underway.
If organizations are to use industry data to make their ROI arguments, we clearly need access to benchmark data that is easily comparable across many types of organizations doing their work in different ways. Today's benchmarks hint that productivity data is useful, but it needs to be considered along with measures of product quality. As indicated in Table 1, quality data will need to be provided in a consistent way to be useful in a benchmark. Research is needed into how to collect consistent product quality data from customers, as well as from the review and testing activities of development organizations.
Are there other measures that could be used, perhaps measures already in common benchmarks? We invite your comments and suggestions, as we continue our search.
Robert Solon is Research Director for Application Development (AD) Measurement with Gartner, Inc. He is responsible for the full range of Gartner's AD measurement services worldwide. Prior to joining Gartner, Bob served in quality assurance and compliance roles with Roche Diagnostics and as a software project manager for Keane, Inc. He also served in several staff and development positions with the software organization of the Defense Finance and Accounting Service.
Mr. Solon holds a BA in computer science and political science from Capital University, and an MBA from Anderson University.
Dr. Joyce Statz is Vice President of Knowledge Management at TeraQuest, where she helps employees and client organizations with process improvement programs. She also coordinates development of TeraQuest's product and service offerings.
She has 15 years of experience in software systems development at Texas Instruments. Prior to that, she taught computer science at Bowling Green State University. She is a founder of the Software Quality Institute (SQI) at The University of Texas at Austin.
Author Contact Information

Robert F. Solon Jr.
Phone: (317) 237-4039

Joyce Statz
Phone: (512) 219-9152