The Measurement Challenge of High Maturity
By Kevin Domzalski, BAE Systems
With assistance from David Card, Q-Labs
Introduction
CMMI Level 5 is too often thought of as an idyllic state of perfection. Much of the hard work it takes to get there and stay there, and many of the experiences gained along the way, may be forgotten in the afterglow of success. Many of the most difficult obstacles on the road to high maturity are related to measurement and analysis.
The purposes of this article are to
• Discuss some of the measurement problems and difficulties that are encountered on the road to CMMI Level 5
• Describe some of the measurement techniques that proved especially useful to us
• Outline some of our plans for going forward, recognizing that Level 5 is not the destination, only the end of the beginning
Let’s start by reviewing how we got to where we are today.
Our History with CMM/CMMI
In 1989, BAE Systems National Security Solutions, then a division of General Dynamics Corp., took our first steps on the path to high maturity practices by undertaking an assessment using the Carnegie Mellon University Software Engineering Institute’s (SEI’s) Software Capability Maturity Model (CMM®) and rating ourselves at Level-1. Over the next six years, the organization advanced at a reasonable pace, achieving Software CMM Level-2 in 1992 and Level-3 in 1995. The organization then expanded its scope beyond software engineering and, in 1997, achieved Level-2 in both the Systems Engineering (SE) and People CMMs. In 2000, Systems Engineering was assessed at Level-3 (SE-CMM).
Also in 2000, our organization confronted its first major process improvement setback when it attempted and failed to be assessed at Software CMM Level-4. (You can get a good feeling for the true maturity of an organization by watching how it deals with failure.) The 2000 assessment results were, to say the least, eye-opening.
The general improvement opportunities that were identified focused on the following:
• An expansive (and “too complex”) process documentation set
• A general lack of stability and control of our measurements, at both the project and organizational levels, and
• A weak understanding of the implications of process and technology change in quantitative terms.

Figure 1. CMM/CMMI Appraisal History
During 2001, our company hand-selected over two dozen individuals (many of whom were experienced, rank-and-file engineers), moved them to a dedicated, single-use location, and spent six months re-architecting our process set with a major focus on the newer CMM Integration (CMMI®) model’s practices. These practices combined Systems Engineering, Software Engineering, Project Management, Organizational Process Improvement, and many support functions. As it turned out, this was money well spent. Sponsorship from upper management never wavered. Lack of sponsorship is, all too often, the reason why process improvement efforts fail before they really get started!
Always keep in mind that the process improvement landscape is ever-changing; all failures must be viewed as additional opportunities for improvement and additional data for the Lessons Learned knowledge base.
First, there was the issue of defining our Process Improvement department’s organizational structure. Too little organizational definition can breed chaos, whereas too much definition can lead to unneeded bureaucracy. Also, a process improvement organization cannot stand separate from the other functional organizations like Engineering and Project Management. Instead, it must co-exist and, even more importantly, be tightly coupled with these other organizational departments. In addition, a good process improvement organization must be allowed to change, adapting itself to meet the ever-changing tactical and strategic objectives of the overall organization.
Case in point: When we began our process improvement efforts in the late 1980s, we used CMM terminology to define our Software Engineering Process Group (SEPG). As we transitioned to CMMI, the SEPG was transformed into the Engineering Process Group (EPG), and other flavors of process groups also came into existence, like the Project Management Process Group (PMPG). Because the individual process improvement efforts were being implemented in the typical stove-pipe fashion, our Organization Process Group (OPG) was formed to help coordinate and track the various process improvement activities. Next, understanding that process improvement didn’t stop with Engineering and Project Management, we attempted to bring many supporting functions, like Business Development, Business Operations, Finance, and Human Resources, into the fold under the OPG. This was an abysmal failure, mostly because these supporting functional organizations were operating several maturity levels below Engineering and Project Management. They were confused by our process-improvement speak and overwhelmed by our higher maturity process definition/change processes. So we reorganized and formed a sub-group (or sister group) of the OPG, called the OPG-Expansion (OPG-E), to facilitate raising the supporting functions’ process maturity without slowing the progress the OPG was realizing within Engineering and Project Management.
A key ingredient in our recipe for success was the implementation of a Process Improvement Support Group, including the Metrics Analysis Group (MAG), which was trained in CMM/CMMI concepts, measurement and analysis techniques (including statistical process control), and process change management. In particular, the MAG’s tasks were to support the definition of measurement and analysis models, develop tools (or modify COTS development products) that supported the measurement collection process, and perform and/or support the analysis of data at both the project and organizational levels. We implemented many of the techniques described in J. McGarry, D. Card, et al., Practical Software Measurement, Addison Wesley, 2002, and D. Card, Defect Analysis, Advances in Computers, Elsevier, 2005.
Next, our process document set was revamped, breaking the process documents down into bite-sized morsels that a typical employee could easily swallow. While our previous document set was published in hardcopy, the new architecture was web-based. Our organization also implemented an integrated approach to its processes, avoiding the stove-pipe structure of the previous process document set. The description of each process had an identical look and feel (no matter which functional department it belonged to), and strict size limits were enforced for each type of document.
In addition, the MAG holds annual brainstorming meetings with representatives from engineering, program management, and many of the other support functions to determine exactly which measurements are important enough to collect and/or analyze. At each meeting, many department representatives enter the room declaring, “We need to reduce the amount of data we are collecting!” Yet by the end of each meeting we have, as a group, defined 25% to 50% MORE “important” measures and indicators than we had at the start! I will admit that we do occasionally “retire” a particular derived measure or indicator analysis model, usually due to nonuse or misuse; we rarely retire any base measure. More often, the successfully implemented models, both the process models under instrumentation and the measurement collection/analysis models, are continuously updated and improved, driven by the results of root cause analysis.
Also, our company hired consultants from the then Software Productivity Consortium (now renamed the Systems and Software Consortium, Inc.) as well as PhDs from academia to help re-develop our process set and re-train our employees. After spending millions on this process improvement effort, we decided not to skimp on the deployment aspect and received full support from our company’s upper management.
By the end of 2001, our new process set was deployed and new skills and cultural awareness training sessions were underway in full force. By May 2002, after less than one year of “institutionalization,” we were assessed at Software CMM Level-4, and at Level-5 by the end of that year. (Note that many of the practices we had put in place earlier, but that were not judged sufficient in 2000, continued to be performed, so institutionalization was quite deep in some areas.) The very next year, since we had taken a CMMI- rather than CMM-based approach, we were able to hold a CMMI SCAMPI Class A appraisal and achieved Level-5 in 2003!
Measurements – Cornerstones for High Maturity Practices
This section describes the high maturity process deployed at BAE Systems, specifically focusing on measurement practices. The staged representation of the CMMI associates just four process areas (PAs) with Maturity Levels 4 and 5. These are as follows:
• Quantitative Project Management (QPM)
• Organizational Process Performance (OPP)
• Organizational Innovation & Deployment (OID)
• Causal Analysis & Resolution (CAR)
Since only these four PAs are required to achieve the Silver and Gold Medals of Process Improvement, CMMI Levels 4 and 5, why does it sometimes seem so difficult and take so long to get there?

Figure 2. Quantitative Management Process Roadmap
All of these process areas depend on Measurement and Analysis (M&A), a Level 2 process area. Failure to establish the rigor required by M&A at lower levels of maturity makes it difficult to transition to Level 4, which is all about measurement. Bad habits are hard to change.
Measurement & Analysis
To start with, you need a well-defined and well-implemented measurement collection and analysis process. Figure 2 depicts our current Quantitative Management (QM) Process Roadmap. This simple, seven-step process has produced fantastic results. At CMMI Levels 2 and 3, merely replace the term “QM” in step 1 with “Project Management Measurement” and it’s good to go.
Step 1 – First and foremost, you need an understanding of where you currently are, where you think you’re going, and how you expect to get there. In other words, you need a plan or, more likely, many plans: a Strategic Process Improvement Plan, Organizational and Project Measurement or Quantitative Management Plans, and so on. You don’t want to get to the end of the process and be able to tout that you had everything you needed to accomplish the task at hand except a valid, documented, and agreed-upon plan of action. Process Action Teams (PATs) are used to help define, document, instrument, and implement the initial development and/or measurement processes.
Step 2 – Now you’re ready to collect measurement data. But beware; there are many hidden perils and pitfalls when it comes to data collection. First, all data has a definite life span, and some measurements spoil more quickly than others. Measurement may directly influence behavior (through bias and avoidance), especially if the subjects are made painfully aware of the measurement process. Data collection practices should be as transparent to those under instrumentation, and as automated, as technologically possible. We spent several years designing, building, revamping, and maintaining our suite of measurement collection tools. In some cases these data collection tools were home-grown, as with our Line-of-Code-Counter Measurement Tool. In other cases, we merely augmented COTS tools, like DOORS or ClearQuest, with additional internal attributes and data collection macros. Either way, much effort was expended during development and deployment of these tools.
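Our actual line-of-code counter is proprietary, but the underlying idea is simple. The sketch below is a hypothetical, minimal Python illustration (not our production tool): it counts non-blank, non-comment source lines, which is roughly where a “logical lines of code” base measure starts.

```python
# Minimal sketch of a source-line counter (hypothetical, illustration only).
# Counts non-blank lines that are not pure comments; block comments and
# language-specific logical-line rules are deliberately ignored here.
import os

COMMENT_PREFIXES = ("//", "#", "*")  # adjust per language convention

def count_source_lines(path):
    """Return the number of non-blank, non-comment lines in one file."""
    count = 0
    with open(path, "r", errors="ignore") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith(COMMENT_PREFIXES):
                count += 1
    return count

def count_tree(root, extensions=(".c", ".h", ".java")):
    """Walk a directory tree and total the counted lines per file type."""
    totals = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext in extensions:
                totals[ext] = totals.get(ext, 0) + count_source_lines(
                    os.path.join(dirpath, name))
    return totals

if __name__ == "__main__":
    print(count_tree("./src"))
```

Even a toy counter like this makes the definitional questions obvious (What is a comment? What about generated code?), which is exactly why the counting rules belong in the measurement plan from Step 1.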
Step 3 – Next, you’re ready to store the base measures and create the derived measurements and indicators according to your defined measurement and analysis models. These should have been previously identified and documented in your Measurement or Quantitative Management Plans.
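As a hedged illustration of what Step 3 involves, the short sketch below derives a unit-cost indicator from two base measures. The field names and values are hypothetical, not those of our actual measurement repository.

```python
# Hypothetical base measures for one reporting period (illustration only).
base = {
    "sw_engineering_hours": 1240.0,   # effort base measure
    "logical_lines_of_code": 15600,   # size base measure
}

# Derived measure: software unit cost (average hours per logical line of code),
# the inverse of productivity, per a generic measurement-model definition.
unit_cost = base["sw_engineering_hours"] / base["logical_lines_of_code"]
productivity = 1.0 / unit_cost   # logical lines per engineering hour

print(f"Unit cost:    {unit_cost:.3f} hours per logical line")
print(f"Productivity: {productivity:.1f} logical lines per hour")
```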
Step 4 – The organization collects the raw data (base and derived measurements) on a regular basis for historical archiving and additional organizational-level trend analysis. This data is also the source of the organizational process capability baselines that are necessary to differentiate a “good” measurement from a “bad” one. At lower maturity levels, this data is used to determine threshold values; at higher maturity levels, the same data is used to define both control and specification limits for process and/or product indicators under more rigorous statistical process control.
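For Step 4, the sketch below shows one common way, an individuals/moving-range (XmR) calculation, to turn an organizational baseline into control limits. The data values are fabricated, and a real baseline would of course be screened for stationarity and outliers first.

```python
# Sketch: derive XmR (individuals) control limits from a historical baseline.
# The baseline values are fabricated for illustration.
baseline_rates = [142, 158, 131, 149, 163, 155, 138, 147, 151, 160]  # e.g., lines/insp. hour

mean = sum(baseline_rates) / len(baseline_rates)
moving_ranges = [abs(b - a) for a, b in zip(baseline_rates, baseline_rates[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)

# 2.66 is the standard XmR constant (3 / d2, with d2 = 1.128 for two-point moving ranges).
ucl = mean + 2.66 * mr_bar
lcl = mean - 2.66 * mr_bar

print(f"Process mean: {mean:.1f}, UCL: {ucl:.1f}, LCL: {lcl:.1f}")
```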
Step 5 – Measurement analysis is performed at both the organizational and project levels; the level of rigor and depth of this analysis will vary according to the capability level of each process under review as well as the company’s overall level of process maturity. In Level-5 companies, dedicated Causal Analysis Teams (CATs) are proactive in determining the root causes of process-related issues at both the organizational and individual project levels.
Step 6 – The information and understanding gained from the analysis of measurements is used to drive the decision-making processes.
Step 7 – This is the Process Optimization step, a sort of feed-back loop in terms of process change management. Appropriate corrective actions are undertaken by PATs concerning process definition, maintenance, implementation, instrumentation and optimization for all involved processes, including the development, quality review and data measurement processes.
Collecting Improvement Information
Another important consideration for high maturity is the CMMI Generic Practice (GP) 3.2 - Collect Improvement Information. While this is a Level 3 generic practice, the improvement information collected here provides the fuel for CAR and OID. An organization that hasn’t done a good job of establishing mechanisms for collecting improvement information will be delayed on the road to Level 5.
CMMI Generic Practice 3.2 (a requirement for a Level-3 organization) involves the collection of historical information that might be useful in future Level-4 and Level-5 activities. This information can include just about anything, and therein lies the problem. Just what are you going to collect?
One commonly employed method is to dust off the crystal ball, peer deeply into it, and see if you can look 2 or 3 years into the future to determine what information might be important to you then. This method is probably employed about as often as a dartboard is when producing software size/effort estimates, and with equally reliable results.
A slightly more mature method is to define a living matrix that maps your planned information needs against your business objectives at each CMMI Process Area’s capability level. Table 1 depicts a small portion of our company’s actual information needs, which were documented in an information planning workbook that mapped our information collection strategy against our company’s strategic business objectives and standard business processes, as well as the CMMI Process Area maturity level at which the information might become useful.
Satisfying this particular General Practice can require a bit of an investment. Information collection and maintenance costs time and effort. Most companies operating at lower levels of process maturity are too busy keeping indirect costs down to take the “long-term view.” Besides, even if you had piles of money to throw around, you usually don’t want to spend it collecting information that no one is currently planning to use or even look at. This mapping can help explain the cost and future benefits.
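One lightweight way to keep such a mapping alive outside of a spreadsheet is as structured data that tools and reviews can query. The fragment below is a hypothetical, much-simplified stand-in for the workbook excerpted in Table 1.

```python
# Hypothetical, simplified information-needs mapping (not the actual workbook).
information_needs = [
    {"information": "Inspection defect detection density",
     "business_objective": "Improve delivered product quality",
     "useful_at_maturity_level": 4},
    {"information": "Effort by life-cycle phase",
     "business_objective": "Improve estimation accuracy",
     "useful_at_maturity_level": 3},
    {"information": "Defect phase-injected vs. phase-detected data",
     "business_objective": "Reduce rework cost",
     "useful_at_maturity_level": 5},
]

# Example query: what should already be collected by the time we target Level 4?
needed_by_level_4 = [n["information"] for n in information_needs
                     if n["useful_at_maturity_level"] <= 4]
print(needed_by_level_4)
```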

Table 1. Information Needs Planning Workbook
Typical Measures in an Engineering Environment
When most engineering measurement activities are initiated, one measure that offers many instances for measurement is the Product Quality Defect (i.e., a defect is a part of a work product that does not meet customer requirements). Since the typical Level-2 company has a relatively immature development process set, product defects are quite plentiful and as easy to catch as fish in a barrel.
The peril of focusing too heavily on this particular measure, to the exclusion of other possible quality measures, is that as time passes and you remove more and more fish from the barrel, it becomes increasingly difficult to find and remove additional fish. Also, as defect prevention activities kick in, fewer and fewer fish will be added to the barrel. Thus, this initially plentiful source of data will sooner or later be “fished out” by design.
Initial Level 2 and 3 quality measure models usually include simple progress tracks of defects discovered, defects resolved, average effort-to-resolve charts, and the like. Figure 3 depicts a typical defect tracking chart.
Additionally, defect categorization using Pareto charts can be very helpful in focusing limited budget on the more prominent issues at hand. Figure 4 depicts a typical project defect Pareto chart.
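A Pareto chart like Figure 4 takes very little machinery to produce. The sketch below (with fabricated category counts, using matplotlib) sorts defect categories by frequency and overlays the cumulative percentage.

```python
# Sketch of a defect-category Pareto chart; the counts are fabricated.
import matplotlib.pyplot as plt

counts = {"Logic": 42, "Interface": 27, "Documentation": 19,
          "Data handling": 12, "Standards": 9, "Other": 6}

# Sort categories from most to least frequent and compute cumulative percent.
items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
labels = [k for k, _ in items]
values = [v for _, v in items]
total = sum(values)
cumulative = []
running = 0
for v in values:
    running += v
    cumulative.append(100.0 * running / total)

fig, ax = plt.subplots()
ax.bar(labels, values)
ax.set_ylabel("Defect count")
ax2 = ax.twinx()                      # second axis for the cumulative line
ax2.plot(labels, cumulative, marker="o", color="black")
ax2.set_ylabel("Cumulative %")
ax2.set_ylim(0, 105)
plt.title("Project Defect Categorization (Pareto)")
plt.tight_layout()
plt.show()
```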
In addition to defects, other typical (and useful) engineering measures include the following (see the sketch after this list for a sample calculation):
• Requirements Volatility - % of Requirements Added, Modified, Deleted
• Design TBR/TBD Items Burn-Off Rates - TBDs Per Week
• Various Inspection Process & Product Indicators
– Product Size - Count of Lines Inspected
– Preparation Rate - Lines Per Preparation Hour
– Inspection Rate - Lines Per Inspection Hour
– Inspection Coverage - % of New/Modified Lines Inspected
– Defect Detection Density - Defects Per 1,000 Lines Inspected
• Software Unit Cost - Average Hours Expended Per Line of Code (Inverse of productivity)
• Testing Failure Intensity - Defects Detected Per X Hours of Testing
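As a concrete (though hypothetical) illustration of several of the inspection indicators listed above, the sketch below computes them from a handful of made-up inspection records; the field names are illustrative only.

```python
# Hypothetical inspection records; field names are illustrative only.
inspections = [
    {"lines": 1200, "new_or_modified_lines": 1200, "prep_hours": 6.0,
     "meeting_hours": 2.0, "defects": 15},
    {"lines": 800,  "new_or_modified_lines": 800,  "prep_hours": 3.5,
     "meeting_hours": 1.5, "defects": 7},
]
total_new_or_modified_in_build = 3500  # denominator for coverage (assumed value)

lines = sum(i["lines"] for i in inspections)
prep_rate = lines / sum(i["prep_hours"] for i in inspections)       # lines per preparation hour
insp_rate = lines / sum(i["meeting_hours"] for i in inspections)    # lines per inspection hour
coverage = 100.0 * sum(i["new_or_modified_lines"] for i in inspections) / total_new_or_modified_in_build
density = 1000.0 * sum(i["defects"] for i in inspections) / lines    # defects per 1,000 lines inspected

print(f"Preparation rate:  {prep_rate:.0f} lines/hour")
print(f"Inspection rate:   {insp_rate:.0f} lines/hour")
print(f"Coverage:          {coverage:.0f}% of new/modified lines")
print(f"Detection density: {density:.1f} defects per 1,000 lines")
```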

Figure 3 Typical Defect Tracking Chart
Each of the measures listed above, though, has its own special, hidden issues.
For instance, as soon as we started asking projects to track Requirements Volatility, the question arose, “How much is too much?” And I’m not sure that anyone truly knows the answer to that question, or ever will. Certainly, less than 100% volatility is preferable and, realistically, 0% is unattainable (and possibly even counterproductive when attempting to “firm up” the requirements), but who can say what a “good” range might be? Also, soon after releasing our initial Requirements Stability measurement model, it was pointed out that while “unfunded volatility” is usually “bad,” “funded volatility,” no matter who pays for it, is usually a bonus from a project’s follow-on contract point of view. We were forced to re-think our measurement model, adding an additional measure, namely Funded Requirement Additions, to our Requirements Stability modeling.
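A minimal sketch of that revised calculation follows, with names and numbers that are purely illustrative rather than our actual model: funded additions are counted separately so they do not inflate the “bad” volatility number.

```python
# Hypothetical requirements-change counts for one reporting period.
baseline_requirements = 400
added_unfunded = 12
added_funded = 30     # customer-funded scope growth, reported separately
modified = 18
deleted = 5

# Unfunded volatility: the churn the project must absorb within its existing budget.
unfunded_volatility = 100.0 * (added_unfunded + modified + deleted) / baseline_requirements

# Funded additions tracked as their own measure rather than folded into "volatility."
funded_growth = 100.0 * added_funded / baseline_requirements

print(f"Unfunded requirements volatility: {unfunded_volatility:.1f}%")
print(f"Funded requirements growth:       {funded_growth:.1f}%")
```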
In another example, software productivity calculations like Unit Cost and the Productivity Index are derived measures that are mathematical combinations of two base measures, namely Software Engineering Effort and Logical Lines of Code. However, these base measures can be difficult to instrument accurately, which increases the amount of “background noise” in the data. For instance, at our company, all effort data are submitted electronically, and employee time charges should total at least 8 hours per day. When calculating something like software productivity, only the hours that are directly applicable to software engineering tasks should be used. However, software engineers are rarely supplied with charge numbers to cover many of the more mundane tasks like answering the phone or email, or even filling in that pesky timecard. Typically, this all gets hidden in the financial bookkeeping, adding noise to the “true” effort measurements.
Some measurement models are more sensitive to data noise than others. This is especially true of models built on non-linear relationships, like the Testing Failure Intensity curve, which exhibits exponential time decay. This model also makes many assumptions, including a stable code set with a fixed number of problems, a fixed starting time and level for the curve, and an accurately recorded amount of testing effort expended. Many of these assumptions are difficult to satisfy. Code cutoff rarely means code cutoff, and late code added to the testing baseline after cutoff is just as likely, if not more likely, to contain defects. Also, testers usually record 8 hours per day against testing regardless of how much time was actually spent testing versus time spent visiting the bathroom or water cooler, writing up problem reports, or talking about last night’s ballgame.
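To make the shape of such a model concrete, the sketch below fits a simple exponentially decaying failure-intensity curve, lambda(t) = a * exp(-b * t), to fabricated weekly test data using scipy. This is a generic reliability-growth-style fit under the stated assumptions, not our specific model.

```python
# Sketch: fit an exponentially decaying failure-intensity curve to test data.
# The weekly defect counts are fabricated for illustration.
import numpy as np
from scipy.optimize import curve_fit

weeks = np.arange(1, 11)                       # weeks of testing
defects_per_week = np.array([19, 15, 14, 10, 9, 7, 5, 5, 3, 2])

def intensity(t, a, b):
    """Failure intensity model: a * exp(-b * t)."""
    return a * np.exp(-b * t)

params, _ = curve_fit(intensity, weeks, defects_per_week, p0=(20.0, 0.2))
a, b = params
print(f"Fitted intensity: {a:.1f} * exp(-{b:.2f} * t)")
print(f"Predicted defects in week 12: {intensity(12, a, b):.1f}")
```

Noisy effort reporting and late code churn show up immediately as a poor fit or drifting parameters, which is exactly the sensitivity described above.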

Figure 4. Project Defect Categorization Pareto Chart
Predictive Quality Modeling in an Engineering Environment
In companies with higher maturity measurement and analysis practices, including Quantitative Management (QM), most projects can benefit greatly from Predictive Quality Modeling. This concept truly closes the loop in the QM process by using previously collected and controlled QM-parametric data to calibrate the next run of a model. Our newly-implemented Inspection Planning Quality model uses over 5 years (more than 6,000 Inspections) of historic company Inspection data to help calibrate our previously-implemented Defect Profile Planning model. This new model produces predictions of defect injection and detection profiles as well as estimating the amount of cost avoidance in testing that can be expected based on historic performance, product size, and planned inspection coverages for the various development stages and work products. This new model also allows the quality planner to modify over 50 parameters to play the “What-If” game quickly and easily to determine how best to spend available Inspection budget to maximize inspection defect yield rates.
However, this type of analysis should not be attempted by the faint of heart, nor without the aid of a trained and experienced professional. A serious understanding of the project’s historical performance and the company’s current process performance capabilities is required. Even then, your initial attempts may produce unreliable predictions.
The basic concept is rather simple:
1) Utilize historic performance data to develop predictions of the defect injection/detection rates during the different development and testing phases of the project, and
2) Use that information to predict (and possibly even affect) the final quality of the product under development.
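A drastically simplified sketch of that two-step idea follows; the phase names, injection rates, and yields are invented, and the real workbook uses far more parameters (and historical calibration) than this.

```python
# Simplified defect injection/detection profile prediction (illustrative only).
size_ksloc = 50.0   # estimated new/modified code size

# Historically calibrated (here: invented) parameters per development phase:
#   injection:  defects injected per KSLOC in that phase
#   coverage:   fraction of the product planned to be inspected
#   yield_rate: fraction of present defects an inspection typically finds
phases = [
    {"name": "Requirements", "injection": 4.0,  "coverage": 0.90, "yield_rate": 0.60},
    {"name": "Design",       "injection": 8.0,  "coverage": 0.80, "yield_rate": 0.55},
    {"name": "Code",         "injection": 20.0, "coverage": 0.70, "yield_rate": 0.50},
]

escaped = 0.0   # defects carried forward into the next phase
for p in phases:
    injected = p["injection"] * size_ksloc
    present = escaped + injected
    detected = present * p["coverage"] * p["yield_rate"]
    escaped = present - detected
    print(f"{p['name']:<12} injected {injected:6.0f}  detected {detected:6.0f}")

print(f"Predicted defects escaping into Integration & Test: {escaped:.0f}")
```

Playing the “What-If” game amounts to re-running a calculation like this with different coverage and yield settings and comparing the escaped-defect and inspection-cost totals.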
Our company developed an Inspection Planning Template workbook using MS Excel that guides the project lead through our Predictive Quality Modeling process. At a minimum, the inspection planner needs to enter fourteen critical planning parameters, including the estimated sizes of the product under development and the planned Inspection/Testing Coverage parameters, to get a rough prediction of the expected defect injection/detection profiles.
Forty of the 50 adjustment parameters are pre-loaded with company-averaged values. These values can be adjusted by the project to more closely represent their own process capability. The planner can either load in the project’s “planned Inspection budget” to receive a planned-to-predicted budget variance, or the planner can choose to have the workbook predict the required Inspection resources according to the current parameter settings.

Figure 5. Inspection Planning – Defect Injection/Detection Profile Predictions
After all the parameters have been entered on the Inspection Planning tab, the resulting quality analysis can be reviewed on the Quality Analysis tab. This analysis includes not only the predicted defect injection/detection profiles (see Figure 5), but also the predicted cost savings of the currently planned Inspection parameter values as compared to running no product Inspections during development and, instead, planning to capture all defects during the Integration & Testing phases of the project.
Now that a predicted Defect Detection Profile is available, a project can use this data to run in-phase checks of the predicted versus actual defect detection rates. Figure 6 depicts our standard Defect Profiling Plan chart, which allows the project to take interim, in-phase Actual-to-Date readings of defect counts and compare them against the Expected-to-Date counts. Analysis of these differences can be useful in determining what may need to be adjusted, in either the current development phase or downstream, to limit impacts to the final product quality.

Figure 6. Defect Profiling Plan – Predicted versus Actual Detection
In this example, roughly 61% of the code Inspections have been performed but have yielded only 45% of the expected defect detections. What might have caused this 16-percentage-point difference, and how should we adjust our future activities on this product development effort to compensate?
Analysis of the discrepancy between the predicted and actual counts of defects detected in the various development phases allows a project to identify possible quality issues early enough to affect the final outcome.
Should we plan to inspect more of the code product? Should we beef up our Integration Testing team and plan to catch the “missing” defects there? Or perhaps the easier code was finished first and inspected early on, with inspections of the more complex code (which may be more likely to contain the “missing” defects) yet to be held; in that case the anomaly makes perfect sense, and nothing needs to be done except attach a call-out bubble explaining the situation.
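The in-phase check itself is just arithmetic. Using numbers consistent with the Figure 6 discussion (the totals here are illustrative, not actual project data), a sketch like the one below flags when the actual-to-date yield falls meaningfully behind the expected-to-date yield.

```python
# In-phase check of actual vs. expected defect detection (illustrative numbers).
planned_inspections = 100
expected_total_defects = 400          # predicted code-phase detections

inspections_done = 61                 # 61% of planned code inspections held
actual_defects_to_date = 180          # 45% of the expected total

expected_to_date = expected_total_defects * inspections_done / planned_inspections
shortfall_pct = 100.0 * (expected_to_date - actual_defects_to_date) / expected_total_defects

print(f"Expected to date: {expected_to_date:.0f}, actual to date: {actual_defects_to_date}")
if shortfall_pct > 10.0:              # threshold is an assumed example value
    print(f"Detection lags prediction by {shortfall_pct:.0f}% of plan; investigate.")
```

With these values the shortfall works out to 16% of the plan, matching the gap described above and tripping the (assumed) investigation threshold.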
Due to the predictive nature of these Product Quality Defect models, the Inspection Plan and the Defect Profiling Plan allow the project lead additional planning of, control over, and insight into the product quality being built into the work product during even the earliest product development phases. Customers on some of our largest contracts (many of them former metrics skeptics, I might add) have found this type of measurement and prediction very interesting. True enough, you may not be very close on your first or even your second attempt. You will, however, learn more about the inner workings of your engineering development cycle, and then after recalibrating the 50 plus parametric values in the model once or twice or thrice, ultimately be able to predict fairly accurately your final product quality as early as the project proposal stage.
One interesting tie among the indicator charts seen in Figures 3 through 6 is that all these widely different views of the Quality Defect measure are created using the same base measurements (i.e., defect counts and associated categorizations). What differs is the way the measures are organized, filtered, combined, and displayed, as well as the level of rigor of the measurement analysis model employed for each indicator.
Lower maturity measurement and analysis programs often collect too much data and use too little of it. Indicator charts are simple tracks against time with a threshold boundary or moving average thrown in for show. Many, if not most, indicators are post mortem in nature. They are useful as a tool to understand what went wrong after the fact, but not very useful to predict when something might go wrong.
Higher maturity measurement and analysis programs are more predictive in nature. They ask: what can we do to predict when a situation may go awry, and what can we do now to change the outcome for the better? Measurements from one or more defined measures are collected, validated, and analyzed for stability, controllability, and data distribution patterns to determine whether a predictive behavioral model can be formed for the measurement(s). Control charts, histograms, regression fits, and other statistical methods are employed to gain a deeper understanding of the inner workings of the process(es) under study. Much of this advanced process improvement activity is based on historical measurements that had previously been recorded for “possible future use.”
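As one small example of the kind of relationship mining mentioned above, the sketch below fits a simple linear regression between inspection preparation rate and defect detection density on fabricated data; a real analysis would also test the fit’s significance and examine the residuals before acting on it.

```python
# Sketch: simple regression between two measures (fabricated data).
import numpy as np

prep_rate = np.array([120, 180, 240, 300, 360, 420])          # lines per preparation hour
detect_density = np.array([14.2, 12.5, 10.1, 8.3, 6.9, 5.2])  # defects per 1,000 lines inspected

slope, intercept = np.polyfit(prep_rate, detect_density, 1)
corr = np.corrcoef(prep_rate, detect_density)[0, 1]

print(f"density ~= {slope:.4f} * prep_rate + {intercept:.1f}   (r = {corr:.2f})")
print("Faster preparation appears to find fewer defects -- a candidate cause for CAR.")
```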
Measurements & Quantitative Analyses are truly cornerstones in the foundations of High Maturity Practices!
But does this mean that, when your organization is appraised at CMMI Level-5, you’ve reached the end of your process improvement journey?
The short answer is an emphatic “Hopefully not!”
Attaining CMMI Level-5 doesn’t guarantee that all process performance issues have been addressed. On the contrary, you probably have a better understanding than ever before of how much room you have in your organization for improvement. Attaining Level-5 means that it has been officially recognized that you have the processes, tools, skills, and other resources and infrastructure in place that are necessary to properly collect, analyze and address these opportunities for improvement.
Level-5 is not the end of the journey, but rather the end of the first leg and the beginning of the rest of the expedition.
Remember, your organization has probably invested millions or even tens of millions of dollars learning how to learn, change and improve. It has also hopefully built up a stable and proven process improvement infrastructure filled with process documentation, process group activities and process training materials. Now that you’ve got the folks in Engineering, Project Management and maybe even Configuration Management and Quality Assurance under process control, you can turn your process improvement catapults towards remaining organizational bastions of less-than-ideal processes like Business Development, Business Operations and (dare I say it) Travel and Labor Accounting.
About the Authors
Kevin Domzalski is a seasoned member of the process improvement group at BAE Systems National Security Solutions headquartered in San Diego, California, where he currently fulfills the role of organizational process optimization lead, which oversees CMMI Level 5 practices. He joined BAE Systems in 1983 and has served in several capacities in software and systems engineering. He also worked as an automotive industry consultant during a 5-year hiatus from BAE Systems. Domzalski also supports the metrics analysis group (MAG) activities part-time, performing metrics analyses on project and organizational measurements and metrics indicators. He led the MAG activities from 2002 through 2004.
David N. Card is a fellow of Q-Labs. Previous employers include the Software Productivity Consortium, Computer Sciences Corporation, Lockheed Martin, and Litton Bionetics. He spent one year as a Resident Affiliate at the Software Engineering Institute and seven years as a member of the NASA Software Engineering Laboratory research team. Mr. Card received the BS degree in Interdisciplinary Science from American University and performed two years of graduate study in Applied Statistics. Mr. Card is the author of Measuring Software Design Quality (Prentice Hall, 1990), co-author of Practical Software Measurement (Addison Wesley, 2002), and co-editor of ISO/IEC Standard 15939: Software Measurement Process (International Organization for Standardization, 2002). Mr. Card also serves as Editor-in-Chief of the Journal of Systems and Software. He is a Senior Member of the American Society for Quality.
Author Contact Information
Kevin Domzalski
BAE Systems
10920 Technology Place
MZ 606-PI
San Diego, CA 92127-1874
Phone: 858-592-5294
Fax: 858-592-5260
[email protected]
David N. Card
Q-Labs
115 Windward Way
Indian Harbour Beach, FL 32937
Phone: 321-501-6791
Fax:
[email protected]