

Improving Information Quality for the Warfighter through Self-Checking Systems

Tod Reinhart, Air Force Research Laboratory, Dr. D. Joel Mellema, Raytheon Systems Company, and Carolyn Boettcher, Raytheon Systems Company


Introduction

Testing mission-critical systems to a high degree of reliability has long been a problem for the military services. As a result, system failures may occur in the field due to faults triggered by unusual environmental conditions or unexpected sequences of events that were never encountered in the laboratory. The systems we are interested in must operate within real-time deadlines and produce inexact outputs computed by a succession of heuristic and approximate algorithms. Such systems, which involve complex software and hardware interactions, are particularly difficult to validate and test. To improve the validation and test process and deliver more reliable systems, we have been experimenting with self-checking systems that continuously monitor themselves to detect suspicious events, which may indicate residual errors at a deep performance level.

Under the Air Force Self Checking Embedded Information System Software (SCEISS) program, theoretical university results are being extended to new classes of problems while applying these new types of result checkers to real-time embedded applications. The preliminary results reported here include a description of the example applications and their checkers, the process used to select checkers, and some initial data points on any additional software costs that may accrue due to the development and test of checkers.


Problem Statement

Like the military services, Raytheon has a long-term interest in and commitment to solving the problem of residual errors that are not detected and corrected before a system is deployed. Such errors are like a time bomb, waiting for just the right combination of circumstances to cause significant performance degradation or even catastrophic system failure. To find solutions to this problem, several years ago Raytheon surveyed problem reports of errors that were not detected in a production radar system until after radar subsystem integration. It was found that many of the errors resulted from an unusual combination of circumstances that were unlikely to be encountered in the integration laboratory and that might not even be encountered during operational testing, e.g., flight test.

Testing is the most widely used method of validating that systems perform as their requirements dictate. However, it has been shown that testing alone cannot feasibly validate large systems to a high degree of reliability [1]. As an alternative, correctness-proving techniques are sometimes used to verify system correctness. Because of their difficulty, however, it is not practical to apply them to anything but small, well-contained portions of larger systems, such as a trusted kernel.

Another approach that is often used to ensure reliable performance is functional redundancy. For example, three versions of software may be independently implemented to perform a given function. The results of each version are compared, with a two out of three majority declared to be correct. However, there are a number of practical problems with functional redundancy. Often the implementations whose results are compared are not statistically independent, because even though system designers and implementers are working completely independently, they often make correlated errors. In addition, the amount of run-time resources required to execute the function is increased by a factor of three. In systems where run-time resources are tight, this multiplicative increase may not be acceptable.
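A minimal sketch of two-out-of-three majority voting, using three hypothetical stand-in implementations of the same function (the names and the trivial function are ours, purely for illustration), shows both the voting mechanism and the tripled execution cost:

```python
# Illustrative sketch of 2-out-of-3 functional redundancy.
# The three "versions" stand in for independently implemented
# computations of the same function; real systems would use
# independently developed code for a mission computation.

def version_a(x):
    return x * x

def version_b(x):
    return x ** 2

def version_c(x):
    # Valid for non-negative integers: x added to itself x times.
    return sum(x for _ in range(x))

def majority_vote(x):
    """Run all three versions and return the value at least two agree on.

    Note the cost: the function is executed three times, the factor-of-three
    run-time overhead discussed in the text.
    """
    results = [version_a(x), version_b(x), version_c(x)]
    for candidate in results:
        if results.count(candidate) >= 2:
            return candidate
    raise RuntimeError("no majority: all three versions disagree")
```

If the implementers make correlated errors, two faulty versions can outvote a correct one, which is precisely the statistical-independence problem noted above.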

How then can residual software errors, which significantly increase the DoD's cost of ownership and may contribute to mission failures, be significantly reduced or effectively eliminated? We concluded that an entirely new paradigm was needed to ensure that large, complex systems with limited run-time resources can be maintained in a cost effective manner so that they continue to operate correctly.


Self-Checking Embedded Information System Technology

The Self Checking Embedded Information System Software (SCEISS) program is applying checker technology to a range of embedded information system applications, with the expectation that this technology will produce a dramatic reduction in fielded software errors and maintenance costs. Checker technology is based on checker software that executes at critical points to check the correctness of intermediate system results. Checkers become a permanent part of the software; they are retained throughout the system's operational life cycle and continue checking even after the system is deployed. Because they are always checking, checkers will eventually encounter and detect the errors and unexpected conditions that cause degraded system performance, whenever and wherever those conditions occur. Checkers are analogous to hardware built-in test, which executes periodically throughout the mission to decide whether the hardware is functioning correctly. Like hardware built-in test, checkers report any anomalies detected during the mission so that errors can be fixed.


Definition of a Simple Checker

A simple checker is special software that is embedded in code to continually check results over a large number of executions. It must have a good probability of eventually detecting any errors, especially after many executions, while maintaining a very low probability of false alarm (i.e., a simple checker must rarely or never declare a correct result to be erroneous). By definition, a simple checker must be much simpler and more reliable than the algorithm being checked. As a result of this definition, the execution and memory overhead added by a checker is small. In addition, the fact that the checker is simpler than the original algorithm helps to ensure its statistical independence from the algorithm being checked. Although there are some similarities between checkers and traditional fault tolerance techniques based on redundancy, checkers are different from traditional software fault tolerance techniques in two important ways: they are statistically independent from the original algorithm; and they do not double or triple the cost and runtime overhead of a function being checked.
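As a concrete illustration of a result whose check is far simpler than its computation (the function names and tolerance here are our own, not from the SCEISS program), consider checking a linear-system solver: solving takes O(n^3) arithmetic, but verifying a proposed solution needs only one O(n^2) residual computation.

```python
def check_linear_solve(A, b, x, tol=1e-9):
    """Simple result checker for a linear-system solver (illustrative).

    Solving A x = b by Gaussian elimination costs O(n^3) operations;
    verifying a proposed solution x needs only one matrix-vector
    multiply, O(n^2). The checker is much simpler than the solver it
    guards, which keeps its overhead low and helps ensure statistical
    independence from the solver's implementation.
    """
    n = len(b)
    for i in range(n):
        # Residual of row i: (A x)_i - b_i should be near zero.
        residual = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        if abs(residual) > tol:
            return False  # suspicious result: flag for analysis
    return True
```

A checker like this never proves the solver correct; it merely guarantees a good probability of flagging a bad result over many executions, which is exactly the property the definition above requires.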


Prior Checker Research

Universities, Raytheon, and the Air Force have sponsored prior checker research. The seminal research in checkers has been ongoing for more than 10 years, led by Dr. Manuel Blum at the University of California at Berkeley, who defined the results checking paradigm that is the foundation for the SCEISS effort [2]. After determining that theoretical checker results might be applicable to real-life embedded applications, Raytheon and the University of California jointly funded research to extend checkers to some realistic avionics applications. In particular, this effort resulted in the definition of a general method for checking the results of a Fourier transform implemented in limited precision, fixed point arithmetic, as reported in [3] and [4]. With Dr. Blum consulting, Raytheon began a small checker pilot program as part of a production radar upgrade program, described in [5]. We determined that two of the three checkers developed under the pilot program, but not deployed, would have detected errors that were uncovered later during flight test.


What's Good About Results Checking?

Many undetected software errors involve rare combinations of circumstances. Because of their rarity, a very large number of independent tests would be needed to create these special circumstances, if they can be reproduced at all in the laboratory. Because they are embedded in the operational software, checkers can detect errors during all phases of testing and operational use. In fact, checkers can execute as often as the software executes. For example, a computation occurring every 10 milliseconds will be checked a million times every three hours. In addition, checkers can detect erroneous results that do not produce obvious symptoms at the system level, and thus, might easily be overlooked by testers. As a result, errors can be corrected even before they cause externally observable system degradation.

Checkers are independent of the system design methodology, the software implementation language, and the embedded processor on which the software executes. Instead, checkers are based on a priori mathematical principles or physical laws governing the computations being checked. As a result, checkers can be applied to legacy systems that are being upgraded almost as easily as they can be applied to new systems. Although particular checkers may be application domain dependent, the self-checking approach can be generally applied in any application domain.


Checking for the Pentium Division Bug

The division bug in Intel's original release of the Pentium processor provides an excellent example of the potential value of checkers, and it illustrates that checkers can detect hardware bugs as well as software bugs. The Pentium division bug manifested itself very rarely, on fewer than 1 in every 8 billion inputs. Even though it rarely caused a problem, users of the Pentium processor soon discovered the bug, and Intel was forced to recall the "buggy" processors and correct the error.

When news of the Pentium division bug was published, Blum and Wasserman invented a software checker/corrector that provides an "a priori" solution to the Pentium bug [6]. Their checker not only detects an erroneous division result, but also corrects it. The checker/corrector does not need to know the type of error, or even that an error is present, in order to detect and correct it.
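The published Blum-Wasserman checker/corrector is considerably more subtle than anything we can reproduce here. The following sketch illustrates only the underlying idea, verifying a quotient by multiplying back and, when the check fires, recomputing it by a method that avoids the suspect divide unit. All names, seeds, and tolerances are our own illustration, not the Blum-Wasserman construction.

```python
import math

def reciprocal_newton(y, iters=6):
    """Approximate 1/y using only multiply and subtract (Newton's
    iteration r <- r * (2 - m*r)), so the result does not depend on
    the hardware divide unit being checked. Assumes y is a nonzero
    finite float; seed and iteration count are illustrative."""
    sign = math.copysign(1.0, y)
    m, e = math.frexp(abs(y))          # abs(y) = m * 2**e, 0.5 <= m < 1
    r = 1.5                            # seed inside the convergence region
    for _ in range(iters):
        r = r * (2.0 - m * r)          # quadratic convergence to 1/m
    return sign * math.ldexp(r, -e)    # 1/y = sign * (1/m) * 2**-e

def checked_divide(x, y, rel_tol=1e-12):
    """Divide with a multiply-back check and division-free fallback."""
    q = x / y                          # result from the (possibly buggy) divider
    if abs(q * y - x) <= rel_tol * max(abs(x), 1.0):
        return q                       # multiply-back check passed
    return x * reciprocal_newton(y)    # independent recomputation
```

The multiply-back test costs one multiplication and one comparison per division, a small constant overhead of exactly the kind simple checkers are meant to have.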


Applying Checkers to a Production Radar Upgrade Program

Begun early in 1998, SCEISS is demonstrating the efficacy of checkers used as an integral part of the system development process to improve the quality of delivered systems [7]. SCEISS will also help transition checkers from theory to practice. In 1998, SCEISS applied self-checking technology to a production radar upgrade program that was adding a range gated high (RGH) pulse repetition frequency (PRF) mode to the existing software. The mode code was reused from another radar program, with substantial modifications needed to adapt it to a different platform. The SCEISS team analyzed the software requirements for the modifications and identified candidate functions for checking.


Checking the CFAR Loop

From the candidate functions identified during the requirements analysis, the constant false alarm rate (CFAR) control loop was selected for checking. There were several reasons why the CFAR loop was considered a good candidate.


CFAR Checker Design

The CFAR threshold calculation is an example of an algorithm where there are no simple rules for determining if a result is right or wrong. As a result, our CFAR checker looks for "suspicious" results that probably indicate significant system performance degradation, rather than definitely "wrong" results.

As in any control loop, the CFAR loop seeks a stable threshold value that keeps the system operating in an optimal manner even while the environment, as measured by the sensor being controlled, is changing. To do this, the threshold must be continually adapted to changing environmental conditions based on feedback from the sensor. The requirements in this case specify minimum and maximum values for the CFAR threshold: if the calculated value falls outside the specified range of 3.0 to 5.8, it is reset to the minimum or maximum value as appropriate.

To help quantify "suspicious" values, we first simulated the CFAR threshold calculation 100,000 times using randomly generated input data representative of that seen in a real system. Based on statistics of the expected distribution of threshold values, we predicted that the threshold would fall in the range of 3.5 to 3.8 approximately 98% of the time, so values outside that range might be considered suspicious. We subsequently decided to set the checker to fire more conservatively, at values outside 90% of the legal range, i.e., outside 3.28 to 5.52. In addition, the checker fired only when the calculated threshold was outside these bounds more than 90% of the time, after 50 values had been collected at half-second intervals.
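The firing logic just described can be sketched as follows. The bounds, window size, and 90% firing fraction come from the text above; the class and method names are invented for illustration, and the production checker was of course written in Jovial, not Python.

```python
# Illustrative sketch of the CFAR threshold checker's firing logic.
# Bounds and parameters are from the experiment described in the text;
# names are hypothetical.

class CfarThresholdChecker:
    LOW, HIGH = 3.28, 5.52        # 90% of the legal 3.0 - 5.8 range
    WINDOW = 50                   # samples, collected at 0.5 s intervals
    FIRE_FRACTION = 0.90          # fire if >90% of the window is out of range

    def __init__(self):
        self.in_range = []        # one bool per observed threshold value

    def observe(self, threshold):
        """Record one CFAR threshold sample; return True if the checker
        fires on the current window of samples."""
        self.in_range.append(self.LOW <= threshold <= self.HIGH)
        if len(self.in_range) < self.WINDOW:
            return False          # not enough samples collected yet
        window = self.in_range[-self.WINDOW:]
        out_of_range = window.count(False)
        return out_of_range > self.FIRE_FRACTION * self.WINDOW
```

Requiring a sustained run of out-of-range values, rather than firing on a single sample, is what keeps the false-alarm probability low, as the definition of a simple checker demands.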


Metrics Collected

The radar development team and the SCEISS team worked together to implement the simple checker described above. In coding the checker, we added 15 Jovial source lines of code (SLOC) to the software module that calculates the CFAR loop threshold, which was originally about 100 SLOC, and another 19 SLOC to the Jovial Compool containing the global data definitions. The percentage increase in SLOC, roughly 15 percent at the module level, gives a rough estimate of the additional effort required to implement checkers, assuming the same productivity rate is used to predict the cost of developing checkers as is used to predict the cost of developing the application.

Performance overhead was estimated by comparing the number of operations performed by the checker with the number performed by the algorithm being checked; for the CFAR loop checker, the overhead was estimated to be less than 5%. Checking was performed in the system integration laboratory in parallel with the regular system integration effort, so that the program's flight test schedule would not be perturbed. The checker was run on four separate occasions in the integration laboratory against three versions of the software. On the first three occasions, the threshold values were consistently low, causing the checker to fire under several different test conditions.

The software development team responsible for the production tape upgrade was informed about the problems with the CFAR loop uncovered by the checker. Subsequently, the application code was corrected and the checker was run for the fourth and final time in the integration laboratory after the system had been in flight test for some time. This final checker demonstration showed that the CFAR loop performance had been considerably improved.

We believe that this experiment illustrates the value of even very simple checkers. We found that even with a relatively insensitive checker, we were able to detect suspicious events occurring in the system which otherwise might not have been detected. For a more detailed description of the CFAR loop checker, the reader is referred to the SCEISS Interim Report for 1998 [8].

Applying Checkers to a Technology Program: Ongoing Proof of Concept Demonstration

In 1999, as our second proof-of-concept demonstration of checkers under SCEISS, we chose the Theatre Missile Defense (TMD) Smart Sensor and Automatic Target Recognition (TESSA) program. TESSA is in the fourth phase of a technology demonstration using the F-15E aircraft and its radar and forward-looking infrared (FLIR) sensor suite. During the TESSA IV flight test, fusion of synthetic aperture radar (SAR) and infrared (IR) sensor data for automatic target cueing and recognition in attack operations will be evaluated. In the demonstration, the SAR will be used to cue the IR sensor to interesting objects. The flight test will include modified APG-70 radar modes and a modified LANTIRN targeting pod, as well as new sensor fusion code.

The objective of the TESSA program is to enhance the F-15E's capability to locate, identify, and destroy stationary and moving threat TMD assets. To accomplish this, the radar improvements include enhanced SAR resolution (4' x 6' maps), detecting probable ground targets, and overlaying the most likely targets on the SAR map display. A new Fused Feature Automatic Target Recognition (FFATR) processor will be installed on the aircraft to host the sensor fusion software. A new interface will be provided between the radar and the FFATR in order for the radar to cue the FLIR. In addition, the code in the F-15E Central Computer will be modified to support the additional TESSA functionality.

The TESSA radar processing detects an area of interest (i.e., an object that is likely to be man-made) in a radar map. In addition, the radar estimates the size, shape, and orientation of the object. The area of interest is sent to the fusion engine to cue the FLIR. This description suggests several points where checkers might be used. For example, checking the radar data used to cue the FLIR ensures that the radar is improving, rather than degrading the FLIR performance.

Three to six checkers are being added to the automatic target cueing (ATC) mode of the APG-70 radar software. Although the radar ATC algorithms were previously validated in a laboratory demonstration using radar maps and FLIR images recorded during flight test, the implementation of those algorithms in the APG-70 operational software is all new code. This experiment is currently ongoing and results will be published in the SCEISS interim report for 1999.


Summary and Conclusions

Under the SCEISS program, we are making progress towards demonstrating the advantages of self-checking systems and transitioning self-checking system technology from theory to practice. We took advantage of a production radar tape upgrade that was starting software testing and proceeding into flight test during the first year of the SCEISS program. We implemented a checker for deep system performance to detect suspicious behavior that is often overlooked during traditional testing, but which indicates that significant performance degradation could occur after the system is deployed. As a result, suspicious behavior was observed and reported back to the regular program development team for further analysis and correction of the problem. From this experiment, we collected evidence that effective checkers can be implemented at a relatively small additional cost with minor runtime overhead to the software being checked.

We also identified candidate checkers for the TESSA IV technology demonstration program that have the potential for reducing risk as the program proceeds into laboratory and flight test in 1999. We are in the process of implementing those checkers and exercising them in the integration laboratory and in flight test. The metrics collected during the TESSA IV demonstration will provide further evidence of the cost effectiveness of using self-checking techniques to improve the quality and reliability of embedded information systems.

In the out years of the SCEISS program, we are planning further demonstrations of checker technology. We believe that checkers will prove especially valuable in space systems, where reliability is paramount.


Author Contact Information

D. Joel Mellema and
Carolyn Boettcher

Raytheon Systems Company
www.raytheon.com/rsc/

Tod J. Reinhart
AFRL/IFTA
Embedded Information Systems
2241 Avionics Circle, Suite 32
WPAFB OH 45433-7334
[email protected]
(937) 255-6548 ext. 3582
DSN 785-6548 x3582
Fax: (937) 656-4277, DSN-Fax 656-4277
