Trustworthy software always provides the same results for the same input. Trustworthy software must be achieved, because people have come to depend on software-based systems for their livelihoods and, as with emergency systems, their very lives. Software is fundamental to computerized systems, yet it is rarely taught as an entity whose trustworthiness can be improved with specific techniques.
A software engineering course is proposed to treat the issues of software trustworthiness. Feedback on the need for and contents of such a course is sought. The Center for National Software Studies sponsors a web page (www.cnsoftware.org) discussing trustworthy software and provides a means for commenting on the subject.
Software has a weak theoretical foundation, yet there exists a body of knowledge that is sometimes used to improve software trustworthiness. When a system fails because of its software, the cause is often something that a different design could have avoided. Unfortunately, the state of the practice lags the state of the art. Prof. Shiu-Kai Chin of Syracuse University writes that we should
"develop...curricular support for...design methods...as a means to support system design. The level of professional practice will improve when we have practical high-assurance design methods which work-and when we train our students to use them."
Most current software theory focuses on static behavior, analyzing source listings. There is little theory on dynamic behavior and performance under load. Often we do not know what load to expect. Dr. Vinton Cerf, co-designer of the Internet's TCP/IP protocols, has remarked that "applications have no idea of what they will need in network resources when they are installed." As a result, we try to avoid serious software problems by over-engineering and over-testing. The U.S. Food and Drug Administration notes:
"Software verification includes both static (paper review) and dynamic techniques. Dynamic analysis (i.e., testing) is concerned with demonstrating the software's run-time behavior in response to selected inputs and conditions. Due to the complexity of software, dynamic analysis alone may be insufficient to show that the software is correct, fully functional and free of avoidable defects. Therefore, static approaches and methods are used to offset this crucial limitation of dynamic analysis. Dynamic analysis is a necessary part of software verification, but static evaluation techniques such as inspections, analyses, walkthroughs, design reviews, etc., may be more effective in finding, correcting and preventing problems at an earlier stage of the development process."
Software engineers cannot ensure that a small change in software will produce only a small change in system performance. Industry practice is to test and retest every time any change is made, in the hope of catching the unforeseen consequences of the tinkering. The April 25, 1994 issue of Forbes pointed out that a three-line change to a 2-million-line program caused multiple failures due to a single fault. There is a lesson here: it is software failures, not faults, that must be measured. Design constraints that can help software stability need to be codified before we can hope to deliver reliable performance. Instabilities arise in the following circumstances:
- Computations cannot be completed before new data arrive;
- Round-off errors accumulate, or buffer usage grows, until they eventually dominate system performance (a short demonstration follows this list);
- An algorithm embodied in the software is inherently flawed.
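As a small illustration of the second circumstance, the following C fragment (an illustrative sketch, not taken from any production system) shows how round-off error builds when a value with no exact binary representation is summed repeatedly:

    /* Round-off accumulation demo: 0.1 has no exact binary
     * representation, so each addition contributes a tiny error
     * that compounds over a long run. */
    #include <stdio.h>

    int main(void)
    {
        float sum = 0.0f;
        for (int i = 0; i < 1000000; i++)
            sum += 0.1f;                  /* error grows each step */

        printf("expected 100000.0, got %f\n", sum);
        return 0;
    }

On typical IEEE single-precision hardware the printed total is off by nearly one percent; in a long-running accumulator or feedback loop, that drift can eventually dominate the computation.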
Trustworthy software is now emerging as a technology area of interest. The IEEE is considering forming a special interest group, the Center for National Software Studies is sponsoring a web page devoted to exploring trustworthy software, and the National Institute of Standards and Technology (NIST) gathers data on software defects and explores trustworthiness issues.
A course dealing with these issues is being developed. It will feature design constraints that make software more trustworthy. The topics planned include:
- Case Histories of Failures
- Requirements Validation
- Stability Analysis
- Software Connectors
- Ethics
- Reliability Models
- Failures, Faults and Defects
- Testing
- Software Visualization
- Metrics
The first constraint limits the state space in the execution domain. Today's software runs non-periodically, which allows internal states to grow without bound. Software rejuvenation is a new concept that seeks to contain the execution domain by making it periodic. An application is gracefully terminated and immediately restarted at a known, clean internal state. Failure is anticipated and avoided. Non-stationary random processes are transformed into stationary ones. One way to describe this: rather than running a system for one year, with all the mysteries that untried time expanses can harbor, run it for only one day, 364 times. The software states would be re-initialized each day, process by process, while the system continued to operate. Shortening the rejuvenation interval increases the overhead of planned restarts but reduces the expected cost of failure-induced downtime. One system collecting on-line billing data operated for two years with no outages on a rejuvenation interval of one week.
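A minimal sketch of the rejuvenation idea, assuming a POSIX environment (the do_work routine and the one-day period are hypothetical placeholders for a real application):

    /* Software rejuvenation sketch: a supervisor periodically
     * terminates and restarts a worker process, so the worker
     * always runs from a known, clean internal state. */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define REJUVENATION_PERIOD_SEC 86400   /* one day, per the text */

    static void do_work(void)               /* stand-in for the app  */
    {
        for (;;)
            sleep(1);                       /* real processing here  */
    }

    int main(void)
    {
        for (;;) {
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); return EXIT_FAILURE; }
            if (pid == 0) {                 /* child: fresh state    */
                do_work();
                _exit(EXIT_SUCCESS);
            }
            sleep(REJUVENATION_PERIOD_SEC); /* run for one period    */
            kill(pid, SIGTERM);             /* graceful termination  */
            waitpid(pid, NULL, 0);          /* reap, then restart    */
        }
    }

In production the restarts would be staggered process by process, as described above, so the service as a whole never stops.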
A Bell Laboratories experiment showed the benefits of rejuvenation. A 16,000-line C program with notoriously leaky memory failed after 52 iterations. Seven lines of rejuvenation code, with the period set at 15 iterations, were added, and the program ran flawlessly. Rejuvenation does not remove bugs; it merely avoids them, to remarkably good effect.
If we cannot avoid a failure, then we must constrain the software design so that the system can recover in an orderly way. Each software process or object class should provide special code that recovers when triggered. A software fault-tolerance library with a watchdog daemon can be built into the system. When the watchdog detects a problem, it launches the recovery code peculiar to the application software. In call-processing systems this usually means dropping the call but not crashing the system. In administrative applications, where preserving the database is key, the recovery system may recover a transaction from a backup data file, or log the event and rebuild the database from the last checkpoint. Designers are constrained to explicitly define the recovery method for each process and object class using a standard library.
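The shape of this constraint can be sketched in a few lines of C. The registration call and recovery routine below are hypothetical stand-ins for whatever interface a real fault-tolerance library supplies; here a watchdog timer detects a hung computation and triggers the recovery code registered for the process:

    /* Watchdog-plus-recovery sketch. ft_register and drop_call are
     * invented names illustrating the constraint that every process
     * must declare its own recovery method. */
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static sigjmp_buf safe_point;
    static void (*recover)(void);           /* per-process recovery  */

    static void ft_register(void (*fn)(void)) { recover = fn; }

    static void watchdog(int sig)           /* fires when work hangs */
    {
        (void)sig;
        siglongjmp(safe_point, 1);          /* unwind to safe point  */
    }

    static void drop_call(void)             /* call processing: shed
                                               the call, keep running */
    {
        puts("watchdog fired: call dropped, system still up");
    }

    int main(void)
    {
        ft_register(drop_call);
        signal(SIGALRM, watchdog);

        if (sigsetjmp(safe_point, 1) == 0) {
            alarm(2);                       /* 2-second watchdog     */
            for (;;)
                pause();                    /* simulate a hung task  */
        } else {
            alarm(0);
            recover();                      /* orderly recovery      */
        }
        return 0;
    }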
George Yamamura of Boeing's Space and Defense Systems reports that defects are highly correlated with personnel practices. Groups with 10 or more tasks, and people with 3 or more independent activities, tended to introduce more defects into the final product than those who were more focused. He points out that large changes were more error-prone than small ones, with changes of 100 words of memory or more being considered large. This may have some relationship to the average size of human working memory. The high 0.918 correlation between defects and personnel turnover rates is telling. When Boeing improved their work environment and development process, they saw 83 percent fewer defects, gained a factor of 2.4 in productivity, improved customer satisfaction and improved employee morale. Yamamura reported an unheard-of 8 percent return rate when group members moved to other projects within Boeing.
Most communications software is developed in the C or C++ programming languages. Les Hatton's book, Safer C: Developing Software for High-Integrity and Safety-Critical Systems (ISBN 0-07-707640-0), describes the best way to use C and C++ in mission-critical applications. Hatton advocates constraining the use of language features to achieve reliable software performance, and then specifies, instruction by instruction, how to do it. He says, "The use of C in safety-related or high integrity systems is not recommended without severe and automatically enforceable constraints. However, if these are present using the formidable tool support (including the extensive C library), the best available evidence suggests that it is then possible to write software of at least as high intrinsic quality and consistency as with other commonly used languages." For example, a detailed analysis of source code from 54 projects showed that, on average once in every 29 lines of code, functions were not declared before they were used.
C is an intermediate language, between high level and machine level. There are dangers when the programmer can drop down to the machine architecture, but with reasonable constraints, limiting the use of register-level instructions to the very few cases dictated by performance goals, C can be used to good effect. The alternative, a high-level language that isolates the programmer from the machine, often leads to a mix of assembly language and high-level code, which brings all the headaches of managing configuration control and integrating modules from different code generators. The power of C can be harnessed to assure that source code is well structured. One important constraint is to use function prototypes or special object classes for interfaces.
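A small example of the prototype constraint (the function name and rate computation are invented for illustration). With the prototype in scope, the compiler checks every call site, turning the undeclared-function errors Hatton measured into compile-time diagnostics:

    /* Prototype constraint demo: declaring rate_call before use lets
     * the compiler reject mismatched calls instead of letting them
     * corrupt data at run time. */
    #include <stdio.h>

    int rate_call(int duration_sec, int tariff);  /* prototype first */

    int main(void)
    {
        /* rate_call("60", 3) would now be a compile-time error */
        printf("charge: %d\n", rate_call(60, 3));
        return 0;
    }

    int rate_call(int duration_sec, int tariff)
    {
        return duration_sec * tariff;             /* toy computation */
    }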
The optimum module size for the fewest defects is between 300 and 500 instructions. Smaller modules lead to too many interfaces, and larger ones are too big for the designer to handle; structural problems creep into large modules.
All memory should be explicitly initialized before it is used. Memory-leak detection tools should be used to make sure that a software process does not grab all available memory for itself, leaving none for other processes. That creates gridlock: the system hangs in a wait state because it cannot process any new data.
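Both constraints are mechanical to follow in C, as in this sketch (the session structure and function names are hypothetical):

    /* Explicit initialization and disciplined release. calloc
     * guarantees a known, zeroed starting state; pairing every
     * allocation with a release keeps a long-running process from
     * starving its neighbors of memory. */
    #include <stdio.h>
    #include <stdlib.h>

    struct session { int id; char name[32]; };

    static struct session *session_new(int id)
    {
        struct session *s = calloc(1, sizeof *s); /* zeroed memory   */
        if (s == NULL)
            return NULL;                          /* report, don't hang */
        s->id = id;
        return s;
    }

    static void session_free(struct session *s)
    {
        free(s);                                  /* every path frees */
    }

    int main(void)
    {
        struct session *s = session_new(42);
        if (s != NULL) {
            printf("session %d ready\n", s->id);
            session_free(s);                      /* no leak          */
        }
        return 0;
    }

Running such code under a leak-detection tool on every build then becomes a cheap, enforceable check rather than a heroic debugging effort.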
A study of 3,000 reused modules showed that changing as little as 10% of a reused module led to substantial rework, as much as 60% of the module. It is difficult for anyone unfamiliar with a module to alter it, and this often leads to redoing the software rather than reusing it. For that reason, it is best to reuse tested, error-free modules as-is.
A course to codify and teach trustworthy software is proposed. The main idea is to design so as to avoid crashes and hangs, so that software-based systems become trusted systems. Trusted systems are those that repeatedly and reliably provide the same output for the same input when the environment is the same.
Suggestions and comments on course contents, approach and importance are sought.
About the Author
Mr. Bernstein is a recognized expert in software technology, project management, network management and technology conversion. He is president of the National Software Council, whose goals are improving American software competitiveness, making software trustworthy and getting the software industry, the government and academia to work better together. He now consults through his firm, Have Laptop - Will Travel, and is the Executive Technologist with Network Programs, Inc., building software systems for managing telephone services. Mr. Bernstein was an Executive Director of AT&T Bell Laboratories, where he worked for 35 years. As a software project manager he successfully built, sold and deployed a software system that automated the 100 million paper records telephone companies used to keep track of telephone lines to residences. As a technologist he invented the concepts of 'dynamic provisioning' and 'routing to intelligence', which are included in the seven patents he holds. He saw the opportunities contained in research into software fault tolerance and championed its commercialization; it is now used in 24 products deployed at over 500 sites. As a contributor to the profession he was recognized as a Fellow of the IEEE, the ACM and Ball State University. He is a member of the Russian-based International Information Academy. He is a visiting associate of the Center for Software Engineering at the University of Southern California.