Building Systems Using Software Components
By Robin Ying, Stevens Institute of Technology
Introduction
It is human nature that when facing a complex problem, we tend to break it into pieces and solve the smaller problems. This divide-and-conquer strategy works for many different situations, all the way from Napoleon's military movements to building software systems. When constructing software, the key issue is designing the software architecture so that the entire system is properly decomposed into parts or components. The parts or components can be anything from subroutines or code modules in programs to complete sub-systems that form separate executables.
The famous computer scientist Edsger W. Dijkstra once raised an interesting argument: suppose a program is composed of N parts, and each part has a probability of correctness of p_i. Then, as the parts are assembled, the correctness of the whole system will be
P = ∏_{i=1}^{N} p_i
If N is large, then unless each p_i is very close to one, P will be close to zero. At first glance, it may seem that Dijkstra's argument is against decomposition, since as N increases, P decreases. However, as we examine the equation more closely, we realize that Dijkstra was mathematically proving what we already know: the larger a program gets, the more the correctness of each component matters.
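To see the effect concretely, the short Python sketch below (illustrative values only, not taken from Dijkstra) computes P for a few combinations of N and p_i:

# Illustrative only: how overall correctness P = p_1 * p_2 * ... * p_N
# collapses as the number of parts N grows, unless every p_i is extremely
# close to one. The values of p and N below are arbitrary examples.

def overall_correctness(p: float, n: int) -> float:
    """P for a system of n parts, each correct with probability p."""
    return p ** n

for p in (0.99, 0.999, 0.9999):
    for n in (10, 100, 1000):
        print(f"p_i = {p}, N = {n:5d}  ->  P = {overall_correctness(p, n):.3f}")

With p_i = 0.99, a system of only 100 parts is correct with probability of roughly 0.37, and at 1000 parts the probability is essentially zero.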
In practice, as the complexity of a system increases, the chance that we can achieve correctness without flaws shrinks. However, by breaking the problem into smaller pieces, we allow ourselves to deal with a less complex situation, and thus the chance of achieving overall system correctness increases. Ideally, if each component works flawlessly (i.e., p_i = 1), then P = 1 regardless of the value of N. Wouldn't this be wonderful? In theory, all we have to do is break the program down into many tiny parts, each containing no more than a few lines of code. We would then ensure individual component correctness, and thus overall system correctness.
However, as we also know, this does not work in practice. Even if each individual component has a correctness of 1, assembling the pieces introduces integration errors. As the number of components in a system increases, the number of integration errors increases as well.
Thus, the actual process of software system construction is a compromise between these two extremes. We decompose the system into "manageable" components and try to minimize integration errors as we assemble them. This requires that the software architecture be carefully designed to achieve this delicate balance.
Modern software architecture practices promote "reuse": that is, building reusable components that can be used multiple times within a project, or by other systems. This produce-once-use-many-times strategy is a great way to increase efficiency and reduce costs, and it complements component-based architecture. Very often, the reused components are treated as "building blocks." The current trend in component-based architecture is to incorporate as many COTS (commercial off-the-shelf) components as possible to reduce development time and costs. However, choosing the proper COTS components and carrying out the integration carefully are challenges of a new dimension.
What’s the Problem?
As previously mentioned, integrating components generates errors. Let’s assume a software system has two components: A and B. Integrating A and B produces the whole system S.
In component A, the units used in computation are English units (inches, miles, ounces, pounds, etc.), and in component B, the units are metric (centimeters, kilometers, grams, kilograms, etc.). To date, no programming language can automatically determine whether the floating-point number "1.25", a basic data type, means 1.25 inches or 1.25 centimeters. A careless integration of A and B can produce devastating errors, as illustrated by the NASA Mars Climate Orbiter disaster on September 23, 1999. In that incident, an error in the altitude calculation caused the multi-million-dollar spacecraft to be destroyed by the planet's atmosphere. The error report identified a unit mismatch as one of the major causes of the mission's failure.
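As a simple illustration (a hypothetical Python sketch, not NASA's actual code), explicit unit-carrying types at the component interface can catch this kind of mismatch before it does any damage:

from dataclasses import dataclass

@dataclass(frozen=True)
class Length:
    """A length stored in a single canonical unit (meters)."""
    meters: float

    @classmethod
    def from_inches(cls, value: float) -> "Length":
        return cls(value * 0.0254)

    @classmethod
    def from_centimeters(cls, value: float) -> "Length":
        return cls(value / 100.0)

    def __add__(self, other: "Length") -> "Length":
        if not isinstance(other, Length):
            return NotImplemented  # refuse to mix a Length with a bare number
        return Length(self.meters + other.meters)

# Component A works in English units, component B in metric; both convert
# to the shared canonical unit at the interface, so "1.25" is never ambiguous.
a = Length.from_inches(1.25)       # value produced by component A
b = Length.from_centimeters(1.25)  # value produced by component B
print((a + b).meters)              # 0.04425 meters
# Adding a raw float such as `a + 1.25` would raise a TypeError instead of
# silently mixing units.

The point is not the particular language feature but that the unit becomes part of the data passed between components, so a mismatch shows up as a detectable integration error rather than one that surfaces in flight.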
Careless reuse can be another source of disaster. The Therac-25 radiation therapy machine accidents are an example of this type. From June 1985 to January 1987, six accidents occurred in which machine malfunctions exposed patients to massive radiation overdoses, several of them fatal. In this case, the Therac-25 manufacturer reused control software written for earlier models, which had relied on electro-mechanical interlocks to prevent radiation overdoses. In the Therac-25, the electro-mechanical interlocks were removed, and the reused software was not properly updated to provide the needed safety checks.
In today’s object-oriented software design approach, we initially develop a structure where multiple parts are at a roughly equal level of abstraction. Then we analyze and refine each part so that they will fit together and achieve the desired features and functions before starting the implementation.
Experience shows that it is often better to organize a system around data than around functions. The key consideration in decomposing a system is how we couple the components. We need to minimize coupling between components and maximize cohesion within each component so that each component can be built independently. This approach also improves a component's reusability, since the component may later be used in systems completely unrelated to the original one.
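The sketch below (hypothetical class names, in Python) shows one common way to keep coupling low: a component depends only on a narrow abstract interface, so it can be built, tested, and reused without knowing anything about the components that feed it.

from abc import ABC, abstractmethod

class DataSource(ABC):
    """The only thing the report component knows about the rest of the system."""

    @abstractmethod
    def read_records(self) -> list:
        ...

class InMemorySource(DataSource):
    """One possible supplier of data; a database- or file-backed source could replace it."""

    def __init__(self, rows: list):
        self._rows = rows

    def read_records(self) -> list:
        return list(self._rows)

class ReportGenerator:
    """Cohesive: all report formatting lives here; coupling is limited to DataSource."""

    def __init__(self, source: DataSource):
        self._source = source

    def summary(self) -> str:
        return f"{len(self._source.read_records())} records processed"

print(ReportGenerator(InMemorySource([{"id": 1}, {"id": 2}])).summary())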
Decades ago, shell scripts (written in an interpreted programming language of the UNIX system), together with the UNIX commands and environment, achieved a set of desired characteristics for component-based software programming. Today, the Java system (the programming language and the Java VM) and C#/.NET from Microsoft are taking this concept further by providing more building-block components, stronger type checking, and more built-in safety checks (e.g., array-bounds checking). This higher level of abstraction incorporates the component-based design concept into the programming languages themselves. It "componentizes" system-level utilities and enhances their reusability. This not only makes the lives of software developers easier, but also greatly increases the quality of the resulting system.
Components and Trustworthiness
By definition, a software system is trustworthy if it is safe, reliable, and secure. The trustworthiness of a software system therefore comprises three quality attributes: reliability, security, and safety. These attributes bear not only on system stability, but also on our ethical responsibility when dealing with systems that may affect people's lives. Let's discuss how the components in a software system affect these quality attributes.
Reliability
Reliability is one of the most important non-functional requirements for all systems. The reliability of a software system depends on many factors, such as the computer hardware, the operating system, the physical environment, but most importantly, the software system itself. The reliability of a software system should be judged holistically, i.e. after all its components are integrated together, not by looking at each component individually.
Let's look at a simple example. System S1 consists of components A and C, and system S2 consists of components B and C, where component C performs arithmetic calculations on data passed from component A in S1 and from component B in S2. C does not perform a divide-by-zero check, and this fact is not revealed in its specification. Component A performs its own divide-by-zero check before passing data to C, but B does not. Assuming no other complicating factors, we can say that S1 is reliable and S2 is not. In addition, component C itself is not reliable, since its specification does not disclose that it provides no divide-by-zero check.
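A minimal sketch of this example (hypothetical function names, in Python) makes the difference between S1 and S2 visible:

def component_c(numerator: float, denominator: float) -> float:
    """Shared component C: divides, with no divide-by-zero check and no mention of it."""
    return numerator / denominator

def component_a(numerator: float, denominator: float) -> float:
    """Component A (used in S1): validates its data before calling C."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return component_c(numerator, denominator)

def component_b(numerator: float, denominator: float) -> float:
    """Component B (used in S2): passes data straight through to C."""
    return component_c(numerator, denominator)

try:
    component_a(1.0, 0.0)
except ValueError as err:
    print("S1 rejects the bad input at A's boundary:", err)

try:
    component_b(1.0, 0.0)
except ZeroDivisionError as err:
    print("S2 fails unexpectedly inside C:", err)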
Furthermore, the Mars Climate Orbiter disaster shows that a single integration error can make the whole system unreliable even if all of its components are fully reliable. Thus, there is no simple rule that relates the overall reliability of a software system to the reliability of its components. Nevertheless, we always want to start with a set of reliable and reusable components; as a Chinese proverb says, a good start is half the success.
The reliability of a software system can be improved by rejuvenation. Software rejuvenation is a preemptive, periodic restart of a running system at a clean internal state to prevent future failures. When a system is composed of multiple components, do we need to rejuvenate all of them? The answer lies in the question, "How closely are these components coupled?" Consider a web-based application that has an application server and a database server. The application server may need the rejuvenation treatment while the database server may not, or both may need rejuvenation, but at different frequencies. This also shows that if we design a system as a composition of loosely coupled components, where each component can be rejuvenated independently, we have a better chance of improving its reliability. However, rejuvenation incurs overhead, such as server downtime or inaccessibility of data. Oftentimes, statistical modeling methods are used to determine the optimal frequency of rejuvenation.
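The sketch below (hypothetical intervals and restart hooks, in Python) illustrates the idea of rejuvenating loosely coupled components independently, each on its own schedule:

import time

class Component:
    """A component that can be rejuvenated (restarted at a clean state) on its own schedule."""

    def __init__(self, name: str, interval_seconds: float):
        self.name = name
        self.interval_seconds = interval_seconds
        self.last_restart = time.monotonic()

    def rejuvenate(self) -> None:
        # A real implementation would drain in-flight requests, restart the
        # process, and restore a clean internal state; here we only log it.
        print(f"rejuvenating {self.name}")
        self.last_restart = time.monotonic()

    def maybe_rejuvenate(self) -> None:
        if time.monotonic() - self.last_restart >= self.interval_seconds:
            self.rejuvenate()

# The application server is restarted nightly, the database server weekly;
# in practice these intervals would come from a statistical failure model.
components = [
    Component("application server", interval_seconds=24 * 3600),
    Component("database server", interval_seconds=7 * 24 * 3600),
]

for c in components:
    c.maybe_rejuvenate()  # invoked periodically by a supervisor loop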
Security
Achieving the desired level of security requires cooperation among the software system, the operating system, and the running environment. In addition, one must consider both software security and physical security. An operating system without kernel memory protection can never properly support a secure application. Top-level government security requirements demand physical air gaps separating computer networks and vault-like rooms to secure computer equipment.
The security of a software system is usually handled by a set of its components. These components, whether COTS or custom developed, handle access authentication, intrusion detection and prevention, and other needed functions. They usually have a high degree of reusability. A failure at the component level will often compromise the security of the entire system. Thus, the design and construction of security components has evolved into a specialty. Today, many well-designed COTS components with detailed specifications are available for use. However, proper integration is still the key to ensuring that these components are used in the manner they were designed for.
Safety
The safety of a software system concerns its interaction with its environment, as well as with other affected parties such as people or animals. An unsafe system may cause harm to those involved and raise concerns of varying degrees. Furthermore, if the software system is used to control physical devices, such as machines, the safety of the system is of utmost importance.
Safety requirements are non-functional requirements; however, their importance often supersedes that of functional requirements. In many cases, functional requirements need to be compromised or re-evaluated in order to bring the overall system within the safety margin.
Although safety-related features might be handled by a set of components, similar to reliability and security, the safety of the entire system is an integration issue. This even extends to the operators, those who use the system. If the system is used outside its design intent or limitations, severely hazardous conditions may occur and render the system unsafe.
The software control categories defined in the Military Standard System Safety Program Requirements, MIL-STD-882C, are summarized below.

Autonomous: Software exercises autonomous control over potentially hazardous hardware systems, subsystems, or components without the possibility of intervention to preclude the occurrence of a hazard. Inappropriate software action, or failure to act, can contribute directly to a Top Level Mishap (TLM).

Semi-Autonomous: Software displays safety-related information or exercises control over potentially hazardous hardware systems, subsystems, or components with the possibility of intervention by independent safety systems to preclude the occurrence of a TLM. However, the possibility of intervention is not by itself considered sufficient to prevent a mishap.

Semi-Autonomous with Redundant Backup: Software displays safety-related information or exercises control over potentially hazardous hardware systems, subsystems, or components; however, two or more independent safety measures within the system, but external to the software item, mitigate the possibility of a TLM.

Influential: Software processes safety-related information but does not directly control potentially hazardous hardware systems, subsystems, or components.

No Safety Involvement: Software does not process safety-related data or exercise control over potentially hazardous hardware systems, subsystems, or components. The software cannot contribute to a mishap.
In terms of these software control categories, the Therac-25 case mentioned earlier elevated the system control software from the "no safety involvement" category in the earlier models (where electro-mechanical interlocks were used) to the "autonomous" category, without proper updates. It was an invitation to disaster.
Conclusions
From the discussion above, we conclude that although good components can make key contributions to the trustworthiness of a software system, the problem remains a holistic one. It is a complex problem, and one to which the divide-and-conquer strategy alone cannot be applied to get the desired result. Instead, developing trustworthy software systems requires great ingenuity, proper architecture and design, discipline, experience, management commitment and understanding, and continuous, undivided attention to quality. This is much like building automobiles: it takes time and attention to establish the reputation of the product and the team, and to earn the confidence of the user community.
References
Bernstein, L. and Yuhas, C.M., Trustworthy Systems Through Quantitative Software Engineering, Wiley, 2005.
Bernstein, L. and Kintala, C., "Software Rejuvenation and Self-healing," January 14, 2004.
Dahl, O.-J., Dijkstra, E.W., and Hoare, C.A.R., Structured Programming, Academic Press, 1972.
E.W. Dijkstra Archive, http://www.cs.utexas.edu/users/EWD/
van Vliet, H., Software Engineering: Principles and Practice, Addison-Wesley, 2000.
Military Standard System Safety Program Requirements, MIL-STD-882C, January 19, 1993.
About the Author
Dr. Ying began his software work at Bell Laboratories. He was among the first to develop telecommunication business application software using the C language and the UNIX system in the early 1980s. Since then he has been an active practitioner in all aspects of the software development process. He received his Ph.D. in Electrical Engineering and Computer Sciences from the University of California at Berkeley. He is currently the Dean of Technology and Learning Services at Imperial Valley College and an adjunct faculty member of the WebCampus, Stevens Institute of Technology.
Email at Imperial Valley: [email protected]
Email at Stevens Institute: [email protected]
Detailed Bio:
http://webcampus.stevens.edu/info/faculty3.html#ying