By Ellen Walker, DACS Analyst

Just what is "Software Archaeology" anyway?

Is it just an analogy? A useful metaphor? A software engineering process? A software engineering discipline? A technology? Or perhaps a combination of these things? It appears that the software community has not yet converged on a single meaning for the term and its scope, as evidenced by the statements of our current authors and others cited in my references.

Hunt and Thomas [1] claim that archaeology offers a pretty good analogy for the activities that one performs when tasked with fixing or revising legacy code and, in general, addressing the common problems of trying to understand someone else's code. The major difference between software archaeology and real archaeology is that the objects of our efforts do not have to be a thousand years old. They use the terminology of real archaeology to describe useful techniques for working with someone else's software.

An advertisement appearing on the House of C website [2] is titled "Software Archaeology" and contains the following text, "We specialize in making things work, particularly integrating old data with new interfaces. It doesn't matter how old, creaky, and poorly documented the product or system you're fighting with - we can figure it out."

Booch [3] defines software archaeology as "the recovery of essential details about an existing system sufficient to reason about, fix, adapt, modify, harvest, and use that system itself or its parts."

The authors featured in this issue of Software Tech News offer additional, yet differing, perspectives on the subject. In the first article, titled "Reverse Engineering and Software Archaeology", Ralph Johnson (co-author of the seminal book, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995) asserts that "software archaeology is a point of view, not a set of technologies" and that, although it uses the same tools as reverse engineering, it has a much longer-range purpose. He claims that software archaeology is not a means to an end, but rather an effort to learn about software just to understand it. He goes on to describe some projects that exemplify this perception and then discusses the importance of reverse engineering and software archaeology before presenting lessons he has learned along the way.

In the next article, titled "Software Archaeology", Andy Schneider defines software archaeology as "the systematic study of past software systems by the recovery and examination of remaining material evidence, such as code, test, and design documents." He then focuses on the shortcomings or limitations of the software archaeology and history metaphors. He outlines some principles of history that are applicable to software systems and then uses those principles to identify key techniques, which, he asserts, can assist the inexperienced software archaeologist in their quest. He concludes with a conviction that software archaeologist and software historian are merely roles adopted at various points in time by the software professional in their everyday work, and that they are, therefore, merely useful metaphors for examining best practices and nothing more.

Grace Lewis, Edwin Morris and Dennis Smith, from the Software Engineering Institute (SEI), assert that software archaeology investigates and rehabilitates legacy systems so that their architecture can be discovered and their code reused. Their article, titled "Migration of Legacy Components to Service-Oriented Architectures," discusses what it means to create services from legacy components and then summarizes a project where the SEI helped a program office make decisions about migrating legacy components as services within a service-oriented architecture (SOA). They also discuss the Service-Oriented Migration and Reuse Technique (SMART) developed by the SEI as a method for evaluating legacy components for their potential to become services in an SOA.

The last article, an interview with Sellam Ismail, Curator of Software for the Computer History Museum (CHM), presents yet another perspective on software archaeology, focusing on software preservation. The primary mission of the CHM has been to preserve hardware; software was often donated with the hardware but its preservation was an afterthought. Ismail describes what the CHM is now doing to organize and maintain the software it has, and he talks about its uses. While others interested in archaeology focus on studying the artifacts in order to build a better system, or use some of the components elsewhere, the CHM focuses on getting the software to run in its existing state on the equipment for which it was built. Much of the value in this software preservation comes from the capability it provides to get at data that was recorded in outdated proprietary formats. CHM gets few requests from software archaeologists seeking to study the software in its holdings. Most of the artifacts are executable code for operating systems, printers, tape readers and the like, not design and requirements documents.

It seems that we are approaching a fork in the road on our software archaeology journey. One camp is looking way down the road and way out in time, seeking to preserve classic software and leave a historical record for future generations to study, and focusing on what artifacts to preserve. The other camp is pragmatic, focusing on tasks that enable them to understand existing software (without regard to age) so that they can use it (or its parts), fix it, or get ideas from it to develop other software. The figure below highlights the characteristics, concerns and focus of these two camps.

Regarding software preservation, the largest obstacle, at the moment, is the lack of a central source code repository. Booch [3] estimates that cumulative source lines of code (SLOC) will reach 750,000,000,000 by the end of 2005. What would it take to store all of that, together with its related artifacts, in such a way that it is useful for archaeological digs? Another major obstacle is the lack of a significant user base. Kaplan [4] lays out the frustrating history of software preservation attempts and failures, noting, in retrospect, that two fundamental issues could not be satisfactorily addressed.

"First, while all participants agreed that software history is important, that awareness of it should be raised, and that it must be documented, participants simply could not identify a solid user base of any justifiable proportion. Second, as participants stated over and over again, 'preserving software' is much more than an act of accumulation. It means conserving, organizing, researching, cataloging, and presenting materials in ways that researchers can use. To do otherwise is simply hoarding. And no individual institution or consortium of institutions has been able to balance these two issues."

Some people view the Internet Archive as a potential model for establishing software archives. It already stores screen shots of Web sites and other artifacts of the digital age. Adding source code to the mix would be easy enough, says staff software preservationist Simon Carless. Unfortunately, legal issues and aging copy-protection mechanisms make it difficult to provide a decent record of historic programs. Carless says the Digital Millennium Copyright Act (DMCA) clouds the current preservation landscape [5]. Some authors, including Booch [3] and Hunt and Thomas [1], suggest that, as developers, we should leave a legacy for future archaeologists (anyone else who needs to examine the software at any point in time), who may some day look at our "polished" code and say "What were they thinking?", by creating those artifacts that future reviewers will need.

Will there be a convergence within the software community about what software archaeology is and is not? Is convergence important or even necessary? Regardless of the answers, and regardless of the age of the software, it is clear that we can affect the outcome of the archaeologist's work by increasing our awareness of the information needs of those looking at our software in the future, and by creating helpful software artifacts when we build our code. However, the ROI of such efforts has yet to be determined.


Ellen Walker, a DACS Analyst, is currently developing a series of publications on software "best practices" as part of the DACS Gold Practice Initiative.
She has spent the past 20 years as a software developer in various roles spanning the entire software life cycle, including project management of multiple business process re-engineering efforts within the DoD community.

She is also experienced with assessment initiatives such as the Capability Maturity Model for Software (CMM-SW) and the quality management practices of the New York State Quality Award program.

Ellen has an MS in Management Science (State University of New York (SUNY) at Binghamton), and bachelor's degrees in both Computer Science (SUNY - Utica/Rome) and Mathematics (LeMoyne College).

