Tech Views
By Ellen Walker, DACS Analyst
Just what is "Software Archaeology" anyway?
Is it just an analogy? A useful metaphor? A software engineering process? A software engineering discipline? A
technology? Or perhaps a combination of these things? It appears that the software community has not yet converged
on a single meaning for the term and its scope, as evidenced by the statements of our current authors and others cited
in my references.
Hunt and Thomas [1] claim that archaeology offers a pretty good analogy for the activities that one performs when tasked
with fixing or revising legacy code and, in general, addressing the common problems of trying to understand someone
else's code. The major difference between software archaeology and real archaeology is that the objects of our efforts do
not have to be a thousand years old. Hunt and Thomas use the terminology of real archaeology to describe useful techniques for
working with someone else's software.
An advertisement appearing on the House of C website [2] is titled "Software Archaeology" and contains the following
text, "We specialize in making things work, particularly integrating old data with new interfaces. It doesn't matter how
old, creaky, and poorly documented the product or system you're fighting with - we can figure it out."
Booch [3] defines software archaeology as "the recovery of essential details about an existing system sufficient to reason
about, fix, adapt, modify, harvest, and use that system itself or its parts."
The authors featured in this issue of Software Tech News offer additional, yet differing, perspectives on the subject. In the
first article titled "Reverse Engineering and Software Archaeology", Ralph Johnson (co-author of the seminal book, Design
Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995) asserts that "software archaeology is a
point of view, not a set of technologies" and that, although it uses the same tools as reverse engineering, it has a much
longer-range purpose. He claims that software archaeology is not a means to an end, but rather an effort to learn about
software just to understand it. He goes on to describe some projects that exemplify this perception and then discusses
the importance of reverse engineering and software archaeology before presenting lessons he has learned along the way.
In the next article, titled "Software Archaeology", Andy Schneider defines software archaeology as "the systematic study
of past software systems by the recovery and examination of remaining material evidence, such as code, test, and design
documents." He then focuses on the shortcomings or limitations of the software archaeology and history metaphors. He
outlines some principles of history that are applicable to software systems and then uses those principles to identify key
techniques, which, he asserts, can assist the inexperienced software archaeologist in their quest. He concludes with a
conviction that software archaeologist and software historian are merely roles adopted at various points in time by the
software professional in their everyday work, and that they are, therefore, merely useful metaphors for examining best
practices and nothing more.
Grace Lewis, Edwin Morris and Dennis Smith, from the Software Engineering Institute (SEI), assert that software
archaeology investigates and rehabilitates legacy systems so that their architecture can be discovered and their code
reused. Their article, titled "Migration of Legacy Components to Service-Oriented Architectures," discusses what it means
to create services from legacy components and then summarizes a project where the SEI helped a program office make
decisions about migrating legacy components as services within a service-oriented architecture (SOA). They also discuss
the Service-Oriented Migration and Reuse Technique (SMART) developed by the SEI as a method for evaluating legacy
components for their potential to become services in an SOA.
The last article, an interview with Sellam Ismail, Curator of Software for the Computer History Museum (CHM), presents
yet another perspective on software archaeology, focusing on software preservation. The primary mission of the CHM
has been to preserve hardware; software was often donated with the hardware but its preservation was an afterthought.
Ismail describes what the CHM is now doing to organize and maintain the software they have, and he talks about its uses.
While others interested in archaeology focus on studying the artifacts in order to build a better system, or use some of the
components elsewhere, the CHM focuses on getting the software to run in its existing state on equipment for which it was
built. Much of the value in this software preservation comes from the capability it provides to get at data that was
recorded in outdated proprietary formats. CHM gets few requests from software archaeologists seeking to study the
software in its holdings. Most of the artifacts are executable code for operating systems, printers, tape readers,
and the like, not design and requirements documents.
It seems that we are approaching a fork in the road on our software archaeology journey. One camp is looking way
down the road and way out in time, seeking to preserve classic software, to leave a historical record for future generations
to study, and to decide which artifacts to preserve. The other camp is pragmatic, focusing on tasks that enable them to
understand existing software (without regard to age) so that they can use it (or its parts), fix it, or get ideas from it to
develop other software. The figure below highlights the characteristics, concerns and focus of these two camps.
Regarding software preservation, the largest obstacle, at the moment, is the lack of a central source code repository.
Booch [3] estimates that cumulative source lines of code (SLOC) will reach 750,000,000,000 by the end of 2005. What
would it take to store all of that, together with its related artifacts, in such a way that it is useful for archaeological digs?
Another major obstacle is the lack of a significant user base. Kaplan [4] lays out the frustrating history of software
preservation attempts and failures noting, in retrospect, that two fundamental issues could not be satisfactorily addressed.
"First, while all participants agreed that software history is important, that awareness of it should be raised, and that it must be documented,
participants simply could not identify a solid user base of any justifiable proportion. Second, as participants stated over and over again,
'preserving software' is much more than an act of accumulation. It means conserving, organizing, researching, cataloging, and presenting
materials in ways that researchers can use. To do otherwise is simply hoarding. And no individual institution or consortium of institutions has
been able to balance these two issues."
Some people view the Internet Archive as a potential model for establishing software archives. It already stores screen
shots of Web sites and other artifacts of the digital age. Adding source code to the mix would be easy enough, says staff
software preservationist Simon Carless. Unfortunately, legal issues and aging copy-protection mechanisms make it
difficult to provide a decent record of historic programs. Carless says the Digital Millennium Copyright Act (DMCA)
clouds the current preservation landscape [5].
Some authors, including Booch [3] and Hunt and Thomas [1], suggest that, as developers, we should leave a legacy for future
archaeologists (anyone else who needs to examine the software at any point in time), who may some day look at our
"polished" code and say "What were they thinking?", by creating the artifacts that those future reviewers will need.
Will there be a convergence among the software community about what software archaeology is and is not? Is
convergence important or even necessary? Regardless of the answers, and regardless of the age of the software, it is
clear that we can affect the outcome of the archaeologist's work by increasing our awareness of the information needs of those
looking at our software in the future, and creating helpful software artifacts when we build our code. However, the ROI of
such efforts has yet to be determined.
References
[1] Hunt, Andy and Thomas, Dave, "Software Archaeology", IEEE Software, March/April 2002, Pages 22-24.
[2] House of C web site: http://www.houseofc.com/archaeology.shtml
[3] Booch, Grady, "Software Archeology", a presentation given at the Rational Users Conference, 2004.
http://www.booch.com/architecture/blog/artifacts/Software%20Archeology.ppt
[4] Kaplan, Elisabeth, "A Response to 'Preserving Software: Why and How'", Iterations - An Interdisciplinary Journal of Software History, Charles Babbage Institute, 13 September 2002.
http://www.cbi.umn.edu/iterations/kaplan.html
[5] Williams, Sam, "Prowling the Ruins of Ancient Software", Salon.com Technology Website, 30 July 2003.
http://www.salon.com/tech/feature/2003/07/30/software_archaeology/print.html
About the Author
Ellen Walker, a DACS Analyst, is currently developing a series of publications on software "best practices" as
part of the DACS Gold Practice Initiative. She has spent the past 20 years as a software developer in various
roles spanning the entire software life cycle, including project management of multiple business process
re-engineering efforts within the DoD community. She is also experienced with assessment initiatives such as the
Capability Maturity Model for Software (CMM-SW) and the quality management practices of the New York
State Quality Award program. Ellen has an MS in Management Science (State University of New York
(SUNY) at Binghamton), and bachelor's degrees in both Computer Science (SUNY - Utica/Rome) and
Mathematics (LeMoyne College).
October 2005
Vol. 8, Number 3
Software Archaeology