Software Archaeology

By Andy Schneider, Lead Integration Architect, for BP's Oil Trading & Supply SystemsBP Plc. and Pete Windle, Consultant, BJSS Ltd.

For us to discuss software archaeology productively we must first define it. A naive definition can be derived thus: softáware aráchaeáoláoágy Pronunciation Key (sôftwâr ärk-l-j)

The systematic study of past software systems by the recovery and examination of remaining material evidence, such as code, tests and design documentation.

Using archaeology as a metaphor allows us to reason about how we recover and examine material evidence relating to software projects. As useful as this is, it is by no means the entire story. Many times there exist individuals to interview, and primary and secondary evidence to investigate. These types of activities owe more to historical practice than they do to archaeology. One of the reasons for this is that existing communities in organizations ensure that the investigation of systems is as much about people, existing practice and precedent as it is about tools and artifacts 'dug out of the ground'. This emphasis on people and historical practice is less evident in existing software archaeology literature.

As an illustration, consider the UK constitution. The UK constitution defines the form, structure, activities, character, and fundamental principles by which UK society, law making institutions and government institutions operate. Famously, though, it is not written down. Instead it consists of a number of bodies, namely the monarchy, the executive, parliament and the civil service, supported by both precedent and law. A student of the UK constitution is rather like someone trying to understand a software system. The student has to talk to experts, study paperwork, identify 'urban myths', analyze past behaviours and cope with the fact that, whilst they investigate it, the constitution is changing. The authors find this process, the process of historical study, to be as compelling an analogy as archaeology when considering practical techniques for studying software systems. With this in mind, this paper will take a more historical and people orientated perspective. Firstly, we will outline some of the limitations of the archaeology and historical metaphors. We will then go on to outline certain principles of history, as applicable to software systems, and then use these principles to drive out key techniques for understanding systems. Hopefully we will leave you, the reader, with practical techniques and another way of thinking about understanding systems.

It is important that any metaphor is seen for what it is - a starting point or a framework for describing the properties of the object of interest. It is not to be used as a reference or to define truth. One can say that a hierarchical software structure is like the branches of a tree; one cannot then infer that software is made of wood. Fowler goes into this in some detail[3].

So then, what are the limitations of our archaeological and historical metaphors?

Historians subdivide evidence into primary and secondary sources. Primary sources can be material things such as artifacts, tools, constructions or remains (although in software, "material" can include intangibles - source and object code, log files, etc). Alternatively, they could be contemporaneous written records - documents, specifications, e-mails. Finally, interviews conducted with the protagonists or direct witnesses to events are themselves primary sources. The oral history of software teams tends to be a very rich source of information (albeit some of it apocryphal or cargo cult[2] invocation). This is for several reasons, but mainly because most software projects are delivered with a number of undocumented deltas to the original requirements and/or design.

Cargo cult science ia a term used by Richard Feynman in his 1974 Caltech http://en.wikipedia.org/wiki/Caltech/ commencment address to describe work that has the sembelence of being scientific, but is missing "a kind of scientific integrty, a principle of scientific thought that corresponds to a kind of utter honesty". Feynman cautioned that to avoid becoming cargo cult scientists, researchers must first of all avoid fooling themselves, being willing to question and dout their own results, and to investigate possible flaws in a theory or an experiment (from http://en.wikipedia.org/wiki/Cargo_cult_science).

Secondary sources are texts and documents which are derivative of the actual event - not actually part of the phenomenon but later analysis or interpretation of it. As we will discuss later, the production of secondary sources is an important part of the software historian's milieu. When considering these sources it is often useful to evaluate their quality before spending significant time reading them. Some questions a good historian may ask when judging software related sources would be:

A good historian recognises that the past is different from the present. This means avoiding researching a system with the aim of reinforcing existing expectations but rather, allowing the system to teach you how it really is. For example, when encountering a method called createInstance you may assume from the name that it creates instances of some structure or object and therefore pass it by. However, the method could be doing anything. There are examples of poor naming all over the IT world. Worse, by passing over the method you may have missed a lesson; that 'instance' in this system refers to some specific concept (such as a point in time) rather than the meaning you have ascribed to it.

Be conscious of signifiers

In semiotics [5], people distinguish between the signified, the concept being communicated and the signifier, the token that represents the concept. The signifier and the signified together are known as a sign. The much used example of this is the White Hat in old cowboy movies. Viewers of these movies know that the White Hat stands for the Good Guy. If you are not familiar with this sign, or apply it outside of the genre then you may miss something or worse, misunderstand the communication. Pattern names are signifiers and so are software terms such as 'instance'. So, be conscious that sets of signs used in the past may not be the same as those you are using. Or, put more bluntly, 'assumption is the mother of all screw-ups'.

With the best will in the world, you cannot hope to know all there is to know about a system and its past, so it is worth remembering that the past you see is only part of the picture. Consider the functionality someone developed in the past. Ask yourself "was that the functionality that was intended?" - moreover, ask yourself "was that the functionality that was actually wanted?" If you assume that the code correctly implements the required functionality then you may be missing a bug or a requirements gap.

It doesn't always do what is says on the tin

On encountering some code written in the past, seek to both understand the code and determine if the code correctly implements the functionality. If you cannot determine the latter, be sure to explicitly recognise that fact. Sources of functional definition (apart from a requirements document) are often the support team or the users.

The functionality of the system you are studying is and has been part of a wider system, combining software, users and support (at a minimum). Good historians always consider multiple perspectives and contexts before drawing conclusions.

It isn't just the code

When you are examining functionality in the system, and applying 'It doesn't always do what it says on the tin', you may feel the urge to fix the problem. Before you do this, determine if the wider system relies on the 'error' discovered in the software system for correct operation.

The past is not static. When seeking to understand a system, recognise that you can both examine it now, at this instance, but you can also examine it over a series of points in time. A historian will always look at what came before and after when assessing a particular point in time.

Points in time

Gather metrics that vary over time to obtain an understanding of the evolution of the system. Correlate specific events in the results with any spoken or written history of the project to enrich your understand of why things have evolved the way they did.

Examining a system is inextricably linked with gaining familiarity with the group of people who create (or created), use and support the system. There is much to be gained from examining some of the popular ethnography literature (e.g. [6][7][8]). There isn't space in this article to spend much time on the subject, but we have included a limited number of practices to ensure a reasonable breadth of coverage. Many cultures deny that an outsider can ever understand them. They maintain that their treasure is only for members; this is particularly the case in some tribes. Recognise that groups that form around legacy systems can exhibit many tribal characteristics. Understand that whilst the tribe may be the final authority on their perceptions and the terms they use, the past and its evolution are open for you to look into. Because of the tribal nature, it is important to use their language, to be initiated and to adopt their customs - without becoming pickled (i.e. 'going native') or failing to question the customs.

Speak the lingo

Paying attention to the terminology used by existing legacy teams allows you to communicate with them on their terms. This helps communication, improves rapport and shows you are actively listening.

Each group of people will have their own interpretation of history; listening to only one (e.g. the development team) can result in inaccurate bias creeping in to your research process.

Seek Multiple Sources

Acquire multiple perspectives on the system to reduce the sensitivity of your finding to bias.

You can extend this practice to observing how conversations with individuals vary over time. For example, one of the authors was interviewing someone on a legacy team about the deployment process. The process sounded incredibly complex and the reasons for the complexity seemed valid, but somewhat tortuous. The author went back for repeated conversations and noticed as time went on that the explanations changed or simplified. This indicated that there was more to the story than was being told. The differences pointed to where to explore and the author changed his tack as a result.

Question Customs

Responses to questions of the form 'well, it just works that way' are often clues of where to spend your time when examining a system. These responses may point to Cargo Cult[2]. Identifying these misconceptions can be useful pointers to fragile or complex areas of the system that need understanding well.

Cultures have a strong interest in their own pedigree. This can result in cultural chauvinism, where certain groups compete to show they are the 'oldest', 'fastest', 'youngest' etc. To do this they will often, sometimes without knowing it, manipulate history - to show things in a certain light. It is important therefore to rely not only on artifacts from a team, but also actively engage with external stakeholders to gather data. For example, if you are interested in performance, ask the support team; measure the response times, do not just rely on anecdotes from the team. See Multiple Sources above.

Gather Objective Data

Where possible gather quantifiable data to validate hearsay. E.g. use profile tools to determine where the bottleneck really is.

There are several different techniques for documenting the past. This section details some of the forms of output and methods of production that the authors have found useful.

Update and Extend the Existing Literature

This is generally not a valid option in historical circles - a historian who merely rewrites other people's books to "correct" interpretations would not be seen as contributing much. When understanding software systems, however, there is usually the potential to at least use existing documentation to bootstrap your work. Catalogue existing design and requirements documentation. Do not treat it as gospel truth - consider potential levels of accuracy, bias and how up-to-date it has been kept. Consider which parts are still useful and which approaches are worth extending. It may be that a slash and burn strategy is required should the existing documentation be woefully inadequate or glaringly inaccurate. Not every document contains enough value to be worth recovering. Conversely, it may be that although inaccurate in places, some existing documentation is "good enough" - it is still worth briefly recording where it is not, however.

Start With A Clean Sheet

In a way, the mirror image of the previous point. When approaching a large and unknown architecture with many moving parts, it is generally best to start with a blank sheet of paper. This can then be iteratively filled in as your understanding expands - the blanks within the diagram then become a map of where further investigation is required. On an old map of the world, these places would be marked "Here Be Dragons", perhaps. Good diagrams for forming these high-level views include: _ Activity diagrams - especially for core transaction pipelines. _ Sequence diagrams - for understanding interactions between object graphs. _ Package and/or class diagrams - for establishing an idea of structure.

Don't Trust Their Focus

The subjects that the original authors of sources considered interesting may not be the events that you are interested in now. A corollary of "The Past Is Not The Present". The low-level concerns of the parish council rarely make front page news, but may well generate screeds of text. In context, then, the original documentation may wax lyrical at some length about their optimistic locking model. Providing that it works, you may never need to touch it and therefore understanding the intimate detail may be a waste of your time. Of course, this is providing that it works. The key has to be to understand your own requirements at that moment, and search out the documentation that meets them.

Make a Timeline

Draw a history timeline showing significant events that occurred during the lifetime of the development and use this to remind you of the context within which people worked. This helps because it forces you to think about why things happened and when they happened - it's a framework for thinking about the past in a dynamic fashion.

Know what you don't know.

Documenting what you don't know allows people who follow you to determine where they need to spend their time when investigating the system and to determine between validated fact and hypothesis.

The authors have looked at applying some of the principles of history to the study of existing software systems, and have tried to outline some practices to assist the student of those systems. Whilst the archaeological metaphor looks initially compelling, we believe that the essentially human nature of software and its comparatively contemporaneous nature make the history metaphor a more interesting fit. Finally, it should be stressed that both the software archaeologist and software historian are merely roles that are adopted at one time or another by the software professional throughout the course of their everyday work, and should not be taken as anything more than useful metaphors for examining best practises in the field.

[4]M A Jackson, Problem Frames: Analysing and Structuring Software Development Problems (2001)

[9][6] Barley, Nigel. The Innocent Anthropologies: Notes from a Mud Hut. Penguin, 1986.

[10][7] Hammersley, Martin & Paul Atkinson. Ethnography: Principles in Practice. Routledge, 1994.

[11][8] Agar, Michael. The Professional Stranger: Information Introduction to Ethnography. Academic Press, 1996.

[10] Seacord, Robert, Daniel Plakosh & Grace Lewis. Modernizing Legacy Systems. Addison Wesley, 2003.

[11] Demeyer, Serge, Stephane Ducasse & Oscar Nierstrasz. Object Orientated Reengineering Patterns. Morgan Kaufmann, 2002.

Andy Schneider is the Lead Integration Architect for BP's Oil Trading & Supply systems. Andy is an industry exponent of agile development techniques with over 15 years of relevant experience in the IT industry. He has extensive experience in application and systems architecture, project management and software delivery. Andy regularly publishes papers and presents on subjects such as technical leadership and systems development at conferences such as OOPSLA and SPA.' Email: Andy Schneider [[email protected]]

Pete Windle is a technical project delivery specialist, consulting for BJSS (http://www.bjss.co.uk). He lives in a software development commune in Islington, London." Email: Pete Windle [[email protected]]