Software Preservation at the Computer History Museum
http://www.computerhistory.org/
Interview with Sellam Ismail, Curator of Software
This article is a synopsis of several email and phone conversations, occurring in
May and June of 2005, between Sellam Ismail, Curator of Software for the
Computer History Museum (CHM), and Ellen Walker, DACS Analyst. It has been
organized into a series of questions and answers for reading convenience;
the questions are in no particular order.
1 - What, if anything, does the CHM have to do with Software Archaeology?
Software archaeology involves digging into an existing (often perceived as
ancient) code base to recover understanding of algorithms, but if the code
is trapped on outmoded computer media, for which the means to read it is
no longer available, the digging must begin at a deeper level, where the
code being investigated is buried on old disks, tapes, or even punched
cards. These old forms of storage are like a tomb for the program that lies
within, and the first step in gaining access is solving the riddle of how to
read them.
The CHM maintains a collection of vintage computers and related
hardware. As a historical computer consultant specializing in vintage
computing technology, I am often called upon to read programs and data
from old computer media that run the gamut from punched cards to
paper tape to strange and bizarre tape formats that many people today either
haven't heard of or can hardly remember. Each job I get is an interesting
challenge, oftentimes requiring many hours of investigative work to determine
the format of the media and to piece together a functional system capable of
recovering the bits. For this, I draw upon my vast collection of vintage
computers: over 2,000 of them, the oldest being an original PDP-8 from 1965.

[Image: UNIVAC Punched Tape (1960s). Computer control instructions are
contained on punched paper tape from an early UNIVAC plant in Utica, NY. Try
to imagine what the debugging process would be for this code. How would one
proceed to dig into it? How would the process differ from your current
process? Image courtesy of the Department of Information Systems, London
School of Economics (see http://is.lse.ac.uk/History/UNIVAC-PunchedTape.htm).]
Most recently I was called upon to recover actual archaeological data from
a set of VHS tapes at the Mel Fisher Museum in Key West, Florida. In the
late 1970s, a company called Alpha Microsystems (one of the first
microcomputer companies, and one still in business today) pioneered a
system for storing data on standard VHS tapes using ordinary video
cassette recorders. At that time, the system was deemed to be a practical
and elegant solution to the problem of backing up entire hard drives, which
were then counted in the single megabytes but were getting bigger by
leaps and bounds each year (much like our current trend).
When Mel Fisher discovered the wreck of the Atocha in 1985, a treasure ship
that sank off the Florida Keys in 1622, one of his first priorities was to make a proper
record of the finds pulled up from the wreck. The sheer magnitude of the
motherlode (for example, there were over 100,000 silver coins recovered)
required a flexible and efficient solution for documenting and cataloguing
each artifact. Fisher hired a computer consultant who designed a system
to digitally photograph each coin so that a visual record could be made.
The photos were taken hundreds at a time and stored to the hard disk of
the digital camera station. Once the hard drive was full, the processed
photos were backed up to VHS tapes, the hard drive was cleared, and the
next batch was processed. The result was over 150 tapes consisting of
tens of thousands of digital photographs. The tapes were then stored
away for safekeeping and quietly forgotten.
The Mel Fisher Museum recently re-discovered the tapes (now fifteen
years after their creation) and realized they had no way to read them. The
equipment to process the tapes had long since vanished. Worse yet, the
tapes held the only photographs of the silver coins that were pulled from
beneath the ocean. My firm was hired to recover the data from the tapes.
After four years and thousands of dollars of effort, we were able to track
down and assemble the necessary hardware and software to read these
tapes and convert the images to a modern graphics format (they were
stored in a proprietary 16-level grayscale). The project involved dozens of
hours of searching for interface cards, special VCRs, old software, ancient
versions of DOS, and properly antiquated (i.e. slow) PCs to make
everything work.
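To give a sense of what that final conversion step can involve, here is a
minimal sketch in Python that unpacks raw 4-bit (16-level) grayscale data
into a modern PNG. The actual Alpha Microsystems file layout is not
documented here, so the frame dimensions, nibble packing order, and
filenames below are illustrative assumptions only.

from PIL import Image

WIDTH, HEIGHT = 640, 480  # hypothetical frame dimensions

def convert_16_level(raw: bytes) -> Image.Image:
    """Unpack two 4-bit pixels per byte and scale each to 8-bit gray."""
    # Assumes high-nibble-first packing and at least WIDTH*HEIGHT/2 bytes.
    pixels = bytearray()
    for byte in raw:
        for nibble in (byte >> 4, byte & 0x0F):
            pixels.append(nibble * 17)  # map levels 0..15 onto 0..255
    return Image.frombytes("L", (WIDTH, HEIGHT), bytes(pixels[:WIDTH * HEIGHT]))

with open("coin_0001.raw", "rb") as f:  # hypothetical recovered file
    convert_16_level(f.read()).save("coin_0001.png")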
Unfortunately, this situation is quite common. Organizations have
historically not considered the ramifications that the obsolescence of
computer media can have. Before anyone realizes it, media holding perhaps
thousands of man-hours of computer code and data can be put at risk
when the last unit of a particular Zip drive leaves the assembly line. The
issue has plagued government and private sector entities for decades.
Computer technology advances so quickly that computer media can be
outmoded suddenly and without warning. An organization that does not
have a proper plan for the obsolescence of its data stores is one day
going to face the same problem.
Through its acquisition of outdated hardware and software, the CHM is
providing linkage to computer technologies of the past. The scope of
usage of our computer artifacts, including software, can be whatever we,
the community of software archaeologists, want it to be.
2 - What is the Museum currently doing regarding the collection and
cataloguing of software?
In the past, the Museum tended to collect software as an afterthought.
Software would usually come in as part of a hardware donation. As such,
not much discretion was used in determining what software artifacts the
Museum should be accepting, and as a result we ended up with a lot of
incredible artifacts (such as the operating system and programs for the
MIT Whirlwind) as well as a lot of rubbish, such as entirely uninteresting
driver disks and random media with unknown stuff on it. Some of this
"unknown stuff" may yet prove to be very historically significant, but
without the proper context to go along with it, we will have to do a lot of
investigating to separate the wheat from the chaff.
Fortunately, the Museum recognized this deficit in properly collecting
software and created the position of Software Curator. I was tapped to fill
the position and have since been establishing collecting guidelines for
software as well as building out the infrastructure for properly maintaining
the software collection, including physical storage, cataloguing, and
access.
To that end, I first started by organizing the Museum's existing holdings.
We have two rooms devoted exclusively to software. I've set up the
necessary shelving and have arranged the various software artifacts in a
structured manner to make the cataloguing process more efficient. I
added the proper facilities for a Software Collection Catalog to the
Museum's database, defining the fields and developing a data dictionary
to use as reference for populating each record. I also developed a
Software Collection Taxonomy, which ultimately serves the purpose of
assisting researchers in finding the type of software they are looking for.
By using the taxonomy to categorize software, researchers can, for
instance, request all titles in our collection that have something to do with
spreadsheets, and from there they can perform a more refined search.
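To illustrate the idea, here is a minimal sketch in Python of the kind of
coarse-then-refined search the taxonomy enables. The record fields and
category terms are hypothetical, not the Museum's actual schema.

from dataclasses import dataclass

@dataclass
class SoftwareRecord:
    title: str
    year: int
    categories: list  # taxonomy terms, e.g. ["application", "spreadsheet"]

catalog = [
    SoftwareRecord("VisiCalc", 1979, ["application", "spreadsheet"]),
    SoftwareRecord("Lotus 1-2-3", 1983, ["application", "spreadsheet"]),
    SoftwareRecord("Spacewar!", 1962, ["game"]),
]

# Coarse search by taxonomy category, then a refined search on another field.
spreadsheets = [r for r in catalog if "spreadsheet" in r.categories]
early = [r for r in spreadsheets if r.year < 1980]
print([r.title for r in early])  # ['VisiCalc']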
We've begun the cataloguing process and plan to have it completed by the
end of the summer. We have over a thousand packaged software titles
(e.g. commercial software), and thousands of other artifacts including
software on common media such as punched cards, paper tape, magnetic
tape, floppy disk, etc., and on more exotic media such as magnetic film
and program rods (steel rods with holes drilled into them to represent bits).
Our software collection also includes source listings on paper. We have
software dating from the early 1950s to within the last several years.
3 - Can you talk about your plans for preserving the software, and the
reality of where you are with it right now?
As for preservation of the software (i.e. the code itself), this is a major
undertaking and we are currently studying the issues.
In the meantime, I am currently formulating a plan for the creation of a
"transcoding" lab at the Museum. We've settled on the term "transcoding"
to describe the process of extracting information from one medium and
storing it on another, in this case a centralized server where all the bits
can be conveniently managed. The lab will contain all the hardware
needed to read all the various media we have in our software collection.
We will begin methodically transcoding all the media in our software
archive as soon as the lab is ready for action. This cannot come soon
enough, as we are in somewhat of a race against time with many of the
artifacts (i.e. the physical media) in the software collection: some are
stored on disks and tapes that are at or well beyond their theoretical
lifespan (though I should add that, in practice, we find magnetic media to
be more durable than once predicted), and some cannot currently be read by
conventional means.
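As a rough sketch of what a single transcoding step might look like once
the lab is running: read the raw bits recovered from a piece of media,
fingerprint them, and file them on the central server with provenance
metadata. The paths and field names below are assumptions for illustration,
not the Museum's actual pipeline.

import datetime
import hashlib
import json
import pathlib
import shutil

def transcode(raw_image: str, artifact_id: str, media_type: str,
              archive_root: str = "/archive") -> dict:
    """Copy one recovered media image into the archive with a metadata record."""
    data = pathlib.Path(raw_image).read_bytes()
    record = {
        "artifact_id": artifact_id,
        "media_type": media_type,  # e.g. "8-inch floppy", "9-track tape"
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "read_date": datetime.date.today().isoformat(),
    }
    dest = pathlib.Path(archive_root) / artifact_id
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy(raw_image, dest / "image.raw")
    (dest / "metadata.json").write_text(json.dumps(record, indent=2))
    return record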
4 - How does the storage media impact the preservation of your software?
We must keep in mind that no one truly knows just how long magnetic
media will last, just as we don't truly know how long CD or DVD media
will last; the technology is still too new to have real-world data.
CD media is thought to have a lifespan of 100
years, but those estimates are based on accelerated testing, and they
certainly don't apply to the cheap commodity CD-R media you buy off the
shelf today (which can last anywhere from years to seconds). Floppy
disks were thought to have a lifespan of 15-20 years, but I am finding that
even disks 30 years old still read just fine, while 3.5" disks manufactured
in the late 1990s die in the time it takes me to copy a file onto one of
them and walk over to the PC I'm trying to transfer it to. A
big factor is the quality of the manufacturing process of the media and,
having studied this issue, I always recommend that people research the
media they are buying to store data for the long-term as they may have a
rude awakening in the not too distant future.
I have initiated a project (outside of the CHM) called FutureKeep to
develop a universal media imaging format so that the data on media of
any type can be described and encoded digitally in a manner that will
allow the original data image to be reconstructed at a future date if the
need or desire ever arises. It is also being designed to serve as a
universal image format for simulators. The specification for this format
also takes into account the quickly advancing nature of computing and
storage technology and the fact that media of today will be outmoded and
difficult to source in mere months or years or, at most, decades. The intent
is to create archives that will (hopefully) withstand the test of time and be
readable and usable centuries from now.
At the CHM we will record the images and store them to hard drives on
our server in an ad hoc format for now, but we will eventually want to
encode everything in a uniform structured image format.
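As a toy illustration of what "describing and encoding media digitally"
might mean in practice: a self-describing envelope that records the
physical geometry of the source medium alongside the bits themselves, so
the image can be reconstructed or fed to a simulator long after the medium
is gone. This is not the FutureKeep specification, only a sketch of the
concept; every field below is an assumption.

import base64
import hashlib
import json

def describe_media(bits: bytes, geometry: dict) -> str:
    """Wrap a raw media dump in a plain-text, self-describing envelope."""
    envelope = {
        "format_version": "0.1",  # hypothetical
        "geometry": geometry,     # physical layout needed for reconstruction
        "sha256": hashlib.sha256(bits).hexdigest(),
        "payload": base64.b64encode(bits).decode("ascii"),
    }
    return json.dumps(envelope, indent=2)

floppy = describe_media(
    b"\x00" * 512,  # stand-in for a real dump
    {"medium": "5.25-inch floppy", "tracks": 35, "sectors_per_track": 10,
     "bytes_per_sector": 256, "encoding": "GCR"},
)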
5 - How are you addressing copyright issues and the proprietary nature of
items in your collection?
We have taken the first steps towards developing an access policy for
our potential audience of researchers and hobbyists. This policy will take
into account the fact that some of our software artifacts are considered
sensitive or are proprietary and could be used for competitive advantage.
The issue of copyrights is a big can of worms wriggling and writhing about,
waiting for someone to come along and open it up and uncover the icky
sliminess contained within. We, of course, have to be very sensitive about
these issues and, to that end, we have fields in our database that flag
certain artifacts as being under embargo or not for general release. The
proprietary nature of some of our artifacts is a relatively easy issue to deal
with, but the copyright issues are a far greater concern and, given the
direction these issues are currently heading, we may well have to keep
some of our digital software artifacts locked down for a good long time to
come, which we feel would be a loss to society. In fact, the way copyright
law in the United States currently stands, we may even be breaking the
law if we circumvent copy protection mechanisms on old, obsolete and no
longer published software in order to archive it. This is an issue on which
the Museum has commented to the US Copyright Office. We are hoping
to secure a permanent exemption from the Digital Millennium Copyright Act
(DMCA) for the Museum and similar institutions so that preservation efforts
can proceed without legal risk. This is an unfortunate example of where the
provisions of the DMCA clearly fall short of what they were trying to
accomplish, and it indicates a lack of foresight by the drafters of that law.
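To illustrate how the embargo and release flags mentioned above might gate
a release decision, here is a small sketch; the field names and the policy
itself are hypothetical, not the Museum's actual database.

def may_release(record: dict, requester: str) -> bool:
    """Allow general release only for artifacts with no restriction flags."""
    if record.get("embargoed") or record.get("proprietary"):
        return requester == "staff"  # restricted items stay internal
    return True

print(may_release({"title": "VisiCalc", "embargoed": False}, "researcher"))  # True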
6 - How do you decide what software artifacts to keep?
One of the more important tasks I've completed since joining the staff of
the Museum is developing a Software Selection Criteria document to
guide the collections department in deciding what software donation offers
to accept or decline. As I mentioned previously, most software typically
comes in as a part of physical artifact donation and would automatically be
moved to the software room and placed on a shelf. Now, all software is
vetted against the Software Selection Criteria to ensure the software has
certain historic characteristics that make it a good candidate for long-term
preservation. Proper historical preservation for artifacts of any kind is an
intensive task that requires lots of resources, both human and otherwise,
so the intent of the Software Selection Criteria is to make sure the
software we are accepting into the collection is worthy of that expenditure
of resources. The criteria address the various kinds of value a software
artifact might offer the Museum, such as obvious historical merit or
usefulness in curatorial efforts (i.e. something that completes a
collection or can be included in exhibits), and so on.
Software Selection Criteria
Must meet one or more, preferably two, of the following conditions:
1. Sold a significant number of copies or had a large install base.
2. Serves to demonstrate a significant or colossal failure.
3. Introduced a new paradigm or product family, or launched a new industry.
4. Developed using a new and significant software development methodology.
5. Supports existing Museum artifacts.
6. The underlying code itself has qualities of merit worth preserving.
7. The software was utilized in something of historical import.
8. Sufficiently antiquated, i.e. from the 1960s or earlier.
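Expressed as a tiny predicate, the one-or-more, preferably-two rule might
look like the sketch below. The flag names are paraphrases of the list
above, not an official schema, and each would be judged yes/no by a curator.

CRITERIA = {"large_install_base", "colossal_failure", "new_paradigm",
            "new_methodology", "supports_existing_artifacts",
            "code_has_merit", "historical_use", "pre_1970"}

def accept(flags: set) -> tuple:
    """Return (acceptable?, preferred?) for a candidate's criteria flags."""
    hits = len(flags & CRITERIA)
    return hits >= 1, hits >= 2

print(accept({"large_install_base", "new_paradigm"}))  # (True, True)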
7 - How is hardware preservation different from software preservation? To
what extent does software preservation depend on hardware preservation?
Can the two be separated?
The main differences are the size of the artifacts and the resources
required to manage them. Hardware, especially older computers from the
Paleolithic era of computing (i.e. the 1950s), requires lots of space and
therefore funding to acquire and store. Software, on the other hand, takes
up much less space, and once the bits are safely rendered in a digital
format, the original media could theoretically be tossed, though for the
time being, the Museum's policy is to retain the original media as the
ultimate archival medium, at least until it proves to be utterly useless in
holding data (e.g. magnetic tapes whose magnetic coating is flaking
off the base substrate). Even then, of course, the original media may
retain some value as cultural artifacts in their own right. This is a
consideration that has been especially promoted, for example, by the
Smithsonian Institution.
As far as hardware and software as individual artifacts, each cannot be
properly understood without the other. True, you can successfully display
a computer that is just sitting there powered down and not being anything
other than a hulk of metal and plastic, but it is not really telling the whole
story. On the other hand, how does one present software without actually
executing it on its native hardware, or in the very least under simulation?
It's an interesting conundrum, especially when one considers the effort
required to resurrect a decades-old computer system. The Museum has
so far restored an IBM 1620 and a DEC PDP-1 to working order, and an IBM
1401 is currently undergoing restoration. However, without software, the
effort that went into the restoration of these machines would be pointless.
It is only when you sprinkle in the magic of software that the machine does
anything particularly interesting or useful. Software is indeed the soul of
the computer.
With regard to preservation, hardware is needed to preserve software only
to the extent that the bits must be recovered from the original media
before the media lose their ability to retain information. And while it's nice
to be able to run software on its native hardware, it is not always practical
to do so. Machine restorations take lots of time and effort by highly skilled
individuals, not to mention a significant amount of money (the IBM 1401
restoration has so far consumed several thousand dollars, the majority of
which was kindly donated by Museum supporters). On the other hand,
once we have the software in a modern digital format, we can always
utilize simulators to recreate the effective feel of operating historic
software. Some simulators try to recreate the full experience, with the
recorded sounds of whirring disk or tape drives and teletypes pecking
away, and while it's a far cry from the experience of running Spacewar! on
an actual PDP-1 and watching the front panel lights blink away, it's better
than not having any experience. I imagine at some point in the future we'll
be able to synthesize a more realistic experience, say something like what
they have in the imaginary Holodeck in the Star Trek series, but for now
we have to mostly rely on simulators for historic computers from the 1950s
and 1960s. Fortunately, we can still enjoy the wealth of minicomputer and
microcomputer software on native platforms since these classes of
machines from the early 1970s onwards are relatively easy to set up and
maintain in working condition. However, they too will someday be
impractical to run, if only because their mechanical parts get worn out, or
components die, or bits in ROM fade away.
8 - How is the software that you have preserved currently being used and
by whom (what types of patrons)?
Currently, we don't get many requests from outside our organization to
access the software, but of the few requests we do get, most are from
attorneys seeking specific titles as prior art for patent infringement
lawsuits. We do, however, utilize assets in our software archive internally
to support exhibits we set up and also to give our volunteer restoration
teams something to run on the machines they resurrect. On request, we
provide the researcher access to the physical media itself, and we would
assist them in actually retrieving or executing the software on either
suitable hardware or, if that's not possible or practical, on simulators.
9 - What is your plan for preventing your software holdings from becoming
obsolete as technology whizzes by? What is the cost that you incur to record the history of computing?
We are establishing tools, procedures, and guidelines that we hope take
full account of all the germane parameters to ensure
that our efforts remain relevant for some time to come. Of course, these
tools, procedures and guidelines will require periodic revamping to keep
up with changing technology and evolving methods of software storage
and preservation, but we believe we have started off with a very good
precedent that should carry us through at least the next five years and
give us time to evaluate and plan for the next step.
Software Tech News, October 2005, Vol. 8, Number 3: Software Archaeology