Tikalon Header Blog Logo

Archival Data Storage

April 13, 2020

Perhaps motivated by a fear that Americans were generally uncultured, elementary school students of my generation were subjected to a plethora of poetry. One of these poems was Ozymandias, a sonnet written by Percy Bysshe Shelley (1792–1822) in 1818. As I've remarked in earlier articles, education is often stifled by the need to use free, public domain, materials, and an 1818 sonnet by a poet who died in 1822 definitely falls into that domain.

Ozymandias was the Greek name for the Egyptian pharaoh Ramesses II, who ruled Egypt from 1279-1213 BC. This sonnet tells the tale of a traveler who finds the ruins of a statue of which just the legs remain. Inscribed on the pedestal are the words, "My name is Ozymandias, king of kings: Look on my works, ye Mighty, and despair!" The sonnet demonstrates the futility of the pharaoh's hubris, and it also shows that humans can create a message, the stone inscription, that lasts three thousand years.

Both sides of the Phaistos Disk

The Phaistos Disk is a fired clay disk about six inches in diameter that's about 3,500 years old. It's completely covered on both sides with stamped symbols arrayed in a spiral from center to edge, and it was found in the Minoan palace of Phaistos. No other examples of the Phaistos script have been found, so it's unlikely that a translation can be made.

(photo by Maksim)


While stone tablets have permanence over the course of millennia, they can't contain much information. The Rosetta Stone, created about 200 BC, was the key to decipherment of ancient Egyptian scripts, since it contained a text in Greek translation. The Greek text of the Rosetta Stone has about 10,000 characters. Compare this to the contents of the English version of Wikipedia at more than 10 gigabytes, which is about a million times greater.

Eighty column punch-card

Stone tablet of the computer age, the eighty column punch-card. It's likely that there are boxes of these that are still readable after more than half a century. Punch-cards were once so common that people would make Christmas wreaths from them. (Wikimedia Commons image by Arnold Reinhold.)


While the Zip disks containing decades of my email messages are long gone and never missed, there are some things that people would like to keep for a very long time, such as family photographs. I have photos of my great grandparents that are a century old and still about the same quality as when they were made. Digital photographs made today, although of much higher initial quality and much easier to create, might not survive even a decade without proper care. Aside from the problem of finding the right media player (remember floppy disks?), there's the problem of bit rot.

An eight-inch computer disk

An eight-inch computer disk, essentially unreadable today because of either the lack of a proper disk drive or decayed data. These usually held about a quarter megabyte of data.

I had many boxes of these in the 1980s, when I had an S-100 bus CP/M computer in my laboratory.

At that time, I did laboratory automation using Forth. I still have fond memories of Forth, although I haven't used it for decades.

(Wikimedia Commons photo by Hannes Grobe/AWI.)


For a time, CDs and DVDs were the preferred archival storage media, but people now save everything on USB flash drives, SD cards, or on one or another "cloud" service. Many people, myself included, reject the idea of cloud storage, since the safety of your data is relinquished to another party. There are numerous examples in which people have lost data through reliance on cloud storage.

The problem with cloud storage is only with the longevity of the provider, not the longevity of the data. Centralized data centers store archives on magnetic tape, generally on tape cassettes. Such tapes have a very low per bit cost, and they are readable after 15-30 years.[2-3] Their data are usually transcribed onto newer tapes every 5-10 years. Floppy disks and diskettes, which store data by the same magnetic principle, have a shorter lifetime, since rubbing erodes the media.

According to the US National Institute of Standards and Technology (NIST), a DVD will, in the worst case, retain your data for less than fifteen years, although CD-R media have about double the life expectancy (see graph).[4-7]

Optical media lifetime.

Optical media lifetime, as determined by the US National Institute of Standards and Technology (NIST). Recordable CDs are more archival than recordable DVDs, principally because the areal data density of a CD is smaller. Aside from the initial quality of the manufactured CD and DVD, data longevity depends on exposure to heat, humidity, and light. Storage conditions and the handling of the media during use are important factors that affect longevity. (Graph rendered from data in ref. 5 using Gnumeric.[5])


NIST obviously didn't have 45 year old CDs and DVDs for their study; so, how did they get their data? Longevity studies like this are done using accelerated-aging experiments that rely on the parameters of a first-principles model of the system. Any memory material will have an energy barrier between its two data states (logical "0" and logical "1"), as shown in the figure, so an Arrhenius law model can be used.

Energy barrier between two states.

Material stability modeled as an energy barrier ΔE between two states, I and II.

(Illustration by the author using Inkscape.)


According to an Arrhenius law model, data is corrupted by thermal fluctuations that randomly push a bit from its intended state to its complement. The probability P for this to occur is an exponential function of temperature:

$$P = 1 - \exp\left(-\frac{t}{\tau(T)}\right)$$

where

$$\tau(T) = \frac{1}{f_0} \exp\left(\frac{\Delta E}{k_B T}\right)$$

In these equations, $\tau$ is the decay time, $k_B$ is the Boltzmann constant, $T$ is the absolute temperature, and $f_0$ is the attempt frequency. If just a single atom can change the data state, the attempt frequency can be estimated as the atomic vibration frequency, and this can be as large as $10^{13}$ Hz.
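
The expression for τ(T) above also shows why accelerated aging works: the ratio of decay times at two temperatures depends only on the barrier ΔE, since the attempt frequency cancels. The sketch below computes that ratio. The 1.0 eV barrier, the 353 K stress temperature, and the 1,000 hour stress lifetime are hypothetical illustration values, not numbers from the NIST study, and a real optical disc test also has to account for humidity and light.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(barrier_ev, use_temp_k, stress_temp_k):
    """Ratio tau(T_use) / tau(T_stress) for an Arrhenius process.

    The attempt frequency f0 cancels, leaving
    exp[(dE / kB) * (1/T_use - 1/T_stress)].
    """
    exponent = (barrier_ev / K_B_EV_PER_K) * (1.0 / use_temp_k - 1.0 / stress_temp_k)
    return math.exp(exponent)


def extrapolated_lifetime_years(stress_lifetime_hours, barrier_ev,
                                use_temp_k, stress_temp_k):
    """Scale a lifetime measured at the stress temperature to the use temperature."""
    factor = acceleration_factor(barrier_ev, use_temp_k, stress_temp_k)
    return stress_lifetime_hours * factor / (24 * 365.25)


if __name__ == "__main__":
    # Hypothetical values: 1.0 eV barrier, 25 C storage, 80 C oven test.
    af = acceleration_factor(1.0, 298.0, 353.0)
    years = extrapolated_lifetime_years(1000.0, 1.0, 298.0, 353.0)
    print(f"acceleration factor: {af:.0f}")
    print(f"extrapolated lifetime: {years:.0f} years")
```

With these assumed values, a failure seen after weeks in an oven corresponds to decades at room temperature, which is how a study can report lifetimes longer than the age of the medium itself.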

Based on these equations, you have a million year memory with an energy barrier of 63 $k_B T$ when error correction codes are included.[8] If you want a billion year memory, you need to increase the barrier just a bit, to 70 $k_B T$, which is 1.8 eV at room temperature.[8] In 2013, an international team of scientists from The Netherlands and Germany proposed a billion year memory based on a pattern of tungsten embedded in silicon nitride (Si₃N₄). Such a memory could be read by imaging, or by interference of an electron beam (see figure).[8]
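
As a rough check on the barrier heights quoted above, the two Arrhenius expressions can be evaluated directly. This is a minimal sketch for a single bit, assuming the attempt frequency of 10^13 Hz mentioned earlier and a room temperature of 300 K. It does not reproduce the analysis of ref. 8, which considers a full memory with error correction, so the single-bit decay times come out longer than the quoted memory lifetimes.

```python
import math

K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K
SECONDS_PER_YEAR = 3.156e7      # approximate length of a year


def decay_time_years(barrier_in_kbt, attempt_hz=1e13):
    """tau = (1/f0) * exp(dE / (kB*T)), with dE given in units of kB*T."""
    return math.exp(barrier_in_kbt) / attempt_hz / SECONDS_PER_YEAR


def flip_probability(t_years, barrier_in_kbt, attempt_hz=1e13):
    """P = 1 - exp(-t / tau) for a single bit."""
    tau = decay_time_years(barrier_in_kbt, attempt_hz)
    return -math.expm1(-t_years / tau)


def barrier_ev(barrier_in_kbt, temperature_k=300.0):
    """Convert a barrier expressed in kB*T into electronvolts."""
    return barrier_in_kbt * K_B_EV_PER_K * temperature_k


if __name__ == "__main__":
    for n in (63, 70):
        print(f"{n} kBT: tau = {decay_time_years(n):.2e} years, "
              f"barrier = {barrier_ev(n):.2f} eV")
```

Under these assumptions, 63 kBT gives a single-bit decay time of a few million years and 70 kBT gives several billion years, which illustrates how strongly the lifetime depends on a small change in the barrier.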

Line-type archival WORM memory cells.

The tungsten-silicon nitride billion year memory.

There are two possible architectures: a transparent one (top), in which the substrate is removed, or one that uses interference effects in electron or photon beams.

The creators of this memory caution that "black swan" events would reduce its billion year lifetime. These include "theft, meteor impact or the sun entering the red giant phase."[8]

(Figs. 3 and 4 of ref. 8, via arXiv.[8])


There's another method of storing vast quantities of data that's been demonstrated in the past decade; namely, recording the data chemically in DNA. There are about three billion base pairs contained in the 23 chromosome pairs of the human genome, and in 2019 it was reported that the entire 16 GB of the English language Wikipedia had been encoded in DNA. While this is impressive, the longevity of a temperature-sensitive chemical in a test tube is likely far less than that of the billion year W-Si₃N₄ memory.
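
To put the two numbers in that paragraph on the same scale, here is a small arithmetic sketch. It assumes an idealized code of two bits per base (four bases, no redundancy) and decimal gigabytes; practical DNA storage schemes store fewer bits per base because they add error correction and avoid troublesome sequences.

```python
BITS_PER_BASE = 2                        # assumption: idealized A/C/G/T coding
HUMAN_GENOME_BASE_PAIRS = 3_000_000_000  # figure quoted in the text
WIKIPEDIA_BYTES = 16 * 10**9             # 16 GB, taken as decimal gigabytes


def bases_needed(num_bytes, bits_per_base=BITS_PER_BASE):
    """Number of bases required to hold num_bytes, rounded up."""
    total_bits = num_bytes * 8
    return (total_bits + bits_per_base - 1) // bits_per_base


def capacity_bytes(num_bases, bits_per_base=BITS_PER_BASE):
    """Number of whole bytes that num_bases can hold."""
    return num_bases * bits_per_base // 8


if __name__ == "__main__":
    needed = bases_needed(WIKIPEDIA_BYTES)
    genome_bytes = capacity_bytes(HUMAN_GENOME_BASE_PAIRS)
    print(f"bases needed for 16 GB: {needed:,}")
    print(f"capacity of one human genome: {genome_bytes / 1e6:.0f} MB")
    print(f"genome equivalents: {needed / HUMAN_GENOME_BASE_PAIRS:.1f}")
```

On these assumptions a human-genome-sized molecule holds well under a gigabyte, so the Wikipedia encoding corresponds to roughly twenty genomes' worth of bases.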

However, DNA is easy to copy, and recent research by a huge international team of scientists from Harvard Medical School (Boston, Massachusetts), the Massachusetts Institute of Technology (Cambridge, Massachusetts), Brandeis University (Waltham, Massachusetts), the Skolkovo Innovation Center (Moscow, Russia), Utrecht University (Utrecht, the Netherlands), and the Tata Institute of Fundamental Research (Bengaluru, India) has increased DNA data permanence by encoding it in living cells of the archaeon Halobacterium salinarum. Their research is posted in a recent paper on bioRxiv.[10-11]

Halobacterium salinarum, an extraordinarily hardy organism, is a halophile ("salt loving") extremophilic archaeon that's hard to kill.[11] This microbe has, on average, 25 backup copies of each of its chromosomes.[11] It's resistant to thermal extremes, prolonged vacuum, and ionizing radiation, and it can withstand desiccation while being trapped in brine pockets in salt crystals.[11] It has been revived from prolonged stasis in hundred million year old salt deposits.[10-11]

Halobacterium salinarum

The proof of the pudding is in the Petri dish. This is Halobacterium salinarum in which some DNA sequences have been modified to contain data. (Fig. 7 of ref. 10, licensed under a Creative Commons license.[10])


The research team encoded the digital specification for creation of 3-dimensional figures into the DNA of these microbes, and embedded the cells into crystalline mineral salts.[10] The authors state that such repositories of biological information can be expected to survive for much longer than humans. The average lifespan of mammalian species is about a million years, and estimates of the longevity of Homo sapiens range from 600 to 7.8 million years.[10]

References:

  1. Ancient History Sourcebook: The Rosetta Stone: Translation of the Greek Section, Fordham University.
  2. John W. C. Van Bogart (National Media Lab), "Mag Tape Life Expectancy 10-30 years," Letter to the editor of the Scientific American, March 13, 1995.
  3. S. H. Charap, P. L. Lu, and Y. He, "Thermal stability of recorded information at high densities," IEEE Trans. Magn., vol. 33, no. 1 (January, 1997), pp. 978-983.
  4. CD-R and DVD-R RW Longevity Research, US Library of Congress.
  5. Final Report: NIST/Library of Congress (LC) Optical Disc Longevity Study, Library of Congress and NIST, September 2007 (414 kB PDF file).
  6. How Long Can You Store CDs and DVDs and Use Them Again?, Council on Library and Information Resources.
  7. Optical media preservation, Wikipedia.
  8. Jeroen de Vries, Dimitri Schellenberg, Leon Abelmann, Andreas Manz and Miko Elwenspoek, "Towards Gigayear Storage Using a Silicon-Nitride/Tungsten Based Medium," arXiv, October 9, 2013.
  9. Sang Yup Lee, "DNA Data Storage Is Closer Than You Think," Scientific American, July 1, 2019.
  10. J. Davis, A. Bisson-Filho, D. Kadyrov, T. M. De Kort, M. T. Biamonte, M. Thattai, S. Thutupalli, and G. M. Church, "In vivo multi-dimensional information-keeping in Halobacterium salinarum," bioRxiv, February 15, 2020, doi: https://doi.org/10.1101/2020.02.14.949925 .
  11. Steve Nadis, "Hardy microbe's DNA could be a time capsule for the ages," Science, vol. 367, no. 6480 (February 18, 2020), p. 840, doi:10.1126/science.abb3588.
