DNA Extended Character Set

June 26, 2012

Gattaca is a 1997 science fiction film directed by Andrew Niccol.[1-2] It's also a memory aid, at least to scientists like me who are not geneticists. The letters in the title stand for the four bases from which DNA is built; namely, adenine, cytosine, guanine, and thymine.

One enjoyable part of this film is the musical score by Michael Nyman, who composed the memorable piece, The Upside-Down Violin (1992),[3] a live performance of which I recorded from public radio many years ago. Getting the music you wanted was much harder in those days before the Internet. I recorded this on an audio cassette, so the sound quality wasn't ideal.

It's possible to use DNA as a digital recording medium. You can synthesize a DNA molecule with a selected combination of these bases to encode your data. Guanine is always paired with cytosine, and adenine is always paired with thymine, a fact discovered by Erwin Chargaff before Watson and Crick's discovery of the structure of DNA.

They travel in pairs.

The structure of DNA, showing the G-C/A-T pairing.

(Chemical structure diagram by Madeleine Price Ball (modified), via Wikimedia Commons)
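
The pairing rule can be written as a simple lookup table. The following is a minimal Python sketch; the names `PAIRING` and `complement` are illustrative only, and the function ignores strand direction (in a real double helix the two strands run antiparallel, so the partner strand is conventionally read in reverse).

```python
# Pairing rule from the text: G pairs with C, and A pairs with T.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a strand written with A, C, G, T."""
    return "".join(PAIRING[base] for base in strand.upper())


if __name__ == "__main__":
    print(complement("GATTACA"))  # CTAATGT
```

Applying `complement` twice returns the original strand, which is why one strand is enough to specify the whole molecule.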

We can call one base pair a binary one, and the other, a binary zero. Human chromosomes have between 51 million and 245 million base pairs. If we consider 100 million base pairs, at one bit per base pair that would be about 12.5 megabytes, or just enough for a few family photographs. Such technology is useful for just a few limited purposes, one of which is DNA "fingerprinting" of important articles. Applied DNA Sciences, Inc., makes DNA taggants to discourage counterfeiting of many things, from integrated circuits to guitars.[4]
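
As a rough illustration of this one-bit-per-base-pair scheme, here is a short Python sketch. The choice of G-C as one and A-T as zero is an arbitrary assumption, as are the function names; real DNA storage schemes are more elaborate. Only one strand is written out, since the pairing rule fixes the other.

```python
def bytes_to_pairs(data: bytes) -> str:
    """Encode each bit as one base pair: 1 -> G (G-C pair), 0 -> A (A-T pair)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join("G" if bit == "1" else "A" for bit in bits)


def pairs_to_bytes(strand: str) -> bytes:
    """Decode a strand back to bytes; assumes its length is a multiple of 8."""
    bits = "".join("1" if base in "GC" else "0" for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def capacity_megabytes(base_pairs: int, bits_per_pair: float = 1.0) -> float:
    """Storage capacity in megabytes (10**6 bytes) for a given number of base pairs."""
    return base_pairs * bits_per_pair / 8 / 1_000_000


if __name__ == "__main__":
    print(bytes_to_pairs(b"A"))             # AGAAAAAG (0x41 = 01000001)
    print(capacity_megabytes(100_000_000))  # 12.5
```

At one bit per pair, 100 million base pairs come to 12.5 megabytes, the same order of magnitude as the figure quoted above.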

A Martin Electric Guitar.

Martin is experimenting with DNA tagging.[4]

(Photo by Alex Harden, via Wikimedia Commons)

As any computer scientist knows, one way to expand your logical space is to shift from binary numbers (base-2) to ternary numbers (base-3), or even quaternary numbers (base-4). With ternary numbers, having three states per digit increases the largest number that a fixed number of digits can represent. Four binary digits (bits) will give you sixteen possible states, but four ternary digits (trits) have 81 possible states; viz.,
2 x 2 x 2 x 2 = 16
3 x 3 x 3 x 3 = 81
Might it be possible to create DNA with at least one additional base pair to allow ternary codes?
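
The arithmetic generalizes to any alphabet size. The following Python sketch (the function names are illustrative only) counts the states available in a fixed number of positions and converts each position to its equivalent in bits.

```python
import math


def states(alphabet_size: int, positions: int) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** positions


def bits_per_position(alphabet_size: int) -> float:
    """Information carried by one position, in bits."""
    return math.log2(alphabet_size)


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(size, states(size, 4), round(bits_per_position(size), 3))
    # 2 16 1.0
    # 3 81 1.585
    # 4 256 2.0
```

Going from two symbols to three adds a little more than half a bit per position, which is the gain a third base pair would offer under the binary scheme described above.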

The first step in this direction was taken in 2008 by Floyd Romesberg and his colleagues at the Scripps Research Institute (La Jolla, California). After first trying 200 modifications of the natural base pairs with no success, they turned to combinatorial chemistry to generate 3600 pseudorandom candidates. Two of these, called dSICS and dMMO2, worked with a little molecular tweaking that encouraged pair bonding.[5-6]

The two novel bases are quite unlike the natural bases. Romesberg is quoted in New Scientist as saying,
"We got it and said, 'Wow!' It would have been very difficult to have designed that pair rationally... We now have an unnatural base pair that's efficiently replicated and doesn't need an unnatural polymerase... It's starting to behave like a real base pair."[5]
In 2009, Romesberg's group demonstrated transcription of their bases into RNA in vitro, and further studies have shown that the pair bonding of their artificial bases is quite different from the hydrogen bonding of the natural bases. Instead, the bases are held together by hydrophobic forces; that is, they cling to each other amid the water molecules surrounding the DNA. Romesberg speculates that the natural hydrogen bonding process could have been a random choice among several available to evolution.[7-8]

Not even the polymer backbone is sacrosanct. Another international team was able to store and recover genetic information from six xeno-nucleic acids (XNAs) not found in nature.[9-10] The four natural bases were part of these XNAs. It appears that many polymers can perform the same replication trick done by DNA.[9-10]


  1. Gattaca (1997, Andrew Niccol, Director) on the Internet Movie Database.
  2. Gattaca - Movie Trailer, YouTube video.
  3. Michael Nyman - The Upside Down Violin, YouTube video.
  4. Applied DNA Sciences News.
  5. Robert Adler, "Artificial letters added to life's alphabet," New Scientist, January 30, 2008.
  6. Aaron M. Leconte, Gil Tae Hwang, Shigeo Matsuda, Petr Capek, Yoshiyuki Hari and Floyd E. Romesberg, "Discovery, Characterization, and Optimization of an Unnatural Base Pair for Expansion of the Genetic Alphabet," J. Am. Chem. Soc., vol. 130, no. 7 (January 25, 2008), pp. 2336-2343.
  7. Scripps Research Institute Study Suggests Expanding the Genetic Alphabet May Be Easier than Previously Thought, Scripps Research Institute Press Release, June 3, 2012.
  8. Karin Betz, Denis A Malyshev, Thomas Lavergne, Wolfram Welte, Kay Diederichs, Tammy J Dwyer, Phillip Ordoukhanian, Floyd E Romesberg and Andreas Marx, "KlenTaq polymerase replicates unnatural base pairs by inducing a Watson-Crick geometry," Nature Chemical Biology (advance online publication, June 3, 2012).
  9. Helen Shen, "Enzymes grow artificial DNA - Synthetic strands with different backbones replicate and evolve just like the real thing," Nature, April 19, 2012.
  10. Vitor B. Pinheiro, Alexander I. Taylor, Christopher Cozens, Mikhail Abramov, Marleen Renders, Su Zhang, John C. Chaput, Jesper Wengel, Sew-Yeu Peak-Chew, Stephen H. McLaughlin, Piet Herdewijn and Philipp Holliger, "Synthetic Genetic Polymers Capable of Heredity and Evolution," Science, vol. 336, no. 6079 (April 20, 2012), pp. 341-344.
