Life Music: The Sonification of Proteins
by John Dunn and Mary Anne Clark
John Dunn, Algorithmic Art. E-mail: johndunn@algoart.com. Web site: http://algoart.com
Mary Anne Clark, Department of Biology, Texas Wesleyan University, 1201 Wesleyan, Fort Worth, TX 76105. E-mail: macclark@startext.net
ABSTRACT
Artist John Dunn and biologist Mary Anne
Clark have collaborated on the sonification of protein data to produce the audio CD,
"Life Music." The authors describe the process by which this collaboration
merges scientific knowledge and artistic expression to produce soundscapes from these
basic building blocks of life, that may be encountered as esthetic experiences, as
scientific inquiry, or both. The rationale for both artistic use of the science and
scientific use of the art is described from the separate viewpoints of artist and
scientist.
Music and Proteins
I (Clark) love to walk into the music
building, which on my campus is next door to the science building. Through the doors of
the practice rooms, I can hear fragments of 1000 years of written music, played or sung by
the current generation of music students, some with finesse, some with hesitation, some
with wild improvisation. I think that if somehow I could walk into a living cell, I would
hear something similar: the ribosomes ticking away at the synthesis of proteins,
playing out their amino acid sequences, note by note, according to a genetic score that is
reproduced sometimes with utter fidelity, sometimes with a few unscheduled substitutions,
and sometimes with stunningly inventive flourishes. Every generation of cells in every
living organism plays the genetic score of its species. However, while the history of
music as we know it goes back some 1000 years, the history of genetic music is at least
3.8 billion years in the making.
Over a decade ago, I went to a faculty
seminar to hear a colleague talk about composition. As he discussed how he went about
selecting, modifying and organizing musical themes, I was struck by the parallels between
musical structure and the structure of proteins and the genes that encode them. Proteins
also seemed to be composed of phrases organized into themes. For years I was haunted by
the image, and tried occasionally to interest musicians in making the transformation for
me: converting a protein sequence into a musical sequence.
I was convinced that this would be worth
doing: that the amino acid sequences would have the right balance of complexity and
patterning to generate musical combinations that are both aesthetically interesting and
biologically informative. There are twenty amino acids in proteins (listed in Table 1), enough for about three octaves of a diatonic scale. They are
not arranged at random, just as notes are not arranged at random in a piece of music. Both
proteins and music are meaningful. The meaning of a protein is its function in the
organism, and certain sequences have emerged as the hallmarks of specific functions.
For example, the protein hemoglobin serves
the function of oxygen binding. Some features of the hemoglobin tune can be seen by
examining the proteins of different species, which play this tune as variations on a
theme. Figure 1 represents the sequence of beta globin, which
forms half of the protein hemoglobin. For example, the tuatara, an exotic 3-eyed lizard,
would seem to have little in common with humans, but the similarities between the human
and tuatara beta globin sequence indicate that both proteins are variations on a theme
that was in existence before the divergence of the mammalian and reptilian lineages 200
million years ago. Other variations of beta globin can be found in vertebrate species from
all over the world, e.g. Australian ghost bats, Brazilian tapirs, Kenyan clawed frogs,
Antarctic dragon fish, and Emperor penguins. Although the beta-globin sequences are not
identical in these species, they are similar enough that, if converted to music, they
would be recognizable as variations on a common theme.
While it seemed obvious that proteins had
an inherently musical structure, I did not hear a musical translation of a protein until
1996. In the process of preparing for an honors course on structural similarities between
proteins and music, I did an Internet search looking for others who might also be
interested in these parallels. There were only a few, but on John Dunn's algorithmic
music site, I found both music based on DNA and protein sequences and the software that
would make the musical translation. I purchased one of the software programs to use with
the class and discovered that proteins were even more musical than I had anticipated.
Nature As a Template for Art Music
An artist working in the medium of sound
is liberated from the cultural imperatives imposed by traditional music, but at a high
cost. Music in all cultures is rich in tradition and convention. Not only do listeners
expect to hear the musical references they have become familiar with, cultural and musical
tradition gives music its deep structure. This deep structure is not heard on the
conscious level by most listeners, but is an essential component of any musical work: the
component that keeps our interest fresh on repeated hearings. Popular music depends on
extra-musical cultural associations for this to a large degree, and so in a rapidly
evolving culture must be remade constantly. Classical concert and liturgical music depend
far more on multiple layers of abstraction within the music itself, with cultural
traditions of harmony and melody that evolve slowly. There are extra-musical associations to
be sure, but the primary deep structure lies within the music itself; thus, classical
music stays fresh in our ears even over centuries.
Midway into the 20th century,
when electronics in general and the tape recorder in particular opened vast landscapes of
tonal colors and compositional layering to musical explorers, it quickly became apparent
that no one outside of the electronic music community was listening. Most people
considered "electronic music" to be an oxymoron. The problem was not that the
electronically generated tones were uninteresting. The problem was there existed no deep
structure to the music, either internally or culturally.
As an early experimenter with electronic
music, starting in the '60s with multiple tape decks and razor blade (musique
concrète, it was called then), I (Dunn) vividly remember my first hearing of Carlos's
Switched on Bach [1], the first electronic music to receive
popular acclaim. I was driving and nearly ran the car off the road. This was astounding:
pure synthesized music that made no attempt whatsoever to mimic conventional
instrumentation, that stood on its own as music. Up to then, electronic music, even my own
(especially my own), was of interest only because it was electronic and experimental. It
had little to do with music as an esthetic experience and it rarely got a second hearing.
The fly in the ointment, of course, was
that the structure that gave Switched on Bach its meaning was a borrowed one, from
Bach, and our vast tradition of Western harmony with its abstract, slowly changing
cultural associations. In the end it was imitative music after all, barely hinting at the
new musical landscapes that had opened up to electronic composers. Morton Subotnick [2], arguably the best of the early composers of abstract synthesized
electronic music, with several landmark albums to his credit, remarked when asked what
kind of music he listened to, that he preferred Mozart, Bach, and the other traditional
Western composers. He pointed out that electronic music has no history, no tradition, and
thus for the present, little that can hold a listener's interest.
Early on I had determined that my path to
composing electronic music would eschew traditional composition, and treat this new medium
as a separate art form: sound as an artist's medium, rather than music as a
traditionally trained musician would approach it. The reason for this, to me, was obvious.
The great investment of traditional music training has such weight that one cannot help
being stuck in that paradigm to some extent. Others have broken out of it (Subotnick
comes to mind immediately), but I wasn't that confident of my own ability. So I went
to art school to study sound as art, rather than traditional music, and it was there that
I discovered computers.
Digital computers have given electronic
musicians new tools for developing deep structure. The computer's great strength is
in its use as a compositional tool for algorithmic music: music that is developed with
computer processed rules which can combine together in tonal and structural relationships
that would be difficult if not impossible to calculate by traditional means. Joseph
Schillinger, who ironically died in 1943, the year the first electronic computer was
"born," developed much of the groundwork for algorithmic music in his series of
lectures that has been posthumously published as The Schillinger System of Musical
Composition [3]. His theory that all music, perhaps all
art, can be broken down to small whole number ratios is difficult to align with
traditional music composition techniques (although that is exactly what he attempts to
do); however, it is a perfect fit for computer algorithmic composition.
While algorithmic processes have given
electronic art music a means of achieving deep structure, it is largely an alien structure
to our 20th century ears. And since this music is still very much in the
pioneer stage with the frontiers of its paradigms still shifting and ephemeral, the
listening audience for this kind of music remains negligible.
Thus, when botanist Dr. K.W. Bridges from
the University of Hawaii asked me in 1989 to look into sonification of some of his data on
tide tables, it occurred to me that, just as an artist's approach rather than a
musician's helped loosen the bounds of tradition, perhaps substituting the structure
in scientific data for that of cultural tradition would help lend form to electronic music
that contemporary ears could appreciate.
While the tide table data failed to
resonate with any internal map I could discern, and the data were seemingly too random to
give the resulting music a sense of structure, deep or otherwise, it did lead to
discussions about what kind of scientific data might do this. Eventually the discussions
with Bridges led to DNA data and its associated protein sequences. It seemed to me that a
relatively simple alphabet of four tokens that form just twenty letters that in turn
combine to form the basis of all Earth life had to be rich with structure, and very likely
would resonate with the inner maps of us humans who are built upon this code.
This turned out to be the case. The
DNA/protein sequences have proven to possess deep and highly resonant structure that
sounds both alien and familiar, like music from another culture: pleasantly unusual but
quite listenable. Our first public presentation of this music was in January, 1981, at the
University of Hawaii in a concert entitled, Inflections: Musical Interpretations of DNA
Data, which included music composed by myself and by Dr. Bridges, and related visuals
performed by artist Sonia Sheridan.
At the time I thought the DNA/protein
music would be a passing thing for me, a stepping stone on the exploratory search for
compositional structure and meaning to parallel the remarkable electronic and digital
tools technology has given us. But the well has not run dry. How could it? Nature's
music of life is on a far vaster scale than any human (merely one of Her sonnets) could
possibly surpass. But She gives us a raw score so rich and harmonic it may well become the
fountainhead for future sonic artists, just as She has been for visual artists throughout
human history.
As a Research Fellow in the Arts at the
University of Michigan for the past two years, I have collaborated with Jamy Sheridan, a
visual algorithmic artist who has worked closely with me for several years on the
algorithmic art and music software I have developed, and with Dr. M.A. Clark, the
co-author of this article. The collaboration with Clark began some two years ago, when she
emailed me some technical questions regarding the software she purchased. Further email
correspondence revealed we were on similar trajectories regarding the sonification of
protein data, but with two separate sets of keys: hers based on science and mine on art.
Sonification of DNA and Proteins
DNA (deoxyribonucleic
acid) is a long multi-unit molecule containing Nature's digital code for
life on Earth. There are just four coding elements: T, C, A and G. The letters stand for
the four different subunits of DNA (thymine, cytosine, adenine
and guanine) that form the "steps" on the helical ladder that is
the data base for all organisms. These four coding elements are combined into groups of
three, which are called codons. There are 64 possible codon combinations, of which 61 are
used to encode the 20 amino acids, plus three stop codons that indicate the end of a
protein sequence, as a period indicates the end of a sentence.
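The codon arithmetic above can be checked with a short sketch. It assumes the standard genetic code, written here in the compact form used by many sequence tools: one amino acid letter per codon, with the bases cycled in the order T, C, A, G. The names `BASES`, `AMINO_ACIDS` and `CODON_TABLE` are illustrative only and do not come from the authors' software.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair every possible triplet (TTT, TTC, TTA, ...) with its amino acid letter.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
coding_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = set(CODON_TABLE.values()) - {"*"}

print(len(CODON_TABLE), "codons in total")
print(len(coding_codons), "codons encode", len(amino_acids), "amino acids")
print("stop codons:", ", ".join(stop_codons))
```

Building the table from a single string keeps the sketch short; a dictionary typed out codon by codon, following Table 1, would serve equally well.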
The twenty amino acids of which proteins
are composed [Table 1] differ from one another in size, solubility,
and electrical charge. Generally, water-insoluble amino acids like leucine, isoleucine and
valine cluster together in the interior of a protein, while more soluble amino acids are
exposed on the surface. Positively charged amino acids like lysine and arginine and
negatively charged amino acids like glutamic and aspartic acid may also attract each
other. These interactions encourage the protein to fold, like origami, into its functional
form, and the shape it assumes will depend on the position of each amino acid in the
sequence.
Just as a musical theme is defined by the
intervals from note to note, not by the absolute pitches of the notes, proteins are
defined more by their overall patterns than by their absolute sequences. In order to form
beta globin, the amino acids must line up in a way that allows the sequence to fold into a
molecule capable both of binding and of releasing oxygen with the appropriate
physiological parameters.
The amino acid interactions that stabilize
a particular folding pattern must be preserved, even if the specific amino acid sequence
is not, in order to preserve the function of a protein. The phrase (in amino acid letter
names) FSDGL in human beta globin and the phrase FGEAV in tuatara are different, but the
amino acids at the last four positions of each cluster have similar charge and solubility
characteristics. Such substitutions are said to be conservative, and act a little like a
musical key change, because they maintain the shape of the line even though the absolute
sequence is changed.
How Proteins are Encoded
Protein sequences and the organisms that
contain them have the look of being designed or composed. The design of an organism and
its molecular components emerges from the information stored in the DNA of its genes. The
relationship between DNA coding sequences and protein structure is something like the
relationship between Morse coding and plain text. Figure 2
demonstrates Morse code for the message "beta-globin." Some features of the two
coding systems are the following:
Morse Code uses combinations of two
elements, the dot and the dash, to specify letters of the alphabet and punctuation marks.
In genetic code, combinations of the four subunits A, T, C, and G are used to specify the
20 amino acids of the protein alphabet.
Morse code uses coding combinations of
various lengths, from a single dot (a short pulse) or dash (a longer pulse) to four
dots/dashes for the 26 letters of the English alphabet. Genetic code always uses
combinations of the same size: three units. The DNA codons, e.g. AAA, CGA, CAT, specify
the 20 amino acids, the alphabet of protein structure. Transmitted Morse code uses a brief
period of silence to mark the boundaries between codons (e.g. to distinguish the letter
combination "et" from the letter "a" in the message "beta
globin"). Genetic code is read continuously, parsing the DNA data string into
triplets, and depends on the translating ribosomes to get the reading frame right.
Morse code begins with the first character
of the message and uses a stop codon (.-.-.-) to specify the end of the message. Genetic
code also begins with the first character of the message and ends with one of three stop
codons: TAG, TAA, or TGA. In both codes, the codons are laid out in the same sequence as
the letters of the message.
In Morse code, the relationship between
codons and the letters of the message is fully unambiguous: either can be predicted from
the other. V is only ...- and ...- is only V. However, genetic code is unambiguous
only when reading from the DNA to the protein. The reason that 61 DNA codons encode only
20 amino acids is that genetic coding is redundant. Most amino acids are represented by
two or more codons (see Table 1 for a codon listing); only two
amino acids are specified by a unique codon. Coding redundancy for several amino acids of
a single protein can be seen in Figure 3, which represents the DNA coding sequence and the
corresponding amino acid sequence for human beta globin. For example the amino acid lysine
(K) is represented by both of its two DNA codons, sometimes by AAA and sometimes by AAG,
and the amino acid glycine (G) is represented by three of its four possible codons:
GGC, GGG and GGT.
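The one-way nature of the genetic code can be illustrated with a minimal sketch, again assuming the standard genetic code in its compact TCAG form. The helper names `translate` and `count_encodings` are hypothetical, chosen for this example. Reading DNA into protein gives exactly one answer, while going backward from the protein multiplies the choices at every position.

```python
from collections import defaultdict
from itertools import product
from math import prod

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: every codon that specifies a given amino acid.
CODONS_FOR = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    CODONS_FOR[aa].append(codon)


def translate(dna):
    """Read dna in consecutive triplets from its first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def count_encodings(protein):
    """Number of distinct DNA sequences that translate to this protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)


dna = "ATGGTGCACCTGACTCCTGAG"  # opening codons of human beta globin (Figure 3)
peptide = translate(dna)
print(peptide)                   # DNA to protein: a single answer
print(count_encodings(peptide))  # protein to DNA: many candidate sequences
print(sorted(CODONS_FOR["K"]), sorted(CODONS_FOR["G"]))
```

Only methionine (M) and tryptophan (W) contribute a factor of one to the count, since each has a single codon.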
These examples show that the sequence of a
protein is not a fixed structure, but a tentative one, like a melody in the mind of a
composer. The theme played by a protein in one of its guises may turn up again as a
variation or counter-theme in another part of the orchestra. In some cases, e.g.
sickle-cell hemoglobin, a single amino acid substitution can seriously reduce the
functionality of the protein. But sometimes a refolded tertiary structure develops new
talents. The sickle-cell mutation has the side effect of increasing resistance to malaria.
The normal beta globin is itself a variant of an earlier protein that also gave rise to
other globins. Other protein variants have acquired completely new functions, e.g. the
derivation of the milk protein lactalbumin from the protective enzyme lysozyme, and the
derivation of several eye lens crystallins from respiratory enzymes [4,
5].
The necessity for a working protein always
to have some meaning, some function, has made proteins change slowly enough when they do
change, that they have left the traces of their previous history behind in the record of
their amino acid sequences. Changes in protein sequences are generated by their
"composers," i.e. the DNA sequences that encode them. DNA produces new
variations either by making a change in the identity of a codon or by the wholesale
recombination of themes taken from different DNAs. With the development of computer
programs that can instruct digital musical instruments to play genetic scores, it has now
become possible to hear these protein songs.
Collaboration of Art and Science
When we began the protein music project,
we wanted to convey both something about primary amino acid sequence and something about
the folding patterns of proteins. Our goal was to create an audio CD album that would
stand on its own as art music, and at the same time offer empirical proof of the esthetic
patterning of nature's deep structure. One way to approach this was to take advantage
of secondary structure of proteins: simple folding patterns that are combined to produce
the overall tertiary structure of a protein. There are three secondary patterns:
alpha-helix, beta-strand, and turns.
A protein chain is like a necklace, with
the chemical groups that identify each amino acid dangling from the chain like pendants.
These "pendants" are known as R-groups. Alpha-helix looks like the binding of a
"spiral" notebook, or a strand of string wound at even intervals down a pencil.
A helix is also like a spring in that you can stretch it along its long axis, and when
released, it will return to its original shape. In alpha-helix, the R-group
"pendants" project outward from the axis of the helix.
Beta-strands fold back and forth at the
carbon atom to which the R-groups are attached. In beta-strands, the R-groups project from
the folded chain on alternate sides. Beta strands from different parts of the sequence or
even from different sequences can line up with their R-groups in register. Adjacent
strands form weak bonds that connect them into beta sheets or cylindrical beta barrels.
Turns are just that: a region of the
molecule that goes off in a different direction than the one it came from. Turns may
connect two regions of alpha helix or beta strands to form alpha-turn-alpha or
beta-turn-beta complexes, or to connect alpha and beta regions.
As elements in the music that add to its
depth, the fact that these secondary structures exist in proteins, in addition to the
variation and theme of the protein sequences themselves, is enough to make rich and
interesting music. But to better understand how these simple patterns might contribute to
even deeper structure in the music, we looked to the extra insight offered by the
scientific study of more complex protein folding patterns.
Various combinations of secondary
structure form local domains in a protein's tertiary structure, or overall
architecture. As cathedrals can be classified as Romanesque, High Gothic, Perpendicular,
and so forth, protein architectures are grouped into different categories, some of which
are named simply for one of the proteins exhibiting the pattern, like "immunoglobulin
folds," while others are named more descriptively, like "trefoil
(cloverleaf)," "Greek key," or "beta sandwich."
The proteins we chose to work with in this
project were representative of four major pattern categories: fibrous, predominantly
alpha, predominantly beta, and mixed alpha-beta. To distinguish between alpha and beta
regions of these proteins, and to mark the turns, we decided to use changes in
instrumentation and/or pitch. For those proteins that have long regions in which one or
more motifs are tandemly reiterated, we chose instead to use different voicings to
differentiate between these motifs. What surprised us as we began to hear the sequences
was that some of the alpha and beta regions also were marked by motifs whose sequences
might not obviously repeat, but the general shaping of whose phrases did.
Discovering the Music in Proteins
In Dunn's previous music programs
using DNA or protein sequences to generate music, pitches were assigned in two ways,
either absolutely by giving a fixed pitch to each amino acid or relatively by making a
frequency histogram of the amino acids in the protein and assigning more consonant
intervals to the more frequent amino acids. Because the properties of amino acids are
important in determining folding pattern, we decided to recognize those properties by
adding a third method for assigning pitch. We arranged the amino acids roughly according
to their water solubility. The most insoluble residues were assigned pitches in the lowest
octave, the most soluble, including the charged residues, were in the highest octave, and
the moderately insoluble residues were given the middle range. Pitches ranged over three
octaves in the diatonic scale, two octaves for a chromatic scale, and about four for
pentatonic and whole-tone scales.
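A minimal sketch of the solubility-based pitch assignment follows. It is not the authors' pitch table: as a stand-in for their ordering it assumes the Kyte-Doolittle hydropathy scale (higher values mean less water-soluble), and the base note, the scale definitions and the function names `pitch_table` and `sonify` are likewise assumptions made for illustration. Pitches are MIDI note numbers.

```python
# Kyte-Doolittle hydropathy values; used here only as an assumed ordering
# from most water-insoluble (high) to most soluble (low).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# Semitone offsets of each scale degree within one octave.
SCALES = {
    "diatonic": [0, 2, 4, 5, 7, 9, 11],
    "chromatic": list(range(12)),
    "pentatonic": [0, 2, 4, 7, 9],
    "whole_tone": [0, 2, 4, 6, 8, 10],
}


def pitch_table(scale, base_note=36):
    """Assign one scale step per amino acid, least soluble at the bottom."""
    steps = SCALES[scale]
    ordered = sorted(HYDROPATHY, key=HYDROPATHY.get, reverse=True)
    return {
        aa: base_note + 12 * (i // len(steps)) + steps[i % len(steps)]
        for i, aa in enumerate(ordered)
    }


def sonify(protein, scale="diatonic"):
    """Turn a single-letter protein sequence into a list of MIDI note numbers."""
    table = pitch_table(scale)
    return [table[aa] for aa in protein]


for name in SCALES:
    table = pitch_table(name)
    span = (max(table.values()) - min(table.values())) / 12
    print(f"{name}: {span:.1f} octaves from lowest to highest residue")

print(sonify("VHLTPEEKSA"))  # first ten residues of human beta globin
```

Because twenty residues are spread over scales with different numbers of steps per octave, the overall range changes with the scale, in line with the rough octave counts given above.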
Since solubility scales are set according
to various criteria, about which there is no real consensus, we also paid some attention
to issues of harmony, setting the pitches of amino acids with similar R groups at
consonant intervals. Setting the scale according to solubility produced an interesting
effect. As the linear sequence winds in and out of the interior of the protein, we hear
counter-melodies in the music: one in the lower register representing the interior
water-insoluble amino acids, and another in the upper register representing the more
soluble ones arranged at the protein-water interface: our linear sequences were playing
two and sometimes three parallel and slightly offset tunes.
We also discovered another feature of the
proteins: they had more than one personality. One of the earliest proteins that we set was
lysozyme C, and it was set three times, twice by Dunn, and once by Clark. This happened
more or less by accident as we each prepared for lectures that were given, along with
visual artist Jamy Sheridan, at the Ann Arbor Museum of Art in May, 1997. However, the
experience of listening to these parallel compositions, each developed independently in
two different locations (Clark in Texas, Dunn in Michigan), but with the same protein
data, and on the same sonification software, gave more insight into the astounding depth
of structure Nature has built into Her art. Each piece was different from the others, so
different that probably only someone very familiar with the lysozyme sequence would
recognize it as the basis of the three pieces. We asked ourselves how the same sequence
could assume these different characters.
One answer was relatively trivial: any
piece assumes a different character if its rhythm, tempo and instrumentation are changed,
just as the tune of "Amazing Grace" could function either as a march or as a
lullaby, depending on such factors. The protein tunes also vary depending on which of the
many pitch tables available to us are used. However, each of these variants is an
authentic voice of the protein, because of a critical feature of the proteins and nucleic
acids as informational molecules. For each, there are so many possible combinations of
tunes, it is often possible to specify a protein or DNA sequence uniquely by using fewer
than ten amino acids (or DNA codons) as the search pattern. This is not surprising, since
any given sequence pattern of 10 amino acid residues would occur at random with a
probability only of 1 / (1.024 x 10^13), that is, 1 in 20^10. Indeed there are some combinations of 10
amino acids that do not appear in any protein now recorded in the data bases. However, for
a real protein, the pattern of pitch relationships produced by a given sequence will
belong only to that protein, regardless of the pitch table used. Listening to a given
protein's many voices is a way of inquiring into its nature, asking it to "Use
language we can comprehend" [6]. And so we interview each
sequence many times, hoping to ask it the question that will produce an answer meaningful
to us, in terms of our own musical experience.
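The probability quoted above follows from simple counting, sketched here under the simplifying assumption that all twenty amino acids are equally likely at every position and that positions are independent. Real proteins do not satisfy this exactly, so the figure is a rough guide only.

```python
ALPHABET_SIZE = 20    # amino acids available at each position
PATTERN_LENGTH = 10   # residues in the search pattern

# Each position can be filled independently, so the choices multiply.
possible_patterns = ALPHABET_SIZE ** PATTERN_LENGTH
probability = 1 / possible_patterns

print(f"{possible_patterns:,} possible patterns")
print(f"chance of one given pattern: {probability:.3e}")
```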
Because of the fruitfulness of multiple
inquiry, we have continued to set individual proteins independently, as we did lysozyme,
with Dunn asking "Where is the art in your science?" and Clark asking,
"Where is the science in your art?" Our musical answers, and the software we
used to ask these questions, are available on the Internet sites given below. We invite
interested persons to add to the harmony with their own interpretations.
Resources
John Dunn, Algorithmic Arts. http://algoart.com
John Dunn, DNA Music. http://algoart.com/dnamusic
Dr. M. A. Clark, The Music Room. http://www.startext.net/homes/macclark/Music/musicpag.htm
Dr. Kent W. (Kim) Bridges. http://www.botany.hawaii.edu/faculty/bridges/
References
1. Wendy Carlos. Switched-On
Bach, 1968 CBS MK 63501. http://www.player.org/pub/u/wendy/
2. Morton Subotnick. http://newalbion.com/artists/subotnickm/
3. Joseph Schillinger, The
Schillinger System of Musical Composition, Vol. I & II (New York, Carl Fischer,
Inc., 1941).
4. Graeme Wistow and Joram
Piatigorsky, "Recruitment of enzymes as lens structural proteins," Science, Vol.
236, 1554-1556 (19 June 1987).
5. PROSITE. http://expasy.hcuge.ch/sprot/prosite.html
Accession # PDOC00119. Documentation for entry PS00128: Lactalbumin_lysozyme.
Accession # PDOC00793 . Documentation for entry PS01033: Globin.
6. Robert Frost, "Choose
Something like a Star," Steeple Bush, Henry Holt, Inc., New York, 1947.
7. IMB-Jena. Notations,
Properties and Images of the 20 Standard Amino Acids. http://www.imb-jena.de/IMAGE_AA.html
8. NIH. Table of Standard
Genetic Code. http://www.nih.gov/dcrt/expo/talks/cybersci/links/gencode.html
9. SWISS-PROT. http://expasy.hcuge.ch/sprot/sprot-top.html
Accession # P02023. Hemoglobin beta chain, Homo sapiens. Accession # P10061.
Hemoglobin beta-2 chain, Sphenodon punctatus.
10. OMIM (Online Mendelian Inheritance
in Man). http://www3.ncbi.nlm.nih.gov/Omim/
Entry # 141900. Hemoglobin--beta locus; HBB.
Tables and Figures
Table 1: 20
Amino acids, their single-letter data-base codes (SLC), and their corresponding DNA codons
[Source: IMB-Jena (7) and NIH (8)]
Amino Acid | SLC | DNA codons
Isoleucine | I | ATT, ATC, ATA
Leucine | L | CTT, CTC, CTA, CTG, TTA, TTG
Valine | V | GTT, GTC, GTA, GTG
Phenylalanine | F | TTT, TTC
Methionine | M | ATG
Cysteine | C | TGT, TGC
Alanine | A | GCT, GCC, GCA, GCG
Glycine | G | GGT, GGC, GGA, GGG
Proline | P | CCT, CCC, CCA, CCG
Threonine | T | ACT, ACC, ACA, ACG
Serine | S | TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine | Y | TAT, TAC
Tryptophan | W | TGG
Glutamine | Q | CAA, CAG
Asparagine | N | AAT, AAC
Histidine | H | CAT, CAC
Glutamic acid | E | GAA, GAG
Aspartic acid | D | GAT, GAC
Lysine | K | AAA, AAG
Arginine | R | CGT, CGC, CGA, CGG, AGA, AGG
Stop codons | Stop | TAA, TAG, TGA
In this table, the twenty amino acids found in proteins are listed, along
with the single-letter code used to represent these amino acids in protein data bases. The
DNA codons representing each amino acid are also listed. All 64 possible 3-letter
combinations of the DNA coding units T, C, A and G are used either to encode one of these
amino acids or as one of the three stop codons that signals the end of a sequence. While
DNA can be decoded unambiguously, it is not possible to predict a DNA sequence from its
protein sequence. Because most amino acids have multiple codons, a number of possible DNA
sequences might represent the same protein sequence.
Figure 1: Beta globins: Comparison of human and tuatara sequences
[Source: SWISS-PROT (9) ]
Human:   VHLTP EEKSA VTALW GKVNV DEVGG EALGR LLVVY PWTQR FFESF GDLST
Tuatara: VHWTA EEKQL VTSLW TKVNV DECGG EALGR LLIVY PWTQR FFSSF GNLSS
Human:   PDAVM GNPKV KAHGK KVLGA FSDGL AHLDN LKGTF ATLSE LHCDK LHVDP
Tuatara: STAIC GNPRV KAHGK KVFTS FGEAV KNLDN IKATY AKLSE LHCEK LHVDP
Human:   ENFRL LGNVL VCVLA HHFGK EFTPP VQAAY QKVVA GVANA LAHKY H
Tuatara: QNFNL LGDIF IIVLA AHFGK DFTPA CQAAW QKLVR VVAHA LAYHY H
The two sequences above are the single-letter data-base codes for the amino
acid sequence of the protein beta-globin in two species: human and tuatara (a primitive
lizard-like reptile). In each pair of rows, the human sequence is printed first
and the tuatara sequence directly below it. The letters of the two sequences
have been separated into groups of 5 for ease of comparison. In a few of the groups, the
two sequences are identical, while in others there are one or more amino acid differences.
However, the similarity of the sequences of these distantly related species can be seen;
both are variations on a common theme.
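The comparison in Figure 1 can also be made by machine. The sketch below copies the two sequences from the figure, strips the spacing, and reports how many aligned positions carry the same amino acid. The function names `identity` and `difference_line` are illustrative, and the simple position-by-position comparison works only because the two sequences have the same length and need no gaps.

```python
# Sequences as printed in Figure 1; spaces are removed before comparison.
HUMAN = (
    "VHLTP EEKSA VTALW GKVNV DEVGG EALGR LLVVY PWTQR FFESF GDLST "
    "PDAVM GNPKV KAHGK KVLGA FSDGL AHLDN LKGTF ATLSE LHCDK LHVDP "
    "ENFRL LGNVL VCVLA HHFGK EFTPP VQAAY QKVVA GVANA LAHKY H"
).replace(" ", "")
TUATARA = (
    "VHWTA EEKQL VTSLW TKVNV DECGG EALGR LLIVY PWTQR FFSSF GNLSS "
    "STAIC GNPRV KAHGK KVFTS FGEAV KNLDN IKATY AKLSE LHCEK LHVDP "
    "QNFNL LGDIF IIVLA AHFGK DFTPA CQAAW QKLVR VVAHA LAYHY H"
).replace(" ", "")


def identity(seq_a, seq_b):
    """Fraction of aligned positions holding the same amino acid."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def difference_line(seq_a, seq_b):
    """Mark matching positions with '|' and substitutions with '.'."""
    return "".join("|" if a == b else "." for a, b in zip(seq_a, seq_b))


print(f"{len(HUMAN)} residues, {identity(HUMAN, TUATARA):.0%} identical")
print(HUMAN[:50])
print(difference_line(HUMAN, TUATARA)[:50])
print(TUATARA[:50])
```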
Figure 2: Morse code for the message
"beta-globin"
-... | . | - | .- |   | --. | .-.. | --- | -... | .. | -.
b    | e | t | a  |   | g   | l    | o   | b    | i  | n
In this figure, the message "beta
globin" is spelled out in Morse code. Each Morse codon consists of one or more dots
and/or dashes. The individual codons are separated by a brief space (or silence in
transmitted code) that allows the code reader to identify the individual letters of a
message. Without this spacing, the combination "et" could not be distinguished
from the "a" that follows it.
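The role of the spacing can be shown with a small sketch using the International Morse alphabet. The function name `encode` and its `gap` parameter, which stands in for the silence between codons, are choices made for this illustration.

```python
# International Morse code for the 26 letters.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}


def encode(message, gap=" "):
    """Encode the letters of message, joining codons with gap; other characters are skipped."""
    return gap.join(MORSE[ch] for ch in message.lower() if ch in MORSE)


print(encode("beta globin"))

# With the gap removed, "et" and "a" collapse into the same signal.
print(encode("et", gap=""), encode("a", gap=""))
```

Genetic code avoids this problem differently: every codon has the same length, so a reader that starts in the right place needs no gaps at all.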
Figure 3: Genetic code for the protein beta-globin
[Source: OMIM (10) ]
[ATG] GTG CAC CTG ACT CCT GAG GAG AAG TCT GCC GTT ACT GCC CTG TGG
[ M ]  V   H   L   T   P   E   E   K   S   A   V   T   A   L   W
GGC AAG GTG AAC GTG GAT GAA GTT GGT GGT GAG GCC CTG GGC AGG
 G   K   V   N   V   D   E   V   G   G   E   A   L   G   R
CTG CTG GTG GTC TAC CCT TGG ACC CAG AGG TTC TTT GAG TCC TTT
 L   L   V   V   Y   P   W   T   Q   R   F   F   E   S   F
GGG GAT CTG TCC ACT CCT GAT GCT GTT ATG GGC AAC CCT AAG GTG
 G   D   L   S   T   P   D   A   V   M   G   N   P   K   V
AAG GCT CAT GGC AAG AAA GTG CTC GGT GCC TTT AGT GAT GGC CTG
 K   A   H   G   K   K   V   L   G   A   F   S   D   G   L
GCT CAC CTG GAC AAC CTC AAG GGC ACC TTT GCC ACA CTG AGT GAG
 A   H   L   D   N   L   K   G   T   F   A   T   L   S   E
CTG CAC TGT GAC AAG CTG CAC GTG GAT CCT GAG AAC TTC AGG CTC
 L   H   C   D   K   L   H   V   D   P   E   N   F   R   L
CTG GGC AAC GTG CTG GTC TGT GTG CTG GCC CAT CAC TTT GGC AAA
 L   G   N   V   L   V   C   V   L   A   H   H   F   G   K
GAA TTC ACC CCA CCA GTG CAG GCT GCC TAT CAG AAA GTG GTG GCT
 E   F   T   P   P   V   Q   A   A   Y   Q   K   V   V   A
GGT GTG GCT AAT GCC CTG GCC CAC AAG TAT CAC TAA
 G   V   A   N   A   L   A   H   K   Y   H  STOP
The two sequences above represent the
amino acid sequence of the human protein beta-globin and the corresponding DNA sequence.
The groups of three letters above the line represent the DNA codons. Below the line are
the single-letter codes used for the twenty amino acids. Each amino acid is directly below
its DNA codon. Although in the example above, the individual codons are separated by a
space, the genetic code is read continuously, e.g. ATGGTGCACCTGACTCCTGAG, etc. In
beta-globin, the initial methionine (M) is removed from the final protein product. This
sequence demonstrates the redundancy of the genetic code, even for a single protein. A
given amino acid may be represented by any of several DNA codons. For example, lysine (K)
is represented by both of its codons (AAA and AAG) and glycine (G) by three (GGG, GGC,
GGT) of its four possible codons. The sequence also demonstrates how easily a variant can
be introduced into the sequence. Altering the codon GAG to GTG would replace the first
glutamic acid (E) of the sequence with valine (V). This single change produces the mutant
beta-globin of sickle-cell anemia.
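The sickle-cell substitution described in this caption can be reproduced in a few lines. The sketch assumes the standard genetic code in its compact TCAG form; `translate` and `substitute_codon` are hypothetical helper names, and only the first nine codons of the coding sequence from Figure 3 are used.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read dna in consecutive triplets from its first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitute_codon(dna, index, new_codon):
    """Return dna with the codon at 0-based position index replaced."""
    start = 3 * index
    return dna[:start] + new_codon + dna[start + 3:]


# First nine codons from Figure 3: ATG GTG CAC CTG ACT CCT GAG GAG AAG
normal = "ATGGTGCACCTGACTCCTGAGGAGAAG"
# Codon 6 (counting the initial ATG as codon 0) is the first GAG.
sickle = substitute_codon(normal, 6, "GTG")

print(translate(normal))  # normal opening of beta globin
print(translate(sickle))  # the first glutamic acid (E) becomes valine (V)
print(sum(a != b for a, b in zip(normal, sickle)), "base changed")
```

In musical terms this is a single altered note in a score of several hundred, which is part of what makes the comparison of the two versions by ear appealing.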