Elements of the Genetic Code
Every time a cell divides, each daughter cell receives a full set of instructions that allows it to grow and divide. The instructions are contained within DNA, long nucleic acid molecules made of nucleotides linked end to end. Four kinds of nucleotides are commonly found in the DNA of all organisms. They are designated A, G, T, and C after their variable component, the base (adenine, guanine, thymine, and cytosine, respectively). The sequence of the nucleotides in the DNA chain provides the information necessary for manufacturing all the proteins required for survival, but that information must first be decoded.
DNA contains a variety of codes. For example, there are codes for identifying where to start and where to stop transcribing an RNA molecule. RNA molecules are nearly identical in structure to single strands of DNA. In RNA, the nucleotide uracil (U) is used in place of thymine (T), and each RNA nucleotide contains a ribose sugar rather than a deoxyribose sugar. RNA molecules are made by a process called transcription, using DNA as a template. The resulting RNA molecule contains the same information as the DNA from which it was made, but in complementary form. Some RNAs function directly in the structure and activity of cells, but most are used to produce proteins with the help of ribosomes, organelles in the cytoplasm of each cell. This latter type of RNA is known as messenger RNA (mRNA). The ribosomal machinery scans the mRNA nucleotide sequence for signals to start the synthesis of polypeptides, the chains of amino acids of which proteins are made. When a start signal is found, the machinery reads the code in the RNA and converts it into a sequence of amino acids in the polypeptide, a process called translation. Translation stops at termination signals. The term “genetic code” is sometimes reserved for the rules for converting a sequence of nucleotides into a sequence of amino acids.
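The complementary relationship between a DNA template and the RNA transcribed from it can be sketched in a few lines of Python. This is a minimal illustration, not a model of the transcription machinery: it assumes the template strand is written 5' to 3' (so it is read in reverse, because the two strands are antiparallel), it ignores the start and stop signals for transcription, and the function name `transcribe` is hypothetical.

```python
# DNA base on the template strand -> complementary RNA base (U replaces T).
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5_to_3: str) -> str:
    """Return the mRNA (5'->3') complementary to a DNA template strand given 5'->3'."""
    # The strands are antiparallel, so the template is read from its 3' end.
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(template_5_to_3.upper()))

print(transcribe("TTACGCCAT"))  # AUGGCGUAA
```

The output reads like the other DNA strand (ATGGCGTAA) with every T replaced by U, which is why the RNA is said to carry the same information in complementary form.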
The Protein Genetic Code: General Characteristics
Experiments in the laboratories of Har Gobind Khorana, Heinrich Matthaei, Marshall Nirenberg, and others led to the deciphering of the protein genetic code. These researchers knew that the code had to be more complicated than a simple one-to-one correspondence between nucleotides and amino acids, since there are about twenty different amino acids in proteins and only four nucleotides in RNA; even pairs of nucleotides would give only sixteen combinations. They found that three adjacent nucleotides code for each amino acid. Since each of the three positions can be occupied by any one of four different nucleotides, sixty-four (4 × 4 × 4) different sets are possible. Each set of three nucleotides is called a codon. Each codon leads to the insertion of one kind of amino acid into the growing polypeptide chain or, in the case of the three stop codons, signals the end of the chain.
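The counting argument can be checked directly. The short Python sketch below (variable names are illustrative) compares the number of possible doublets and triplets and then enumerates every triplet of the four RNA bases with `itertools.product`.

```python
from itertools import product

BASES = "UCAG"  # the four RNA nucleotides

# Doublets give too few combinations for about twenty amino acids; triplets give enough.
print(len(BASES) ** 2, len(BASES) ** 3)  # 16 64

# Every ordered choice of three bases is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))  # 64
print(codons[:4])   # ['UUU', 'UUC', 'UUA', 'UUG']
```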
Two of the twenty amino acids (tryptophan and methionine) have only a single codon each. Nine amino acids are each represented by a pair of codons that differ only at the third position. Because the third nucleotide can vary in these codons without changing the amino acid, the third position is often called the wobble position. For five amino acids, any one of the four nucleotides can occupy the wobble position. The three codons for isoleucine can be considered as belonging to this class, with the exception that AUG is reserved for methionine. Three amino acids (leucine, arginine, and serine) are unusual in that each can be specified by any one of six codons. Together these sixty-one codons specify the twenty amino acids; the remaining three are stop codons.
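The tallies in this paragraph can be verified against the standard codon table. The sketch below assumes a compact encoding of that table: one amino-acid letter per codon, listed in UUU, UUC, UUA, UUG, UCU, ..., GGG order, with `*` marking stop codons (the same layout NCBI uses for its translation tables). It counts how many codons specify each amino acid and then groups the amino acids by that number; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code: one letter per codon in UUU, UUC, ..., GGG order; '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons that specify each amino acid (stop codons excluded).
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")

# How many amino acids have 1, 2, 3, 4, or 6 codons?
classes = Counter(codons_per_aa.values())
print(sorted(classes.items()))  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
print(len(codons_per_aa), sum(codons_per_aa.values()))  # 20 61
```

The output matches the classes described above: two single-codon amino acids, nine with two codons, isoleucine with three, five with four, and three with six, for sixty-one codons in all.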
Punctuation
The protein genetic code is often said to be “commaless.” The bond connecting two codons cannot be distinguished from bonds connecting nucleotides within codons. There are no spaces or commas to identify which three nucleotides constitute a codon. As a result, the choice of which three nucleotides are to be read as the first codon during translation is very important. For example, if “EMA” is chosen as the first set of meaningful letters in the following string of letters, the result is gibberish:
TH EMA NHI TTH EBA TAN DTH EBA TBI THI M.
On the other hand, if “THE” is chosen as the first set of three letters, the message becomes clear:
THE MAN HIT THE BAT AND THE BAT BIT HIM.
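The effect of choosing the starting letter can be reproduced with a short Python function (its name, `read_in_frame`, is illustrative). It splits a string with no spaces into consecutive triplets beginning at a chosen offset and drops leftover letters that cannot form a complete triplet.

```python
def read_in_frame(text: str, offset: int) -> list[str]:
    """Split a commaless string into consecutive triplets starting at `offset`."""
    return [text[i:i + 3] for i in range(offset, len(text) - 2, 3)]

message = "THEMANHITTHEBATANDTHEBATBITHIM"
for frame in range(3):
    print(frame, " ".join(read_in_frame(message, frame)))
```

Offset 0 recovers the sentence and offset 2 reproduces the gibberish shown first; offset 1 gives a third, equally meaningless reading.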
The commaless nature of the code means that one sequence of nucleotides can be read three different ways, starting at the first, second, or third letter. These ways of reading are called reading frames. Still, the genetic code does have “punctuation.” The beginning of each coding sequence is marked by a start codon, which is usually AUG. Each coding sequence also has a stop codon (UAA, UAG, or UGA), which acts like a period at the end of a sentence, marking the end of the coding sequence.
A reading frame is said to be open if it contains no stop codons over a considerable distance. In most mRNAs, only one reading frame is open for any appreciable length. In some mRNAs, however, more than one reading frame is open, so that a single mRNA can produce two, or rarely three, different polypeptide sequences.
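Scanning all three reading frames for open stretches is a routine computational task. The Python sketch below uses one common working definition, assumed here rather than taken from the text: an open reading frame runs from an AUG to the first in-frame stop codon (UAA, UAG, or UGA) and contains at least `min_codons` codons before the stop. The function name and the small default threshold are hypothetical; in practice the threshold is usually much larger.

```python
from typing import Iterator

STOP_CODONS = {"UAA", "UAG", "UGA"}

def open_reading_frames(mrna: str, min_codons: int = 2) -> Iterator[tuple[int, int, int]]:
    """Yield (frame, start, end) for each AUG...stop stretch in frames 0, 1, and 2.

    Positions are 0-based; `end` is just past the stop codon. Stretches that run
    off the end of the sequence without a stop codon are not reported.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None:
                if codon == "AUG":
                    start = i  # a start codon opens a candidate frame
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield frame, start, i + 3
                start = None  # resume looking for the next AUG in this frame

print(list(open_reading_frames("AUGAAAUAGCC")))  # [(0, 0, 9)]
```

Internal AUGs inside an open stretch are treated as ordinary methionine codons, so only the first AUG of each stretch is reported as its start.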
The Near Universality of the Code
The genetic code was worked out primarily through experiments with extracts from the bacterium Escherichia coli and from rabbit cells. Further work suggested that the code was the same in other organisms, so it came to be known as the universal genetic code. The code was deciphered before scientists knew how to determine the sequence of nucleotides in DNA efficiently. Once nucleotide sequences could be determined, scientists could use the universal genetic code to predict the amino acid sequences of the encoded proteins. Comparison with the actual amino acid sequences revealed excellent overall agreement.
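Predicting a protein from an mRNA sequence with the universal code amounts to a table lookup, codon by codon. The sketch below is a simplification: it assumes translation begins at the first AUG and ends at the first in-frame stop codon, ignoring the additional start-site signals that real cells use. The table encoding is the compact one-letter layout of the standard code, and `translate` is an illustrative name, not a library function.

```python
from itertools import product

BASES = "UCAG"
# Standard (universal) code: one letter per codon in UUU, UUC, ..., GGG order; '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str, code: dict[str, str] = STANDARD_CODE) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGUUUGGCUGGUAAGC"))  # MFGW
```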
Nevertheless, the codon assignments of the universal genetic code had apparent exceptions. Some turned out to be caused by programmed changes in the mRNA sequence, a process called RNA editing. In selected codons of some mRNAs, a C is changed to a U. In others, an A is changed so that it is read as a G. Editing does not change the code used by the ribosomal machinery, but it does mean that using DNA sequences to predict protein sequences has pitfalls.
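A hypothetical example shows why editing complicates prediction from DNA. In the sketch below (the sequence and function names are invented for illustration), changing a single C to U turns a glutamine codon, CAA, into the stop codon UAA, so the protein actually made is much shorter than the one predicted from the unedited sequence. A C-to-U edit of this kind occurs in the human apolipoprotein B mRNA.

```python
from itertools import product

BASES = "UCAG"
# Standard code: one letter per codon in UUU, UUC, ..., GGG order; '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate_from_start(mrna: str) -> str:
    """Translate codons from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def edit_base(mrna: str, position: int, old: str, new: str) -> str:
    """Return a copy of `mrna` with the base at `position` changed from `old` to `new`."""
    if mrna[position] != old:
        raise ValueError(f"expected {old} at position {position}, found {mrna[position]}")
    return mrna[:position] + new + mrna[position + 1:]

unedited = "AUGGCUCAAGGCUGGUAA"            # hypothetical mRNA: AUG GCU CAA GGC UGG UAA
edited = edit_base(unedited, 6, "C", "U")  # C-to-U edit at the first base of CAA

print(translate_from_start(unedited))  # MAQGW
print(translate_from_start(edited))    # MA
```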
Other exceptions to the universal genetic code are true variations in the code. For example, UGA, a stop codon in the universal code, codes for tryptophan in some bacteria and in fungal, insect, and vertebrate mitochondrial DNA (mtDNA). Some ciliated protozoans use UAA and UAG, which are stop codons in the universal code, to insert glutamine. Methionine, which has only one codon in the universal genetic code (AUG), is also encoded by AUA in vertebrate and insect mtDNA and in some, but not all, fungal mitochondria. Vertebrate mtDNA also uses AGA and AGG, arginine codons in the universal code, as stop codons, whereas in insect mtDNA AGA and AGG code for serine.
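Because a variant code differs from the universal code at only a few codons, it can be modeled as the standard table with a handful of overrides. The sketch below applies the vertebrate mitochondrial reassignments listed in this paragraph (UGA for tryptophan, AUA for methionine, AGA and AGG as stops) to a short hypothetical sequence. The table encoding and names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard code: one letter per codon in UUU, UUC, ..., GGG order; '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Vertebrate mitochondrial reassignments described in the text ('*' = stop).
VERTEBRATE_MITO = {**STANDARD, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}

def read_codons(mrna: str, code: dict[str, str]) -> str:
    """Translate every complete codon from position 0, showing stops as '*'."""
    return "".join(code[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))

seq = "AUGAUAUGAAGA"  # hypothetical: AUG AUA UGA AGA
print(read_codons(seq, STANDARD))         # MI*R
print(read_codons(seq, VERTEBRATE_MITO))  # MMW*
```

The same four codons yield different products depending on which table the translating machinery uses, which is why predictions from mitochondrial genes must use the appropriate variant code.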
Interpreting the Code
How is the code interpreted? The mRNA codons are read by small RNA molecules called transfer RNAs (tRNAs). There is at least one tRNA for each of the twenty amino acids. The tRNAs are L-shaped molecules. At one end, each tRNA has a set of three nucleotides (the anticodon) that can pair with the three nucleotides of a codon for its amino acid but not with codons for other amino acids. At the other end, the tRNA has a site for the attachment of an amino acid.
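Codon-anticodon pairing can be expressed as a small rule check. Both sequences are written 5' to 3', and because they pair in antiparallel fashion, the last base of the anticodon pairs with the first base of the codon. The sketch assumes Watson-Crick pairing at the first two codon positions and, at the third (wobble) position, a simplified form of the wobble rules first proposed by Francis Crick, which let one tRNA read more than one codon. The names and rule tables are illustrative; "I" stands for inosine, a modified base found in some anticodons.

```python
# Watson-Crick partners, used at codon positions 1 and 2.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplified wobble rules: first (5') anticodon base -> acceptable third codon bases.
WOBBLE = {
    "C": {"G"},
    "A": {"U"},
    "G": {"C", "U"},
    "U": {"A", "G"},
    "I": {"U", "C", "A"},  # inosine
}

def anticodon_reads(anticodon: str, codon: str) -> bool:
    """True if the anticodon (5'->3') can pair with the codon (5'->3')."""
    a1, a2, a3 = anticodon
    c1, c2, c3 = codon
    # Antiparallel pairing: a3 with c1, a2 with c2, a1 with c3 (the wobble position).
    return PAIR.get(a3) == c1 and PAIR.get(a2) == c2 and c3 in WOBBLE.get(a1, set())

# A phenylalanine tRNA with anticodon GAA reads both Phe codons but not UUA (leucine).
print([c for c in ("UUU", "UUC", "UUA") if anticodon_reads("GAA", c)])  # ['UUU', 'UUC']
```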
Special enzymes called aminoacyl-tRNA synthetases (RS enzymes) attach the correct amino acids to the correct tRNAs. There is one RS enzyme for each of the twenty amino acids. Interpretation is possible because each RS enzyme binds only one kind of amino acid and only the tRNAs that pair with the codons for that amino acid. The key to this specificity is a special code in each tRNA, located near the site where the amino acid is attached. This code is sometimes referred to as the “second genetic code.” After binding the correct amino acid and tRNA, the RS enzyme joins the two molecules with a covalent bond. These charged tRNAs, called aminoacyl-tRNAs, are ready to take part in protein synthesis directed by the codons of the mRNA.
Information is also stored in RNA in forms other than the triplet code. A special tRNA for methionine initiates all polypeptide chains; it responds to AUG. However, proteins also contain methionines within the main part of the polypeptide chain. Those methionines are carried by a different tRNA that also responds to AUG. The ribosome and its associated factors must therefore distinguish an initiating AUG from one that codes for an internal methionine.
This distinction is made differently in bacteria and eukaryotes. In bacteria, an AUG serves as a start codon only if it lies a short distance downstream of a sequence that can pair with a section of the ribosomal RNA. Eukaryotic start (AUG) codons must meet two requirements: first, they must lie in a proper context of surrounding nucleotides; second, they must be the first AUG in such a context, counting from the beginning of the mRNA. Context is also important for the incorporation of the unusual amino acid selenocysteine into several proteins. In a limited number of genes, a UGA codon, normally a stop codon, is read as a codon for selenocysteine, but only when additional sequences in the mRNA are present. Surrounding nucleotides also allow certain termination codons to be bypassed. For example, the mRNA of tobacco mosaic virus encodes two polypeptides that start at the same place, one longer than the other. The extension occurs when a UAG stop codon is read by a tRNA charged with tyrosine.
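The eukaryotic rule, taking the first AUG that lies in a suitable context, can be sketched as a scan from the 5' end of the mRNA. The text does not define a "proper context," so the sketch assumes a simplified version of the Kozak consensus: an AUG is accepted if the nucleotide three positions before it (position -3, counting the A of AUG as +1) is a purine (A or G) or the nucleotide immediately after it (position +4) is G. The function name and this acceptance rule are assumptions for illustration.

```python
def find_eukaryotic_start(mrna: str) -> int:
    """Scan from the 5' end and return the index of the first AUG in an
    acceptable context, or -1 if there is none.

    Simplified context rule (an assumption): purine at -3 or G at +4.
    """
    pos = mrna.find("AUG")
    while pos != -1:
        minus3 = mrna[pos - 3] if pos >= 3 else ""
        plus4 = mrna[pos + 3] if pos + 3 < len(mrna) else ""
        if minus3 in ("A", "G") or plus4 == "G":
            return pos
        pos = mrna.find("AUG", pos + 1)  # poor context: keep scanning downstream
    return -1

print(find_eukaryotic_start("UUUAUGUUUACCAUGG"))  # 12
```

In the example, the first AUG (index 3) has U at both -3 and +4 and is skipped; the second AUG (index 12) has A at -3 and is chosen.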
The production of two proteins with identical beginnings but different ends can also result from frame shifting. In this mechanism, signals in the mRNA direct the ribosomal machinery to advance or backtrack by one nucleotide and then continue reading codons in the new frame. Frame shifting occurs at specific sequences in the RNA. The signal for a frame shift often includes a “slippery” stretch of about seven nucleotides made up of runs of identical nucleotides, followed by a complex RNA structure (a “pseudoknot”).
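The outcome of a frame shift can be illustrated with a toy model. In the sketch below (the sequence, the function name, and the shift point are hypothetical), the reading position moves by `shift` nucleotides after a set number of codons. With no shift, translation stops at a UAG codon; with a shift of -1, the ribosome continues in another frame and produces a longer polypeptide with the same beginning. The model does not represent the slippery sequence or RNA structure that trigger real frame shifting.

```python
from itertools import product

BASES = "UCAG"
# Standard code: one letter per codon in UUU, UUC, ..., GGG order; '*' = stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate_with_shift(mrna: str, shift_at: int, shift: int) -> str:
    """Translate from position 0; after `shift_at` codons, move the reading
    position by `shift` nucleotides (-1 = backtrack, +1 = advance).
    Stops at a stop codon or when fewer than three nucleotides remain."""
    protein, i, codons_read = [], 0, 0
    while True:
        if codons_read == shift_at:
            i += shift  # the single programmed change of frame
        codon = mrna[i:i + 3]
        if len(codon) < 3 or CODE[codon] == "*":
            break
        protein.append(CODE[codon])
        i += 3
        codons_read += 1
    return "".join(protein)

mrna = "AUGAAAUAGCUGGU"  # hypothetical sequence
print(translate_with_shift(mrna, shift_at=2, shift=0))   # MK (stops at UAG)
print(translate_with_shift(mrna, shift_at=2, shift=-1))  # MKIAG
```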
Still further codes are embedded in DNA. The linear sequence of amino acids derived from DNA contains a code for folding in three-dimensional space, a code for delivery to the proper location in the cell, a code for modification by the addition of other chemical groups, and a code for degradation. The production of mRNA requires nucleotide codes for beginning RNA synthesis, for stopping it, and for stitching together the codon-containing regions (exons) when these are separated by noncoding regions (introns). RNAs also contain signals that can tag them for rapid degradation. DNA has a code recognized by protein complexes that initiate DNA replication, as well as signals recognized by enzymes that catalyze DNA rearrangements.
Impact and Applications
A major consequence of the near universality of the genetic code is that biotechnologists can move genes from one species into another and still have them expressed correctly. Because both organisms read the code the same way, the same protein is produced. This has made possible the large-scale production of specific proteins in bacteria, yeast, plants, and domestic animals, proteins of immense pharmaceutical, industrial, and research value.
Scientists developed rapid methods for sequencing DNA in the 1970s. Because the genetic code was already known, it suddenly became easier to predict the amino acid sequence of a protein from the nucleotide sequence of its gene than to determine that sequence by chemical methods. Immediate knowledge of a protein's amino acid sequence greatly simplified predictions about its function. This has led to a molecular understanding of many inherited human diseases and to the potential development of rational therapies based on that knowledge.
Key Terms
codon: a three-nucleotide unit of a nucleic acid (DNA or RNA) that specifies one amino acid, or a stop signal, in the protein encoded by a gene
nucleotides: the building blocks of DNA and RNA, linked end to end to form long nucleic acid molecules; the sequence of nucleotides in the DNA chain provides the genetic information
reading frame: the phasing in which codons are read, determined by the nucleotide at which the first codon begins; certain mutations, such as insertions or deletions of one or two nucleotides, can shift the reading frame
RNA: ribonucleic acid, a molecule similar to DNA but usually single-stranded and containing a ribose rather than a deoxyribose sugar; RNA molecules are formed using DNA as a template and then use their complementary genetic information to carry out cellular processes or to direct the formation of proteins
transfer RNA (tRNA): molecules that carry amino acids to messenger RNA (mRNA) codons, allowing amino acid polymerization into proteins
translation: the process of forming proteins according to instructions contained in an mRNA molecule