Central Dogma (Gene Expression): Definition, Steps, Regulation

The central dogma of molecular biology states that genetic information flows from the DNA code to an intermediate RNA copy and then to the proteins synthesized from that copy. The key ideas underlying the dogma were first proposed by British molecular biologist Francis Crick in 1958.

By 1970 it had become widely accepted that RNA copies of specific genes are made from the original DNA double helix and then serve as the basis for producing proteins.

The process of copying genes through transcription and then producing proteins by translating the code into chains of amino acids is called gene expression. Depending on the cell type and on environmental conditions, certain genes are expressed while others remain inactive. Gene expression is regulated in part by chemical signals within cells and between the cells and organs of living organisms.

The discovery of alternative splicing and the study of non-coding sections of genes called introns indicate that the process described by the central dogma is more complicated than was initially assumed. The simple DNA-to-RNA-to-protein sequence has branches and variations that help organisms adapt to a changing environment. The basic tenet remains unchallenged, however: information passes from nucleic acids to proteins, and the information encoded in proteins can't flow back to change the original DNA code.

DNA Transcription Takes Place in the Nucleus

The DNA helix that encodes the organism’s genetic information is located in the nucleus of eukaryotic cells. Prokaryotic cells don't have a nucleus, so in them transcription and translation both take place in the cytoplasm, through a similar but simpler process.

In eukaryotic cells, DNA molecules can’t leave the nucleus, but proteins are made outside it, so the cell must first copy the relevant part of the genetic code into RNA. This copying process, called transcription, is carried out by an enzyme called RNA polymerase and has the following stages:

  1. Initiation. The RNA polymerase temporarily separates the two strands of the DNA helix. The strands stay paired on either side of the gene sequence being copied.
  2. Copying. The RNA polymerase travels along one of the DNA strands, called the template strand, and builds a complementary RNA copy of the gene. The RNA has the same sequence as the other DNA strand, except that uracil (U) takes the place of thymine (T).
  3. Splicing. Genes contain protein-coding sequences called exons separated by non-coding sequences called introns. Since the purpose of transcription is to produce RNA for protein synthesis, the introns are cut out of the RNA copy by a process called splicing.
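The copying stage can be illustrated with a toy Python model. It is a simplified sketch, not a biochemical simulation: it assumes the template strand is given as a plain string read left to right, and it ignores strand direction (the 5' and 3' ends) as well as splicing. The function name `transcribe` is hypothetical.

```python
# Toy model of transcription: the RNA copy is complementary to the DNA
# template strand, with uracil (U) used in place of thymine (T).
# Assumption: strand direction is ignored; the template is read left to right.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand: str) -> str:
    """Return the pre-mRNA complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_strand.upper())

print(transcribe("TACGGTCTA"))  # AUGCCAGAU
```

The result, AUGCCAGAU, matches the other DNA strand (ATGCCAGAT) with U in place of T, which is why that strand is said to carry the same code as the mRNA.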

The RNA copy made in the second stage contains both the exons and the introns and is called precursor messenger RNA (pre-mRNA).

To remove the introns, the pre-mRNA strand is cut at each intron/exon boundary. Each intron is released as a loop-shaped structure called a lariat, allowing the two exons on either side of it to join together. When all the introns have been removed, the strand is called mature mRNA, and it is ready to leave the nucleus.
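The cut-and-join step can be sketched as string slicing. This assumes the intron positions are already known and supplied by hand as 0-based, half-open `(start, end)` index pairs; in real cells the splicing machinery recognizes introns from sequence signals at their boundaries. The sequences and the `splice` function are made up for illustration.

```python
# Toy model of splicing: cut each intron out of the pre-mRNA and join the
# exons on either side. Assumption: intron positions are given by hand as
# half-open (start, end) index pairs.
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the mature mRNA left after removing the given introns."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                           # skip over the intron
    exons.append(pre_mrna[position:])            # keep the final exon
    return "".join(exons)

# Exon "AUGGCC", intron "GUAAAG", exon "UUUUAA"
print(splice("AUGGCCGUAAAGUUUUAA", [(6, 12)]))  # AUGGCCUUUUAA
```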

The mRNA Has a Copy of the Code for a Protein

Proteins are long chains of amino acids joined by peptide bonds. They largely determine what a cell looks like and what it does. They form cell structures and play a key part in metabolism. They act as enzymes and hormones, and some are embedded in cell membranes, where they help move large molecules into and out of the cell.

The amino acid sequence of each protein is encoded in the DNA helix. The code is made up of the following four nitrogenous bases:

  • Guanine (G)
  • Cytosine (C)
  • Adenine (A)
  • Thymine (T)

Each link in the DNA chain is a base pair: guanine pairs with cytosine, and adenine pairs with thymine. Each link is given a one-letter name after the base on the strand being read, so a guanine-cytosine pair is called G, a cytosine-guanine pair C, an adenine-thymine pair A and a thymine-adenine pair T.
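The pairing rules can be written as a lookup table. This minimal sketch uses hypothetical `complementary_strand` and `base_pairs` helpers to build the partner strand and label each link; it ignores the fact that the two strands of the helix run in opposite directions.

```python
# Base-pairing rules in DNA: guanine pairs with cytosine, adenine with thymine.
PAIRS = {"G": "C", "C": "G", "A": "T", "T": "A"}

def complementary_strand(strand: str) -> str:
    """Return the base that pairs with each base of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand)

def base_pairs(strand: str) -> list[str]:
    """Label each link 'first-second', named after the base on `strand`."""
    return [f"{base}-{PAIRS[base]}" for base in strand]

print(complementary_strand("GGATC"))  # CCTAG
print(base_pairs("GCAT"))             # ['G-C', 'C-G', 'A-T', 'T-A']
```

Applying `complementary_strand` twice returns the original sequence, which reflects why either strand carries enough information to rebuild the other.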

A group of three base pairs that codes for a particular amino acid is called a codon. A typical codon might be written GGA or ATC. Because each of the three positions in a codon can hold any of the four bases, the total number of possible codons is 4 × 4 × 4 = 4³, or 64.
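The count of 64 can be checked by listing every three-letter combination of the four bases with Python's standard `itertools.product`:

```python
from itertools import product

BASES = "GCAT"  # guanine, cytosine, adenine, thymine

# Every ordered choice of one base for each of the three codon positions.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))                       # 64
print("GGA" in codons, "ATC" in codons)  # True True
```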

Twenty standard amino acids are used in protein synthesis, and some codons also act as start and stop signals. Since 64 codons are more than enough to specify 20 amino acids, most amino acids are encoded by more than one codon.
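The redundancy can be seen by tallying the standard genetic code. The sketch below assumes the compact string format that NCBI uses to publish its genetic code tables (translation table 1), written here in the DNA alphabet, where `*` marks a stop codon. ATG, the usual start codon, also codes for methionine (M).

```python
from collections import Counter
from itertools import product

# Standard genetic code, NCBI compact format: codons in T, C, A, G order with
# the third base varying fastest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons specify each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
stop_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]

print(len(codons_per_amino_acid))  # 20
print(stop_codons)                 # ['TAA', 'TAG', 'TGA']
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
```

Leucine (L) is encoded by six codons, while methionine (M) has only one, so the redundancy is spread unevenly across the amino acids.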

The mRNA is a copy of the code for one protein.

Proteins Are Produced by Ribosomes

When the mRNA leaves the nucleus, it travels to a ribosome, which synthesizes the protein encoded in the mRNA.

Ribosomes are the factories of the cell that produce the cell’s proteins. Each ribosome has a small subunit that reads the mRNA and a large subunit that joins the amino acids in the correct sequence. Both subunits are made of ribosomal RNA (rRNA) and associated proteins.

Ribosomes are found either floating freely in the cytosol or attached to the endoplasmic reticulum (ER), a series of membrane-enclosed sacs found near the nucleus. Proteins made by free-floating ribosomes are released into the cytosol.

Ribosomes attached to the ER make proteins that are usually exported from the cell, built into the cell membrane or delivered to organelles such as lysosomes. Cells that secrete hormones and enzymes usually have many ER-bound ribosomes.

The mRNA binds to a ribosome, and the translation of the code into the corresponding protein can begin.

Translation Assembles a Specific Protein According to the mRNA Code

Floating in the cytosol are amino acids and small RNA molecules called transfer RNA, or tRNA. There is at least one type of tRNA for each amino acid used in protein synthesis, and each tRNA carries only its own amino acid.

As the ribosome reads each codon of the mRNA, a tRNA molecule with a matching anticodon, a three-base sequence that pairs with the codon, brings the corresponding amino acid to the ribosome. The ribosome then attaches that amino acid to the growing chain in the correct sequence.

The sequence of events is as follows:

  1. Initiation. The ribosome binds near one end of the mRNA molecule and finds the start codon.
  2. Elongation. The ribosome reads the first codon and accepts the matching amino acid from its tRNA. It then reads the second codon and attaches the second amino acid to the first with a peptide bond.
  3. Termination. The ribosome works its way down the mRNA, adding one amino acid per codon, until it reaches a stop codon. It then releases the finished chain of amino acids linked by peptide bonds, called a polypeptide chain.
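The three steps can be sketched as a loop over codons. This is a simplified model under stated assumptions: the mRNA is given in the RNA alphabet, initiation is modeled as finding the first AUG start codon, and the codon table reuses NCBI's compact encoding of the standard genetic code. The `translate` function is hypothetical.

```python
from itertools import product

# Standard genetic code in NCBI's compact format (RNA alphabet, third base
# varying fastest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Return the one-letter amino acid chain encoded by an mRNA string."""
    # Initiation (simplified): begin at the first AUG start codon.
    start = mrna.find("AUG")
    if start == -1:
        return ""
    chain = []
    # Elongation: read one codon (three bases) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        # Termination: a stop codon ends the chain.
        if amino_acid == "*":
            break
        chain.append(amino_acid)
    return "".join(chain)

print(translate("GGAUGGCCUUUUAAGC"))  # MAF
```

In the example, the two bases before AUG are skipped, AUG, GCC and UUU give methionine, alanine and phenylalanine, and the UAA stop codon ends the chain.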

Some proteins are produced in batches while others are synthesized continuously to meet the ongoing needs of the cell. When the ribosome produces the protein, the information flow of the central dogma from DNA to protein is complete.

Alternative Splicing and the Effects of Introns

Variations on the direct information flow envisaged in the central dogma have also been studied. In alternative splicing, the pre-mRNA is cut to remove introns, but different combinations of exons are kept, so the same gene can produce different mature mRNAs.

This means that one DNA coding sequence can give rise to two or more different proteins. And although introns are discarded as non-coding sequences, they may influence how the exons are spliced and expressed, and in some cases they may contain additional genes.
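A toy example shows how keeping different exons changes the protein. The three exon sequences are invented, and each is a multiple of three bases long so that skipping one does not shift the reading frame. The hypothetical `translate` helper here simply reads codons from the first base until a stop codon, using NCBI's compact encoding of the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI compact format, RNA alphabet); "*" = stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical gene: three exons, each a multiple of three bases long.
EXONS = ["AUGGCC", "UUUGGA", "AAAUAA"]

def translate(mrna: str) -> str:
    """Read codons from the first base and stop at the first stop codon."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        chain.append(amino_acid)
    return "".join(chain)

def splice_variant(exons: list[str], keep: list[int]) -> str:
    """Join the selected exons, in their original order, into a mature mRNA."""
    return "".join(exons[i] for i in sorted(keep))

print(translate(splice_variant(EXONS, [0, 1, 2])))  # MAFGK (all exons kept)
print(translate(splice_variant(EXONS, [0, 2])))     # MAK (middle exon skipped)
```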

While the central dogma of molecular biology remains valid as far as the direction of information flow is concerned, the details of exactly how information flows from DNA to proteins are less linear than originally thought.