The human genome is the complete catalog of the genetic information carried by humans. In 1990, the Human Genome Project began systematically identifying and mapping the entire structure of human DNA. The project declared the sequence essentially complete in 2003, although gaps remained and work continues. It identified roughly 20,000 protein-coding genes distributed among the 23 pairs of chromosomes found in human cells.
However, the protein-coding portions of these genes make up only about 1.5 percent of the human genome. Several other types of DNA sequence have been identified, but many questions about them remain.
Protein-coding genes are DNA sequences that cells use to synthesize proteins. DNA consists of a long sugar-phosphate backbone to which four kinds of smaller molecules, called bases, are attached. The four bases are adenine, cytosine, guanine and thymine, abbreviated A, C, G and T.
Along the protein-coding portions of DNA, the bases are read in groups of three, called codons, and the sequence of codons corresponds to a sequence of amino acids, the building blocks of proteins. The proteins specified by these genes help determine the physical structure of the human body and control its chemistry.
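To make the idea of reading bases three at a time concrete, here is a minimal Python sketch of translation. It assumes the coding sequence is written in DNA letters and includes only a handful of entries from the standard genetic code (the full table has 64 codons). The `CODON_TABLE` and `translate` names are illustrative, and in a real cell the ribosome reads an mRNA copy rather than the DNA itself.

```python
# Illustrative subset of the standard genetic code, written with DNA bases.
# The full table has 64 codons; this sketch includes only a few.
CODON_TABLE = {
    "ATG": "Met",  # methionine, also the usual start codon
    "TTT": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "AAA": "Lys",  # lysine
    "TGG": "Trp",  # tryptophan
    "TAA": "STOP",
    "TAG": "STOP",
    "TGA": "STOP",
}


def translate(dna: str) -> list[str]:
    """Read a coding sequence three bases at a time and return amino acids,
    stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]  # KeyError for codons outside this subset
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(translate("ATGTTTGGCAAATAA"))  # ['Met', 'Phe', 'Gly', 'Lys']
```

A dictionary lookup mirrors how each codon maps to exactly one amino acid or stop signal. Several different codons can map to the same amino acid, which a dictionary also handles naturally.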
Regulatory DNA Sequences
Different cells need different proteins at different times. For example, the proteins needed by a brain cell might be very different from those needed by a liver cell. A cell must therefore be selective about which proteins it manufactures.
Regulatory DNA sequences interact with proteins and other factors to control which genes are active at any given time. Some also act as markers that identify where genes begin and end. Through these interactions, which often involve feedback mechanisms, regulatory sequences control gene expression: whether, when and how much a gene is used to make its product.
Genes for Non-coding RNA
DNA does not make proteins directly. RNA, a related molecule, serves as an intermediary: genes are first transcribed into messenger RNA (mRNA), which carries the genetic code to ribosomes, the protein factories located elsewhere in the cell.
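Transcription follows base-pairing rules, with one twist: RNA uses the base uracil (U) in place of thymine. The short Python sketch below builds an mRNA sequence from a DNA template strand. It is a simplification that assumes the template is written in the 3'-to-5' direction and ignores how the cell decides where transcription starts and stops; `transcribe` is a hypothetical helper name.

```python
# Base-pairing rules for transcription: each base on the DNA template strand
# pairs with an RNA base. RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand: str) -> str:
    """Build an mRNA sequence from a DNA template strand.

    Assumes the template is written 3'->5', so the returned mRNA reads 5'->3'.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_strand)


print(transcribe("TACAAACCG"))  # AUGUUUGGC
```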
Some genes are instead transcribed into non-coding RNA molecules, which the cell uses for a variety of functions. For example, ribosomal RNA, an important type of non-coding RNA, is used to build the ribosomes found throughout the cell.
When a gene is transcribed into RNA, some portions of the RNA must be removed because they do not contribute to the final product. The DNA sequences that correspond to these removed portions are called introns, and the retained portions are called exons. If the intron sequences in the RNA of a protein-coding gene were not spliced out, the resulting protein would usually be malformed or useless.
RNA splicing is a remarkably precise process. The cell's splicing machinery must recognize each intron, locate its boundaries on the RNA strand, cut it out at exactly the right places and join the neighboring segments together.
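The cutting-and-joining step can be sketched in a few lines of Python. This is a toy model under stated assumptions: the intron positions are supplied directly (cells must find them by recognizing sequence signals), and the only check is that each intron begins with GU and ends with AG, a pattern most real introns follow. The `splice` function and the example sequence are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns from a pre-mRNA and join the remaining segments (exons).

    `introns` holds hypothetical (start, end) positions: 0-based, end excluded,
    sorted and non-overlapping. Real cells locate these boundaries themselves;
    here they are simply given.
    """
    exons = []
    position = 0
    for start, end in introns:
        intron = pre_mrna[start:end]
        # Most introns begin with GU and end with AG; reject boundaries that don't.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"unexpected intron boundaries at {start}-{end}")
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end  # skip over the intron
    exons.append(pre_mrna[position:])  # keep the final exon
    return "".join(exons)


# Exon "AUGUUU", intron "GUAAGUAG", exon "GGCUAA"
print(splice("AUGUUUGUAAGUAGGGCUAA", [(6, 14)]))  # AUGUUUGGCUAA
```

Cutting at even one wrong position would shift every later codon, which is why precise boundaries matter so much.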
Scientists do not yet know the function of a large fraction of the base sequences in human DNA. Some of it may simply be junk, while other parts may play roles that are not yet understood.