Is the proteome the new genome?


Following the determination of the human genome in the 2000s, Katie Hughes examines the next step in molecular sequencing.

The human cell is a complex structure, at the core of which lies the nucleus. The nucleus contains chromosomal DNA, which is the genetic material of the cell. Full genome sequencing involves determining the complete sequence of the chromosomal DNA as well as the sequence of mitochondrial DNA and, in plants, chloroplast DNA.

The human genome has already been sequenced to an extent, which is a huge step for molecular biology, considering that the double-helix structure of DNA was only described in 1953 by James D. Watson and Francis Crick. It is through the activity of DNA that proteins are formed, though the complete human proteome has yet to be sequenced.

To understand the significance and complexity of sequencing a genome, and of the current attempts at sequencing a proteome, it is necessary to have some knowledge of the sequences and structures involved.

The human cell has an outer membrane, which encloses the fluid cytoplasm, in which all of the various parts of the cell are suspended. These include the energy-producing mitochondria, the protein-translating ribosomes and of course the DNA-containing nucleus. In the nucleus is found the genetic material, which consists of twenty-three pairs of chromosomes, giving a total of forty-six per cell. Each of these chromosomes is made of DNA, which contains the genetic information required for an organism to function.

The DNA (deoxyribonucleic acid) of a chromosome consists of two complementary strands wound together in a double-helix formation. The backbone of each strand is composed of alternating sugar (deoxyribose) and phosphate groups, which are held together by phosphodiester bonds. Deoxyribose is a five-carbon sugar that contains one fewer oxygen atom than its relative ribose, which is what gives it its name. This repeating pattern of sugar and phosphate makes up the DNA’s backbone.

Attached to the other side of the sugar molecule on the backbone is a nucleic base. There are four different nucleic bases found in DNA: adenine (A), thymine (T), guanine (G) and cytosine (C), only one of which is attached to each sugar. However, when the two strands of the DNA molecule intertwine to form the double helix, an adenine molecule must be opposite a thymine molecule, as these bind together with two hydrogen bonds, and a guanine molecule must be facing a cytosine molecule, as this pairing forms three hydrogen bonds. Because the strands must be complementary to each other, both strands are used as templates for the formation of a new complementary strand during replication, which results in two identical chromosomes.
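The pairing rule lends itself to a short illustration. The Python sketch below is only a toy model: the sequence is made up, the helper names are hypothetical, and it ignores the fact that the two strands of a real double helix run in opposite directions. It shows how using each strand as a template yields two identical double-stranded copies.

```python
# Pairing rules from the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand):
    """Return the base-by-base partner of a DNA strand (hypothetical helper)."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def replicate(strand_a, strand_b):
    """Use each original strand as the template for a new partner strand.

    Returns two double helices, each as a (template, new_strand) pair.
    """
    return (
        (strand_a, complement_strand(strand_a)),
        (strand_b, complement_strand(strand_b)),
    )


original_a = "ATGCCGTA"                     # made-up example sequence
original_b = complement_strand(original_a)  # its partner strand
copy_1, copy_2 = replicate(original_a, original_b)

print(original_b)  # TACGGCAT
print(copy_1)      # ('ATGCCGTA', 'TACGGCAT')
print(copy_2)      # ('TACGGCAT', 'ATGCCGTA')
```

Both copies contain the same two strands, which is why each daughter cell inherits the same sequence.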

A gene is a series of thousands of base pairs in a row: every gene codes for the production of a specific protein, which leads to various outcomes in the body. Proteins are complex three-dimensional structures composed of amino acids. There are twenty different amino acids commonly found in proteins; a series of three bases, known as a codon, codes for the production of a particular amino acid. As there are sixty-four possible triplets but only twenty amino acids, most amino acids have more than one possible triplet sequence coding for them.
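The arithmetic behind the triplet code can be checked in a few lines. The Python sketch below builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes, with `*` marking a stop signal) and counts how many triplets map to each amino acid. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each of the
# three positions. One-letter amino acid codes; "*" marks a stop signal.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every possible triplet to its amino acid (or stop signal).
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_amino_acid = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))                # 64 possible triplets
print(len(codons_per_amino_acid) - 1)  # 20 amino acids (stop excluded)
print(codons_per_amino_acid["L"])      # 6 triplets code for leucine
print(codons_per_amino_acid["M"])      # 1 triplet codes for methionine
```

Leucine, serine and arginine each have six triplets, while methionine and tryptophan have only one, so the redundancy is uneven.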

For a gene, or a sequence of bases, to be formed into a string of amino acids, a series of steps must take place. These steps can be divided into transcription and translation. In the first, an enzyme called RNA polymerase recognises a specific sequence of bases, called the promoter. The promoter is a length of bases before the gene sequence which signals for the RNA polymerase to bind to the chromosome. The RNA polymerase, along with transcription factors, binds to the promoter and begins the process of transcription, continuing until it reaches a termination signal at the end of the gene. Within the resulting RNA, the protein-coding region begins at the start codon, AUG, and ends at a three-base sequence called the stop codon, which is any of UAA, UAG or UGA (TAA, TAG or TGA in the DNA).

Transcription is the process by which a copy of the DNA sequence is made into an RNA template. This template is transported out of the nucleus into the cytoplasm, where ribosomes convert the sequence to a chain of amino acids through a process called translation; in humans, this takes place on ribosomes that are either free in the cytoplasm or attached to the endoplasmic reticulum. The proteins may then undergo post-translational modification, much of it in the endoplasmic reticulum and Golgi body of the cell. This may include, among others, ubiquitination, phosphorylation or glycosylation of the protein, that is, the addition of other molecules to the amino acid sequence. As most proteins do not function alone, the protein then joins with other proteins to form a complex, which is transported to its final destination within the cell.
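The two steps can be sketched in Python under some heavy simplifications. Transcription is modelled as copying the coding strand with T replaced by U (the real enzyme reads the opposite, template strand), and translation simply reads triplets from the first AUG until a stop codon. The function names and the example sequence are invented for illustration.

```python
from itertools import product

# Standard genetic code in RNA letters (U, C, A, G order per position).
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Simplified transcription: the mRNA matches the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read triplets from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "CCATGGCTAAATGATT"   # made-up sequence, not a real gene
mrna = transcribe(gene)
print(mrna)                 # CCAUGGCUAAAUGAUU
print(translate(mrna))      # MAK (methionine, alanine, lysine)
```

Real translation involves far more machinery, but the reading-frame logic, three bases at a time from start to stop, is the same.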

Two of the main aims of the Human Genome Project were to identify all of the 20,000 to 25,000 genes found on human chromosomes and to determine the sequence of the three billion or so base pairs that make up the genome. From the description above, it is easy to see that this was a mammoth task, but one that, now that it is complete, has a huge impact on mankind.

At the outset of this piece, it was stated that the human genome has been sequenced, but only to an extent. There are still small sections of the genome that have not been sequenced, because these gaps do not respond to the sequencing techniques applied to the rest of the genome.

It is the sequence in which nucleic bases appear that the Human Genome Project is based on. The U.S. Department of Energy and the National Institutes of Health announced a first draft of the human genome in June 2000 and a more complete version in 2003. In September 2010, researchers at UCD’s Conway Institute completed the sequencing of the first entire genome of an Irish individual. This project was carried out because every individual’s genome, with the exception of identical twins, is different.

This difference in each human’s DNA arises at conception. Almost all cells in the human body are diploid, meaning that they contain two copies of each chromosome; however, the female gametes, the egg cells, and the male gametes, the sperm, are both haploid, i.e. they contain only a single copy of each chromosome. Upon conception, when they fuse to form a zygote, the full complement of paired chromosomes is restored. The DNA coming from the mother and the DNA coming from the father will both have been shuffled and mutated in some way as the cells that produced the gametes divided; so, because half of the new organism’s DNA comes from the mother and the other half from the father, each individual’s DNA is different.

Because of the individuality of the respective genomes, the first genome that was announced in 2003 was a representative one, formed from the compilation of more than one human genome. It would have matched relatively closely to most people’s personal gene sequences, though some variations would have occurred due to the variety from case to case.

Of the many benefits that have come out of the sequencing process, one of the most significant is the impact it has had in the area of molecular medicine. Researchers have been able to produce increasingly detailed genome maps, which allow them to locate and target genes associated with genetic conditions such as breast cancer, Alzheimer’s disease and myotonic dystrophy. This targeting allows them to focus on examining the fundamental causes of the disease in question rather than solely treating the symptoms.

The next big step in the study of molecular systems is the sequencing of the proteome – this would involve the sequencing of the entire complement of human proteins, including the modifications made to sets of proteins, which vary depending on the conditions present.

One of the basic problems with proteomics is the sheer complexity of proteins, not only in their varying structure but also in their expression. While the genome can be considered constant, despite mutations that may occur, proteins are produced in response to stimuli that arise from changing cell conditions. This means that the proteome differs from cell to cell and from moment to moment. After the RNA strand is translated into a series of amino acids, the protein undergoes post-translational modifications. A protein can undergo any number of these, which further complicates both its structure and attempts at sequencing it.
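A rough feel for how quickly this complexity grows can be had from a small counting exercise. The Python sketch below assumes, purely for illustration, a protein with a handful of sites that can each be either modified or unmodified, and enumerates every combination. The site names are hypothetical, and real proteins are not this simple, since sites can carry different kinds of modification and are not independent.

```python
from itertools import product


def modification_states(sites):
    """Enumerate every modified/unmodified combination across the given sites."""
    return [
        dict(zip(sites, flags))
        for flags in product([False, True], repeat=len(sites))
    ]


# Hypothetical modification sites on a single imaginary protein.
sites = ["phosphorylation_site_1", "phosphorylation_site_2", "glycosylation_site_1"]
states = modification_states(sites)

print(len(states))  # 8 distinct forms from just three sites
print(2 ** 10)      # 1024 forms if there were ten such sites
```

Each additional site doubles the number of possible forms, which is one reason a single gene can correspond to many distinct protein species.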

Proteomics gives a more thorough understanding of an organism than genomics, as genomics simply provides a fundamental template from which proteins are made. Proteomics, however, examines all of the possible outcomes that follow from transcription. The level at which a gene is transcribed is only a rough estimate of its level of expression as a protein. This is because, in some cases, a protein may become active only once it has undergone post-translational modification; a protein may be cleaved after its production, giving two separate proteins; or a section of the protein may degrade, giving a wholly different structure.

Examining proteins at a detailed level and determining their functions is of huge significance, as knowledge of the DNA sequence of an organism is not sufficient when attempting to explain the way in which cells work or when examining the cause of disease. It is the function of proteins in every single cell of the body, from their actions in the phospholipid membrane and their functions as transport molecules to the need for their presence in the formation of yet other proteins, that determines our body’s response to different stimuli. They are among the most important molecules in our bodies and, once we understand them completely, may give us a whole new lease on life.