Genome Size and Complexity Help

By McGraw-Hill Professional
Updated on Apr 25, 2014

Genome Size and Complexity

As the new age of molecular genetics progresses and more genome nucleotide sequences become available, the field of comparative genomics will provide new and enlightening information about genome function and evolution. Currently, over 500 microbial genomes have been sequenced, and hundreds of eukaryotic genomes have either been completed or are in progress, including those of insects (Drosophila melanogaster, mosquito), parasites (Plasmodium), worms (Caenorhabditis elegans), plants (Arabidopsis, corn, rice, potato), and mammals (human, mouse). Information from some completed genomes is presented in Table 13-1.

All of this sequence data provides basic information about the general size and gene composition of the variety of organisms that exist on Earth. For example, microbial genomes range in size from 1 kb to 13 Mb; fungal genomes typically contain tens of megabases of DNA, with a few containing gigabases (Gb); and insect genomes are typically hundreds of megabases. The largest sequenced insect genome belongs to the mosquito Aedes aegypti, at 800 Mb. Plant genomes vary widely in size, from tens to thousands of megabases; the largest sequenced plant genome is that of Zea mays (corn), at 2300 Mb. Mammalian genomes typically contain thousands of megabases of DNA, and the largest sequenced mammalian genome belongs to Macropus eugenii, the wallaby, at 3.8 Gb.
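Because the sizes above mix kilobases, megabases, and gigabases, it can help to put them on a single scale. The following Python sketch assumes the usual decimal convention (1 kb = 1,000 bp, 1 Mb = 10^6 bp, 1 Gb = 10^9 bp); the helper names `to_bp` and `human_readable` are illustrative only and do not come from any genomics library.

```python
# Decimal prefixes commonly used for genome sizes (an assumed convention).
UNITS = {"bp": 1, "kb": 10**3, "Mb": 10**6, "Gb": 10**9}


def to_bp(size: str) -> int:
    """Convert a string such as '800 Mb' or '3.8 Gb' into base pairs."""
    value, unit = size.split()
    return round(float(value) * UNITS[unit])


def human_readable(bp: int) -> str:
    """Express a base-pair count in the largest unit that keeps the value >= 1."""
    for unit in ("Gb", "Mb", "kb"):
        if bp >= UNITS[unit]:
            return f"{bp / UNITS[unit]:g} {unit}"
    return f"{bp} bp"


# Largest sequenced genomes quoted in the text above.
largest = {
    "Aedes aegypti (mosquito)": "800 Mb",
    "Zea mays (corn)": "2300 Mb",
    "Macropus eugenii (wallaby)": "3.8 Gb",
}

if __name__ == "__main__":
    # List the genomes from smallest to largest on a common base-pair scale.
    for organism, size in sorted(largest.items(), key=lambda kv: to_bp(kv[1])):
        bp = to_bp(size)
        print(f"{organism}: {bp:,} bp ({human_readable(bp)})")
```

Run as a script, it lists the three genomes from smallest to largest. For example, it shows that corn's 2300 Mb is the same as 2.3 Gb.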

Thousands of mitochondrial and chloroplast genomes have also been sequenced. Most animal mitochondrial genomes are 11 to 20 kb in size and contain 12 or 13 protein-coding genes. Fungal and plant mitochondrial genomes vary more in size and gene content, ranging from 8 to 160 kb and from 10 to 40 protein-coding genes. Over 150 chloroplast genomes have been sequenced; they range from 35 to 200 kb in size and contain between 30 and 200 protein-coding genes.

More complex organisms may have mechanisms that allow a single gene to produce more than one protein. One such mechanism is alternative splicing, in which different proteins result from different ways of splicing, or removing, the introns from the transcribed RNA to produce the mature mRNA. (See Regulation of mRNA processing section, p. 375.) Introns may be retained rather than spliced out, or may be spliced differently, producing unique mature mRNAs that are in turn translated into correspondingly unique proteins. It is estimated that the approximately 20,000 human genes produce 100,000 or more proteins.
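The combinatorial effect of alternative splicing can be illustrated with a toy model. This Python sketch assumes a hypothetical gene whose exons are either constitutive (always kept) or cassette exons (independently included or skipped). Real splicing is regulated and also involves other patterns such as intron retention, so a cell need not produce every combination. With k cassette exons the model yields 2^k candidate mRNAs; the function `splice_isoforms` and the exon names are hypothetical.

```python
from itertools import product


def splice_isoforms(exons, optional):
    """Enumerate candidate mature mRNAs for a toy gene.

    exons    -- exon names in genomic (5' to 3') order
    optional -- names of cassette exons that may be included or skipped;
                every other exon is treated as constitutive (always kept).
    Returns a list of isoforms, each a tuple of the retained exons in order.
    """
    cassettes = [e for e in exons if e in optional]
    isoforms = []
    # Each include/skip choice for the cassette exons gives one candidate mRNA.
    for choice in product((True, False), repeat=len(cassettes)):
        keep = dict(zip(cassettes, choice))
        isoforms.append(tuple(e for e in exons if keep.get(e, True)))
    return isoforms


if __name__ == "__main__":
    # Hypothetical five-exon gene with two cassette exons (E2 and E4).
    gene = ["E1", "E2", "E3", "E4", "E5"]
    for mrna in splice_isoforms(gene, {"E2", "E4"}):
        print("-".join(mrna))
```

With the two cassette exons E2 and E4, the script prints four isoforms: E1-E2-E3-E4-E5, E1-E2-E3-E5, E1-E3-E4-E5, and E1-E3-E5. On the figures quoted above, 100,000 proteins from about 20,000 genes works out to an average of roughly five proteins per gene.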

Additionally, it is becoming apparent that more complex organisms show a higher degree of gene duplication in their genomes. For example, E. coli is estimated to contain 1345 duplicated genes, while Drosophila contains 5536. Complexity also increases in terms of the number of domains present in the encoded proteins. A domain is a particular protein sequence element that is generally associated with a function (e.g., a DNA-binding domain). Around 100 Drosophila proteins contain more than five domains (same or different), while only 20 yeast proteins contain as many. Intron numbers also increase with organismal complexity; the yeast S. cerevisiae has a total of 220 introns, while Drosophila has 41,000.

Genes can be organized into families based on sequence similarity, which generally reflects conservation of function. Depending on the criteria used, several hundred gene families have been characterized. Function-based families, such as transcription factors, DNA repair, protein kinases, transmembrane receptors, and protein metabolism, have been characterized in the fly and other organisms. A single gene family may contain hundreds or thousands of genes. For example, in the fly, nearly 4000 proteins have been categorized as cell growth and maintenance genes, while only 57 are involved in DNA replication. Of all the genes identified by genome sequencing, anywhere from 30 to 50% have no hypothesized function. Function is generally assigned based on amino acid sequence similarity to proteins from other organisms whose function is known from previous genetic analysis (i.e., isolation of mutations and mutants).
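Grouping genes into families by sequence similarity can be sketched as single-linkage clustering: any two genes whose similarity meets a threshold are linked, and each connected group is treated as a family. This Python sketch is a simplification under stated assumptions. The sequences are hypothetical and already aligned to equal length, similarity is plain percent identity, and the 60% cutoff is arbitrary; real analyses rely on dedicated alignment tools and statistical scores. Changing the cutoff changes the families, which mirrors the point that the number of families depends on the criteria.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length, pre-aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def gene_families(seqs: dict[str, str], threshold: float = 60.0) -> list[list[str]]:
    """Group genes into families by single-linkage clustering.

    Two genes are linked if their identity is >= threshold; families are the
    connected components of that graph, found with a union-find structure.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent pointers to the family representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two families

    families: dict[str, list[str]] = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(f) for f in families.values())


if __name__ == "__main__":
    # Hypothetical, pre-aligned protein fragments (not real sequences).
    toy = {
        "kinA": "MKLVAGHIKE",
        "kinB": "MKLVSGHIRE",
        "kinC": "MKIVAGHLKE",
        "repX": "PQRSTDWYFN",
        "repY": "PQRSTEWYFD",
        "lone": "GGGGGGGGGG",
    }
    for family in gene_families(toy):
        print(family)
```

Here the three hypothetical `kin` sequences form one family, the two `rep` sequences form another, and `lone` remains a family of one.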

Nearly 45% of eukaryotic genomes consists of repetitive DNA elements. Only a small proportion of the human genome is unique-sequence DNA, that is, sequences present in a single copy; these sequences typically include protein-coding genes. Other sequences are moderately repetitive, such as the genes that encode rRNAs and tRNAs, as well as elements known as SINEs (short interspersed elements) and LINEs (long interspersed elements). SINEs and LINEs are transposable elements (see Chapter 11). The human Alu sequence is an example of a SINE; it is approximately 280 bp long and is present in over 300,000 copies. The Alu sequence is used in forensic DNA applications (see Chapter 12). LINEs can be several kilobases long and may exist in up to 20,000 copies per genome. LINE-1 (L1) is a 6 kb element that is repeated several hundred thousand times per genome.

A third class, highly repetitive DNA, consists of very short elements (typically 200 bp or less) that can exist in millions of copies per genome. Telomeric and centromeric sequences are examples of highly repetitive DNA, but many of these sequences have no known function. Some genes may even contain highly repetitive DNA sequences. A recently characterized spider silk gene (FLAG) is composed of 11 repeating exons interspersed with nearly identical repeating introns; each exon is ~1320 bp and has virtually the same sequence. By contrast, prokaryotes have fewer repeated DNA sequences.

The repetitive DNA of eukaryotes is just one part of the genome that is generally described as non-coding. One estimate suggests that as much as 95 to 98% of the human genome does not code for proteins, and of the identified protein-coding genes, only about half have been assigned a biochemical function. Non-coding DNA may play a role in generating the genomic diversity that fuels genome evolution and individual phenotypic diversity.
Transposable elements can move around the genome, creating mutations and chromosomal rearrangements that may lead to disease.
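The copy numbers quoted above translate into genome coverage by multiplying element length by copy number. This Python sketch uses the Alu, FLAG, and non-coding figures from the text, plus one assumption not stated there: a haploid human genome of about 3.1 Gb (the constant `HUMAN_GENOME_BP`). The helper names are illustrative only.

```python
# Assumed haploid human genome size (about 3.1 Gb); this figure is not given in the text.
HUMAN_GENOME_BP = 3_100_000_000


def repeat_span(unit_length_bp: int, copies: int) -> int:
    """Total base pairs occupied by a repeated element."""
    return unit_length_bp * copies


def genome_fraction(span_bp: int, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Percentage of the genome covered by span_bp."""
    return 100.0 * span_bp / genome_bp


if __name__ == "__main__":
    alu = repeat_span(280, 300_000)  # Alu: ~280 bp, over 300,000 copies
    print(f"Alu: {alu:,} bp = {genome_fraction(alu):.1f}% of the genome (lower bound)")

    flag_exons = repeat_span(1320, 11)  # spider silk FLAG gene: 11 exons of ~1320 bp
    print(f"FLAG exons: {flag_exons:,} bp of nearly identical sequence")

    # 95 to 98% non-coding leaves 5 to 2% that codes for protein.
    for noncoding in (95, 98):
        coding_bp = HUMAN_GENOME_BP * (100 - noncoding) // 100
        print(f"{noncoding}% non-coding -> about {coding_bp:,} bp coding")
```

With these inputs, Alu elements alone span about 84 Mb, roughly 2.7% of the assumed genome (a lower bound, since the text says "over" 300,000 copies); the 11 FLAG exons total 14,520 bp; and 2 to 5% protein-coding DNA corresponds to roughly 62 to 155 Mb.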
