Genome Size and Complexity
As the new age of molecular genetics progresses and yields more genome nucleotide sequences, the field of comparative genomics will provide new and enlightening information regarding genome function and evolution. Currently, over 500 microbial genomes have been sequenced, and hundreds of eukaryote genomes, such as insect (Drosophila melanogaster, mosquito), parasite (Plasmodium), worm (Caenorhabditis elegans), plant (Arabidopsis, corn, rice, potato), and mammal (human, mouse) genomes, are either completed or in progress. Information from some completed genomes is presented in Table 13-1.
All of this sequence data provides some basic information about the general size and gene composition of the variety of organisms that exist on Earth. For example, microbial genomes range in size from 1 kb to 13 Mb; fungal genomes typically contain tens of megabases of DNA, with a few containing gigabases (Gb); and insects have genomes that are hundreds of megabases. The largest sequenced insect genome belongs to the mosquito Aedes aegypti, at 800 Mb. Plant genomes display wide variation in size, ranging from tens to hundreds of megabases, with the largest sequenced plant genome being Zea mays (corn) at 2300 Mb. Mammalian genomes typically contain thousands of megabases of DNA, with the largest sequenced genome belonging to Macropus eugenii, the wallaby, at 3.8 Gb.
Thousands of mitochondrial and chloroplast genomes have also been sequenced. Most animal mitochondrial genomes range in size from 11 to 20 kb and contain 12 or 13 protein-coding genes. Fungal and plant mitochondrial genomes show greater variation in size and gene number, ranging from 8 to 160 kb and from 10 to 40 protein-coding genes. Over 150 chloroplast genomes have been sequenced; they range in size from 35 to 200 kb and contain between 30 and 200 protein-coding genes.
More complex organisms may have mechanisms that allow for the production of more than one protein from a single gene. An example of this type of mechanism is alternative splicing. Alternative splicing results in different proteins based on how the introns are spliced, or removed, from the transcribed RNA to produce the mature mRNA. (See Regulation of mRNA processing section, p. 375.) Introns may be left unspliced or may be spliced differently to produce unique mature mRNAs that, in turn, are translated into correspondingly unique proteins. It is estimated that the approximately 20,000 human genes are responsible for producing upwards of 100,000 (or more!) proteins.
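As a rough illustration of how one gene can give rise to several mature mRNAs, the sketch below enumerates the transcripts obtained by including or skipping optional exons. It models only one form of alternative splicing (exon skipping) and ignores intron retention and alternative splice sites. The gene, the exon names, and the sequences are hypothetical and chosen purely for illustration.

```python
from itertools import product

def splice_isoforms(exons, optional):
    """Return every mature mRNA obtained by keeping or skipping each optional exon.

    exons    -- ordered list of (name, sequence) pairs
    optional -- set of exon names that may be skipped; all others are always kept
    """
    optional_names = [name for name, _ in exons if name in optional]
    isoforms = []
    # Each optional exon is either kept (True) or skipped (False).
    for choices in product([True, False], repeat=len(optional_names)):
        keep = dict(zip(optional_names, choices))
        # Exons not listed as optional default to being kept.
        mrna = "".join(seq for name, seq in exons if keep.get(name, True))
        isoforms.append(mrna)
    return isoforms

# Hypothetical toy gene: four exons, two of which (E2, E3) may be skipped.
toy_exons = [("E1", "ATGGCC"), ("E2", "GATTAC"), ("E3", "CCGGAA"), ("E4", "TTTTAA")]
isoforms = splice_isoforms(toy_exons, optional={"E2", "E3"})

for mrna in isoforms:
    print(mrna)
print(len(isoforms), "isoforms from one gene")
```

With n independently skippable exons this toy model yields up to 2^n distinct transcripts, which gives a feel for how a modest number of genes could encode a much larger number of proteins.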
Additionally, it is becoming apparent that as organisms become more complex, there is a higher degree of gene duplication in their genomes. For example, E. coli is estimated to contain 1345 duplicated genes, while Drosophila contains 5536. Complexity in gene structure also increases in terms of the number of domains present. A domain is a particular protein sequence element that generally is associated with a function (e.g., a DNA-binding domain). The number of proteins in Drosophila that contain more than five domains (same or different) is around 100, while only 20 yeast proteins contain as many. Intron numbers also increase as organismal complexity increases; the yeast S. cerevisiae has a total of 220 introns, while Drosophila has 41,000.
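To put the counts quoted above on a common scale, the following minimal sketch computes the fold difference between the simpler and the more complex organism for each feature. The numbers are taken from the text (the figure of 100 multi-domain Drosophila proteins is approximate); the dictionary layout and function name are illustrative only.

```python
# Counts quoted in the text: (simpler organism, more complex organism).
counts = {
    "duplicated genes": {"E. coli": 1345, "Drosophila": 5536},
    "proteins with >5 domains": {"yeast": 20, "Drosophila": 100},  # ~100, approximate
    "introns": {"S. cerevisiae": 220, "Drosophila": 41_000},
}

def fold_increase(simpler_count, complex_count):
    """Ratio of the count in the more complex organism to the simpler one."""
    return complex_count / simpler_count

fold = {}
for feature, by_species in counts.items():
    # Dicts preserve insertion order: simpler organism first, complex second.
    (simple_name, simple_n), (complex_name, complex_n) = by_species.items()
    fold[feature] = fold_increase(simple_n, complex_n)
    print(f"{feature}: {complex_name} / {simple_name} = {fold[feature]:.1f}x")
```

The comparison suggests that intron number differs far more between these organisms than the number of duplicated genes or multi-domain proteins does.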
Genes can be organized into families based on sequence similarity that generally translates to conservation in function. Several hundred gene families have been characterized, depending on the criteria. Families based on function such as transcription factors, DNA repair, protein kinases, transmembrane receptors, protein metabolism, etc., have been characterized in the fly and other organisms. Hundreds or thousands of genes may be placed in each gene family. For example, in the fly, nearly 4000 proteins have been categorized as cell growth and maintenance genes while only 57 are involved in DNA replication. Of the total number of genes identified from genome sequencing, anywhere from 30 to 50% do not have a hypothesized function. Function is generally assigned based on amino acid sequence similarity to proteins from other organisms whose function is known through previous genetic analysis (i.e., isolation of mutations and mutants).
Repetitive DNA elements make up nearly 45% of eukaryotic genomes. A small proportion of the human genome is unique-sequence DNA, that is, sequences present in a single copy. These sequences typically include protein-coding genes. Other sequences exist in moderately repetitive copy number, such as the genes that encode rRNAs and tRNAs, and other elements known as SINEs (short interspersed elements) or LINEs (long interspersed elements). SINEs and LINEs are transposable elements (see Chapter 11). The human Alu sequence is an example of a SINE. It is approximately 280 bp in length and is present in over 300,000 copies. The Alu sequence is used in forensic DNA applications (see Chapter 12). LINEs can be several kilobases in length and may exist in up to 20,000 copies per genome. LINE-1 (L1) is a 6 kb element that is repeated several hundred thousand times per genome. A third type of repetitive DNA is highly repetitive; these elements are typically very short (200 bp or less) and can exist in millions of copies per genome. Some examples of highly repetitive DNA are telomeric and centromeric sequences, but many of these sequences have no known function. Some genes may even contain highly repetitive DNA sequences. A recently characterized spider silk gene (FLAG) is composed of 11 repeating exons interspersed with nearly identical repeating introns. Each exon is ~1320 bp and is made up of virtually the same sequence. By contrast, prokaryotes have fewer repeated DNA sequences. The repetitive DNA of eukaryotes is just one part of the genome that is generally known as non-coding. One estimate suggests that as much as 95-98% of the human genome does not code for proteins. And, of the identified protein-coding genes, only approximately half have been assigned a biochemical function. Non-coding DNA may play a role in generating the genomic diversity necessary to fuel genome evolution and the generation of individual phenotypic diversity.
Transposable elements can move around the genome, creating mutations and chromosomal rearrangements that may lead to disease.
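A short calculation helps convey how much DNA a repeat family can occupy. The sketch below multiplies the Alu length and copy number quoted above to estimate the total DNA contributed by Alu elements. The human genome size of about 3.2 Gb used for the percentage is an assumption (a commonly cited approximate figure, not a value given in this section), and the function name is illustrative.

```python
def repeat_footprint_bp(unit_length_bp, copy_number):
    """Total base pairs occupied by a repeat family: unit length x copies."""
    return unit_length_bp * copy_number

# Figures from the text: Alu is ~280 bp and present in over 300,000 copies,
# so the result is a lower-bound estimate.
ALU_LENGTH_BP = 280
ALU_COPIES = 300_000

# Assumption, not from the text: human haploid genome of roughly 3.2 Gb.
ASSUMED_GENOME_BP = 3_200_000_000

alu_total_bp = repeat_footprint_bp(ALU_LENGTH_BP, ALU_COPIES)
alu_fraction = alu_total_bp / ASSUMED_GENOME_BP

print(f"Alu elements occupy at least {alu_total_bp / 1e6:.0f} Mb")
print(f"Approximate share of the assumed genome: {alu_fraction:.2%}")
```

Under these assumptions, this single SINE family accounts for at least 84 Mb of DNA, which is larger than many entire fungal genomes mentioned earlier.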
Organization of the Nuclear Genome
Functionally related bacterial genes are often clustered together in operons that produce polycistronic mRNAs. Eukaryotes have only monocistronic cytoplasmic mRNAs, and their genes are not organized into operons. Many eukaryotic repeated genes that exist in multiple identical copies (e.g., genes for rRNAs and tRNAs) are clustered together on specific chromosomes, as components of multi-gene families. Other multiple-gene families may consist of a set of genes descended by duplication and mutation from one ancestral gene; they may be clustered together on the same chromosome or dispersed on different chromosomes. Such genes are usually coordinately controlled.
EXAMPLE 13.1 In humans, there are two families of hemoglobin genes. The alpha (α) family consists of a cluster of genes (including zeta [ζ], α2, and α1) on chromosome 16. The beta (β) family cluster on chromosome 11 includes epsilon (ε), gammas (γG, γA), delta (δ), and β. In addition, each family has one or more nonfunctional DNA sequences that are very similar to those of normal globin genes. These nonfunctional gene-like DNA sequences are referred to as pseudogenes. During the embryonic stage (less than 8 weeks) of development, the ζ- and ε-chains are synthesized. During the fetal period (8–41 weeks) the γ- and α-chains replace the embryonic chains. Beginning around birth and continuing for life, β-chains replace the gammas. A small fraction of adult hemoglobin has δ-chains in place of β-chains. The signals that control this switching on or off of the various hemoglobin genes are not known. The similarity in nucleotide structure of all these genes, however, suggests that early in evolution (perhaps 800 million years ago) a single ancestral globin gene began a series of duplications, followed by mutations and transpositions, to produce the two families and their multiple constituent genes and pseudogenes that exist today.