Bioinformatics Help

By McGraw-Hill Professional
Updated on Aug 23, 2011


Bioinformatics is a field that was born of the need for high-powered computing to help organize, analyze, and store biological information. The primary types of biological information involved in bioinformatics are DNA and protein sequence data. Once DNA sequencing became technologically simple and automated, massive numbers of gene sequences were generated. Public databases were created to house the information and allow everyone to use it. The definitive database in the United States for gene sequences is called GenBank®, which is administered by the National Center for Biotechnology Information (NCBI) and, as of 2008, contained 85,759,586,764 nucleotide bases in 82,853,685 sequence records from thousands of different microbe, plant, and animal species. The database can be found at the NCBI website. There are additional databases of DNA sequences in Japan at the DNA Data Bank of Japan (DDBJ) and in Europe at the European Molecular Biology Laboratory (EMBL). All of these databases are cooperative systems. The NCBI website contains a number of other useful databases and tools for analyzing genes, proteins, and genomes. OMIM (Online Mendelian Inheritance in Man) is an authoritative source of information on human genes and diseases with links to gene sequence information and peer-reviewed references. PubMed is a literature search engine for finding published articles on topics of human medicine and genetics.

Besides storing biological information, the databases can be used to help analyze genes, their functions, and their evolution. For example, if a gene is cloned and sequenced, its sequence can be used in a search called BLAST® against all known sequences (all 83 million … and growing) to determine whether (1) it has already been cloned or (2) it is related to an already known gene. If it is a new gene sequence, its relatedness to other sequences might help determine its possible biological function. Protein databases can also be searched in similar ways.
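The idea of searching a query against a collection of known sequences can be illustrated with a minimal sketch. This is not the BLAST® algorithm itself. It is a toy ranking that counts short subsequences (k-mers) shared between the query and each record. The function names, the value of k, and the three records are hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search(query, database, k=4):
    """Rank database records by the number of k-mers shared with the query.

    Records that share no k-mer with the query are left out of the result.
    """
    query_kmers = kmers(query.upper(), k)
    hits = []
    for name, seq in database.items():
        shared = len(query_kmers & kmers(seq.upper(), k))
        if shared:
            hits.append((name, shared))
    # Best match first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy records, not real GenBank entries.
toy_db = {
    "record_A": "ATGGCGTACGTTAGC",  # identical to the query
    "record_B": "ATGGCGTACGATAGC",  # differs from the query at one position
    "record_C": "TTTTTTTTTTTTTTT",  # unrelated
}

hits = search("ATGGCGTACGTTAGC", toy_db)
for name, shared in hits:
    print(name, shared)
```

In this sketch an identical record ranks first (the sequence "has already been cloned"), a record with a single difference ranks second (a related gene), and the unrelated record is not reported at all.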

EXAMPLE 12.11 Suppose you clone and sequence a DNA fragment obtained from a genomic library of Xenopus (frog) DNA. The sequence is entered into the BLAST® computer program at the NCBI web page and the top 10 matches obtained are shown below.

The data shown in the table provide several important pieces of information. First, they identify the top 10 genes whose sequences are most similar to our unknown DNA sequence. Next is the species to which each of these matching genes belongs (e.g., Rattus norvegicus). The score (based on an algorithm) is a relative measure of how similar the unknown, or query, sequence is to the identified sequence. The higher the number, the more similar the sequences.
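To see why a higher score means greater similarity, consider a simplified scoring scheme. The sketch below assumes two sequences of equal length compared position by position with no gaps, and it uses hypothetical values of +1 for a match and -1 for a mismatch. The scoring used by BLAST® is more elaborate, so this is only an illustration of the principle.

```python
def ungapped_score(query, subject, match=1, mismatch=-1):
    """Score two equal-length sequences position by position.

    The match and mismatch values are illustrative assumptions,
    not the parameters of any particular program.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    return sum(match if q == s else mismatch
               for q, s in zip(query, subject))


identical = ungapped_score("ACGTACGT", "ACGTACGT")   # no differences
one_change = ungapped_score("ACGTACGT", "ACGTACCT")  # one mismatch
print(identical, one_change)
```

Each mismatch both removes a match reward and adds a penalty, so the score drops as the two sequences diverge.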

Furthermore, the BLAST® program lines up the query sequence with each sequence in the database in an alignment (Fig. 12-17).


The alignment shows the exact nucleotides that are identical by connecting them with a line. Those that are different are not connected. These types of alignments can give an estimate of gene relatedness, which can be inferred to represent some degree of evolutionary relatedness. All of the sequences in the database are annotated, so further information can be found. For example, most sequence annotations contain references to the research article in which the information on the gene was published. This information can be accessed, and a hypothesis regarding the function of the unknown gene, based on its similar relative (also known generally as a homolog(ue)), can be made and then tested. It turns out that the gene family analyzed in this example is involved in regulation of the cell division cycle.
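The alignment display described above can be reproduced in a few lines. The sketch below assumes the two sequences have already been aligned to the same length without gaps. The function names and the example sequences are hypothetical.

```python
def format_alignment(query, subject):
    """Return a three-line alignment with '|' under identical positions."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    bars = "".join("|" if q == s else " " for q, s in zip(query, subject))
    return "\n".join([query, bars, subject])


def percent_identity(query, subject):
    """Percentage of aligned positions that carry the same nucleotide."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# Hypothetical aligned fragments that differ at the third position.
print(format_alignment("ACGT", "ACCT"))
print(percent_identity("ACGT", "ACCT"))
```

The percent identity is one simple estimate of relatedness: the more positions connected by a line, the higher the value.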

This example shows just one of the ways that bioinformatics can be used to help understand gene structure and function. Because DNA sequencing technology has advanced so rapidly, researchers are sequencing not just single genes but the genomes of entire organisms, ranging from bacteria and viruses to plants, insects, and humans. Most of this information is also being submitted to the public databases for use and analysis by scientists all over the world. For example, the U.S. Department of Energy has a robust genomics effort that encompasses the Human Genome Project as well as the sequencing and analysis of microbial genomes (see gov). Some of the information is being used by biotechnology and pharmaceutical companies to help develop better cures and treatments for diseases.

