Sequence analysis

Sequence analysis in molecular biology involves identifying the sequence of nucleotides in a nucleic acid, or of amino acids in a peptide or protein. Once a sample has been obtained, DNA sequences may be produced automatically by machine and the result displayed on a computer. Interpreting those results is still a task for humans.

Information from sequence analysis is used in many fields of biology. It shows how closely individual organisms, or groups of organisms, are related to one another.
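One simple way to put a number on "how closely related" is to count the positions at which two sequences agree. The sketch below is a deliberate simplification: it assumes the two sequences are already aligned and of equal length, whereas real comparisons first align the sequences and usually apply a model of how sequences change over time. The function name and the three short sequences are hypothetical, invented for this example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-letter fragments, invented for illustration only.
reference = "AAAGTCTGAC"
close_relative = "AAAGTCTGAT"    # differs from the reference at 1 position
distant_relative = "ATAGTGTCAT"  # differs from the reference at 4 positions

print(percent_identity(reference, close_relative))    # 90.0
print(percent_identity(reference, distant_relative))  # 60.0
```

A higher percentage suggests a closer relationship, but on fragments this short the figure means very little; the example only shows the idea.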

DNA base-pair sequence

A DNA sequence is the sequence of nucleotides in a DNA molecule. It is written as a succession of letters representing the primary structure of a DNA molecule or strand. If functional, such a sequence carries information for the sequence of amino acids in a protein molecule. The possible letters are A, C, G, and T, representing the four nucleotide bases of a DNA strand: adenine, cytosine, guanine, and thymine. The letters are printed next to one another, without gaps, as in the sequence AAAGTCTGAC.
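Because a DNA sequence is just a string over four letters, it is easy to handle on a computer. The following minimal sketch checks that a string uses only the four permitted letters and counts how often each base occurs. The function names are hypothetical, chosen for this example only.

```python
from collections import Counter

DNA_LETTERS = set("ACGT")  # adenine, cytosine, guanine, thymine


def is_dna_sequence(seq: str) -> bool:
    """Return True if seq is non-empty and uses only the letters A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= DNA_LETTERS


def base_counts(seq: str) -> dict:
    """Count how often each of the four bases occurs in seq."""
    if not is_dna_sequence(seq):
        raise ValueError(f"not a DNA sequence: {seq!r}")
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


example = "AAAGTCTGAC"  # the example sequence from the text
print(is_dna_sequence(example))  # True
print(base_counts(example))      # {'A': 4, 'C': 2, 'G': 2, 'T': 2}
```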

The study of RNA and proteins is more complex. The overall structure of DNA is simple and predictable (a double helix). The study of RNA and proteins must include a study of their three-dimensional structure, which is varied and influences how they work. To some extent this can be assisted by computer, but the results have to be verified in each case.

Information on sequences is kept in databases. Since fast methods of producing gene and protein sequences were developed during the 1990s, new sequences have been added to the databases at an ever-increasing rate.

Score

Complete genome analysis has been done on over 800 species and strains. The work is done by a machine, the DNA sequencer, which analyses light signals from fluorochromes attached to the nucleotides. This type of work is gradually becoming less expensive.

"There are currently [2009] more than 90 vertebrate species with whole genome sequences finished, in process, or in the advanced planning stages."^[1]^[2]

Rough totals

As of December 2012, whole genome analysis has been completed on about 800 to 900 living species and strains of species. Numbers are approximate, and changing.^[3]

Animals: 111 species
Plants: 53 species
Fungi: 81 species
Protists: 50 species
Archaea: 139 species and strains
Bacteria: ~400 to 500 species and strains
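As a quick arithmetic check on these rough totals, the sketch below adds up the five non-bacterial groups and shows how much of the stated overall range of about 800 to 900 is left over for bacteria. The counts are copied from the list; the variable names are made up for this example, and since the source figures are approximate the result is only indicative.

```python
# Counts copied from the list above (December 2012 figures).
non_bacterial = {
    "Animals": 111,
    "Plants": 53,
    "Fungi": 81,
    "Protists": 50,
    "Archaea": 139,
}

subtotal = sum(non_bacterial.values())
print(subtotal)  # 434

# The text gives an overall range of "about 800 to 900" species and strains.
total_low, total_high = 800, 900

# What is left for bacteria if that range is taken at face value.
print(total_low - subtotal, total_high - subtotal)  # 366 466
```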

Human DNA sequence

The human genome is stored on 23 chromosome pairs in the cell nucleus and in the small mitochondrial DNA. A great deal is now known about the sequences of DNA which are on our chromosomes. What the DNA actually does is now partly known. Applying this knowledge in practice has only just begun.

The Human Genome Project (HGP) produced a reference sequence which is used worldwide in biology and medicine. In 2001, Nature published the publicly funded project's report,^[4] and Science published Celera's paper.^[5] These papers described how the draft sequence was produced, and gave an analysis of the sequence. Improved drafts were announced in 2003 and 2005, filling in to ≈92% of the sequence.^[6]

A later project, ENCODE, studies the way the genes are controlled.^[7]^[8]

Forensic work

It is not necessary to have whole genome sequences for forensic work, such as identifying a criminal from traces of DNA left at a crime scene, or for paternity cases. At present whole genome sequencing is still very expensive, but fortunately, simpler and cheaper methods are available.

The basic idea is to look at certain loci (places) in the genome which are highly variable between people. About 10 to 15 of these loci are needed for a match, and the legal details differ between countries. A match between a sample and a suspect individual makes it extremely likely that the individual was the source of the sample. This evidence would then be the basis of the prosecution case for a crime. A similar analysis would show that a man was very likely the father of a child. This is really a modern way to do what was done with blood groups before DNA details could be analysed. The methods have been developed mainly by the work of Alec Jeffreys.
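Why a handful of loci is enough can be seen from a short calculation. If the loci are inherited independently (an assumption of this sketch), the chance that an unrelated person matches at every locus is the product of the chances of matching at each one. The frequencies below are hypothetical round numbers, not real population data, and the function name is invented for this example.

```python
from math import prod


def random_match_probability(genotype_frequencies):
    """Multiply per-locus genotype frequencies together (the product rule).

    Assumes the loci are independent of one another; that is an assumption
    of this sketch, not a statement about any particular set of markers.
    """
    freqs = list(genotype_frequencies)
    if not freqs:
        raise ValueError("at least one locus is required")
    if any(not 0 < f <= 1 for f in freqs):
        raise ValueError("frequencies must be in the range (0, 1]")
    return prod(freqs)


# Hypothetical: 13 loci, and at each one the sample's genotype is shared
# by 1 person in 10. These numbers are invented for illustration.
hypothetical_frequencies = [0.1] * 13
p = random_match_probability(hypothetical_frequencies)
print(f"{p:.1e}")  # 1.0e-13
```

Even though each single locus is shared by many people in this made-up example, the combined probability of a chance match across all of them is tiny.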

Each person's DNA contains two alleles of a particular gene or 'marker': one from the father and one from the mother. 'Markers' are loci chosen because several different alleles of each occur frequently in the population. The following table is from a commercial DNA paternity test. It shows how relatedness between parents and child is demonstrated with five markers:

DNA Marker	Mother	Child	Alleged father
D21S11	28, 30	28, 31	29, 31
D7S820	9, 10	10, 11	11, 12
TH01	14, 15	14, 16	15, 16
D13S317	7, 8	7, 9	8, 9
D19S433	14, 16.2	14, 15	15, 17

The results show that, for each of these five markers, the child has one allele in common with the mother and the other in common with the alleged father. The complete test results showed the same pattern on 16 markers between the child and the tested man. If a case is tested in court, a forensic scientist would give evidence on the likelihood of getting that result by chance.
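The logic of reading the table can be sketched in a few lines: at each marker, one of the child's two alleles should be found in the mother and the other in the alleged father. The allele values are copied from the table and kept as text labels; the function names are hypothetical. This is a simplification, since it ignores rare mutations and attaches no statistical weight to the result.

```python
# Allele pairs copied from the table: (mother, child, alleged father).
# Alleles are kept as text labels so that values like "16.2" stay exact.
PROFILES = {
    "D21S11": (("28", "30"), ("28", "31"), ("29", "31")),
    "D7S820": (("9", "10"), ("10", "11"), ("11", "12")),
    "TH01": (("14", "15"), ("14", "16"), ("15", "16")),
    "D13S317": (("7", "8"), ("7", "9"), ("8", "9")),
    "D19S433": (("14", "16.2"), ("14", "15"), ("15", "17")),
}


def consistent_with_parents(mother, child, father):
    """True if one child allele can come from the mother and the other from the father."""
    first, second = child
    return (first in mother and second in father) or (
        second in mother and first in father
    )


def check_all(profiles):
    """Apply the consistency check to every marker in the table."""
    return {
        marker: consistent_with_parents(*alleles)
        for marker, alleles in profiles.items()
    }


results = check_all(PROFILES)
for marker, ok in results.items():
    print(marker, "consistent" if ok else "inconsistent")
```

With the values from the table, every marker comes out as consistent. A single inconsistent marker would not settle the matter on its own, which is why a full report covers more markers and includes a probability.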

DNA testing in the US

There are state laws on DNA profiling in all 50 states of the United States.^[9] Detailed information on database laws in each state can be found at the National Conference of State Legislatures website.^[10]

Ancient DNA

Ancient DNA has been recovered from some sources. The record for survival of DNA suitable for sequence analysis is 700,000 years. A horse skeleton buried in permafrost has provided bones with some DNA surviving.^[11] The sequence was only 70% complete, but it was enough for researchers to say "It would not look like a horse as we know it… but we would expect it to be a one-toed horse". For comparison, researchers had access to DNA sequences of modern horses, donkeys and Przewalski's horse.

Related pages

George Church
Walter Gilbert
John Sulston
Fred Sanger
ENCODE: the complete analysis of the human genome
Human genome
Complete Genomics
Bioinformatics

References

[1] As listed by the International Sequencing Consortium. Archived 2012-02-08 at the Wayback Machine.
[2] Austad, S (2009). "Comparative biology of aging". J Gerontol A Biol Sci Med Sci. 64 (2): 199–201. doi:10.1093/gerona/gln060. PMC 2655036. PMID 19223603.
[3] "Entrez Genome Database Search". National Center for Biotechnology Information. Search for details on specific genomes by organism name and strain.
[4] International Human Genome Sequencing Consortium (2001). "Initial sequencing and analysis of the human genome" (PDF). Nature. 409 (6822): 860–921. doi:10.1038/35057062. PMID 11237011.
[5] Venter J.C.; et al. (2001). "The sequence of the human genome" (PDF). Science. 291 (5507): 1304–1351. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.
[6] McElheny, Victor K. 2010. Drawing the map of life: inside the Human Genome Project. New York: Basic Books.
[7] Maher, Brendan 2012. ENCODE: The human encyclopaedia. Nature 489 (7414) 46–48.
[8] Walsh, Fergus 2012. ENCODE: The human encyclopaedia. BBC News Sci & Environment.
[9] "Genelex: The DNA Paternity Testing Site". Healthanddna.com. 1996-01-06. Archived from the original on 2010-12-29. Retrieved 2010-04-03.
[10] Donna Lyons. "State Laws on DNA Data Banks". Ncsl.org. Archived from the original on 2011-09-29. Retrieved 2010-04-03.
[11] Ball, Jonathan 2013. Ancient horse bone yields oldest DNA sequence. BBC News Science & Environment.

Other websites

Human Ageing Genomic Resources website
