New Software Analyzes Entire Genomes
Monday, February 2, 2009
Using a method inspired by text comparison software, UC Berkeley scientists have developed a new technique for analyzing the genome.
The method takes the entire genome into account, unlike most current methods, which typically analyze only the small expressed portion.
Instead of looking only at the coding parts of the genome that are expressed as proteins, which make up 1 percent of an organism's DNA, scientists can now examine the whole genome with their computer code, using what they call the feature frequency profile method.
"If two genes are very similar to each other, you'd like to infer that they're closely related in an evolutionary scale," said Sung-Hou Kim, a UC Berkeley professor of chemistry who worked on the study. "Depending on which genes you picked, you may get different relationships. Instead of picking small number of genes to compare, we compared the whole genome sequence."
This process is similar to the way software programs detect plagiarism in writing, but with a twist. To simulate the process of analyzing a genome, the scientists linked all the words of a text together, then looked for similar strings of letters between texts.
"We are treating the whole genome as a book. So we used the whole book as one continuous set of the alphabet to test our methods," Kim said.
For an effective comparison of texts, the length of the string of letters was key. Likewise, by lengthening the strings of DNA letters they compared, scientists were able to distinguish between different species.
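The idea of comparing texts by their strings of letters can be illustrated with a short Python sketch. It is not the researchers' code: the function names, the toy inputs and the choice of Jensen-Shannon divergence as the distance between profiles are all assumptions made for illustration. Each text is reduced to one continuous run of letters, every string of a chosen length k is counted, and the resulting frequency profiles are compared.

```python
from collections import Counter
from math import log2


def letters_only(text):
    """Link all the words together: keep letters, drop spaces and punctuation."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def feature_frequency_profile(sequence, k):
    """Relative frequency of every length-k string (feature) in the sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Assumed distance measure: 0 for identical profiles, 1 for no shared features."""
    divergence = 0.0
    for feature in set(p) | set(q):
        pi = p.get(feature, 0.0)
        qi = q.get(feature, 0.0)
        mi = (pi + qi) / 2
        if pi > 0:
            divergence += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * log2(qi / mi)
    return divergence


# Toy inputs, invented for this example.
text_a = letters_only("We are treating the whole genome as a book.")
text_b = letters_only("We treat the whole genome as one long book.")
profile_a = feature_frequency_profile(text_a, k=3)
profile_b = feature_frequency_profile(text_b, k=3)
print(jensen_shannon_divergence(profile_a, profile_b))
```

The same functions work on a DNA string such as "ACGTACGT". The value of k is the string length the article describes as key: in this sketch, very short strings tend to appear in almost any text, while longer ones are more specific to their source.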
The process by which scientists currently analyze the coding genes is called multiple sequence alignment.
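For contrast, alignment-based comparison can be sketched in its simplest form, with two sequences rather than many. The Python below is a minimal illustration of pairwise global alignment (the Needleman-Wunsch algorithm), not the multiple sequence alignment software used in practice; the function name and the scoring values for matches, mismatches and gaps are assumptions.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Align two sequences end to end; return both aligned strings and the score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell holds the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


# Toy sequences, invented for this example.
aligned_a, aligned_b, total = global_alignment("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print(total)
```

A dash marks a gap inserted so that the remaining positions line up. The table has one cell for every pair of positions, which is part of why this kind of comparison is usually applied to a handful of genes rather than to whole genomes.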
"Looking at maybe a handful of genes, (the scientists) will align them so the base positions correspond with each other, and then they'll use that with which species are closely related with each other," said Gregory Sims, a computational scientist at the Lawrence Berkeley National Laboratory.
Scientists compared evolutionary trees constructed with their new method to trees made with the multiple sequence alignment process and found startling similarities. The trees correspond to each other and match the current understanding of the evolutionary relationships among mammals.
"We showed using this method that we get the same phylogenetic trait you get from the coding sequences," Sims said. "There's important evolutionary history you can pull out from introns, even though they're not expressed in protein."
Scientists believe this method can be used to quickly and comprehensively compare genomes, but it may take some time before researchers are willing to analyze the genome as a whole.
"It involves changing people's view about what is important," Sims said. "Currently, people think if it codes for something, it must be more important over using the entire sequence. We show there's evolutionary signal in noncoding parts of the genome, too."
Christine Chen covers research & ideas. Contact her at cchen@dailycal.org.