
‘Google of sorts’: DNA database harnesses power of genome sequences

February 15, 2019
Image credit: Illumina
This article or excerpt is included in the GLP’s daily curated selection of ideologically diverse news, opinion and analysis of biotechnology innovation.

In 2015, scientists discovered a pig in China that would set off a frantic, worldwide search. The pig carried bacteria resistant to colistin, a drug used to cure infections when almost all other drugs have failed. …

In England, where colistin is reserved for patients in rare and dire circumstances, public-health officials worried. Could colistin-resistant bacteria also be lurking in that country?

[T]he search took 256 computers working together for an entire weekend, says Zamin Iqbal, a computational genomicist at the European Bioinformatics Institute … . The researchers there did find colistin resistance among their 24,000 samples, and eventually, countries all over the world found it, too.

Why did this process take so long? The computers at Public Health England had to open up and search the sequencing files of 24,000 genomes one by one.


So Iqbal decided to build a Google of sorts for bacterial and viral genomes. He and his colleagues downloaded all available genomes—nearly 500,000 at the time—from a public database called the European Nucleotide Archive. The 170,000-gigabyte data set took six whole weeks to download. … The resulting tool is called BIGSI, for BItsliced Genomic Signature Index.

Searching for colistin resistance through nearly 500,000 sequences now takes just a few seconds.
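
To see why an index beats opening every file, here is a minimal Python sketch that contrasts a one-by-one scan with a toy bit-sliced index. It is an illustration only, not BIGSI's actual code: the excerpt does not describe BIGSI's internals, so the Bloom-filter layout, the function names (`kmers`, `rows_for`, `build_index`, `search`, `linear_scan`), the parameters `K`, `M` and `H`, and the sample sequences are all assumptions made for this example. Each row of the index is a bitmask with one bit per sample, so a query only has to AND together a handful of rows instead of reading every genome.

```python
import hashlib

K = 5      # k-mer length (toy value, assumed for illustration)
M = 1024   # number of rows in the index (toy value)
H = 3      # hash functions per k-mer (toy value)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rows_for(kmer, m=M, h=H):
    """Map a k-mer to h row numbers using salted SHA-256 hashes."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{kmer}".encode()).digest()[:8], "big") % m
        for i in range(h)
    ]


def linear_scan(genomes, query):
    """Slow baseline: open every genome and look for the query in it."""
    return [name for name, seq in genomes.items() if query in seq]


def build_index(genomes):
    """Build the index once. rows[r] is an int whose bit j is set when
    sample j contains a k-mer that hashes to row r."""
    names = list(genomes)
    rows = [0] * M
    for j, name in enumerate(names):
        for km in kmers(genomes[name]):
            for r in rows_for(km):
                rows[r] |= 1 << j
    return names, rows


def search(index, query):
    """AND together the rows for every k-mer in the query. Like any Bloom
    filter, this can return false positives but never misses a sample
    that really contains the query."""
    names, rows = index
    hits = (1 << len(names)) - 1  # start with every sample as a candidate
    for km in kmers(query):
        for r in rows_for(km):
            hits &= rows[r]
    return [name for j, name in enumerate(names) if (hits >> j) & 1]


# Hypothetical toy data; real genomes are millions of bases long.
genomes = {
    "sample_a": "ACGTACGTTTGACCA",
    "sample_b": "GGGGCCCCAAAATTTT",
    "sample_c": "TTGACCATTGACCAT",
}
index = build_index(genomes)
print(search(index, "CGTTTGAC"))
```

The cost of building the index is paid once, up front. After that, each search touches only the rows its k-mers hash to, which is the general reason a lookup can take seconds where a file-by-file scan took a weekend.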

Read full, original post: The Problem With Big DNA
