In 2015, scientists discovered a pig in China that would set off a frantic, worldwide search. The pig carried bacteria resistant to colistin, a drug used to cure infections when almost all other drugs have failed. …
In England, where colistin is reserved for patients in rare and dire circumstances, public-health officials worried. Could colistin-resistant bacteria also be lurking in that country?
…[T]he search took 256 computers working together for an entire weekend, says Zamin Iqbal, a computational genomicist at the European Bioinformatics Institute … . The researchers there did find colistin resistance among their 24,000 samples, and eventually, countries all over the world found it, too.
Why did this process take so long? The computers at Public Health England had to open up and search the sequencing files of 24,000 genomes one by one.
So Iqbal decided to build a Google of sorts for bacterial and viral genomes. He and his colleagues downloaded all available genomes—nearly 500,000 at the time—from a public database called the European Nucleotide Archive. The 170,000-gigabyte data set took six whole weeks to download. … The resulting tool is called BIGSI, for BItsliced Genomic Signature Index.
Searching for colistin resistance through nearly 500,000 sequences now takes just a few seconds.
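To see why an index makes such a difference, consider a toy sketch of the two approaches. It is not BIGSI's actual code, and the excerpt does not describe the tool's internals. It assumes only what the name suggests: each genome is reduced to a fixed-size bit signature (here a small Bloom filter over short substrings, or k-mers), and the signatures are stored "bitsliced", so that one row lookup answers a question about every genome at once. The sample names, sequences, and parameters `K`, `M`, and `NUM_HASHES` are all invented for illustration.

```python
import hashlib

K = 5           # k-mer length (toy value, assumed)
M = 256         # bits in each genome's signature (toy value, assumed)
NUM_HASHES = 3  # hash functions per k-mer (toy value, assumed)

def kmers(seq, k=K):
    """All distinct substrings of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def positions(kmer):
    """Signature bit positions for one k-mer, derived from SHA-256."""
    out = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % M)
    return out

def linear_scan(genomes, query):
    """Slow baseline: open and search every genome one by one."""
    return sorted(name for name, seq in genomes.items() if query in seq)

def build_index(genomes):
    """Build the bitsliced index once, up front.

    rows[r] is an integer bitmask: bit j is set when genome j's
    signature has bit r set.
    """
    names = sorted(genomes)
    rows = [0] * M
    for j, name in enumerate(names):
        for kmer in kmers(genomes[name]):
            for r in positions(kmer):
                rows[r] |= 1 << j
    return names, rows

def query_index(names, rows, query):
    """AND together the rows for every k-mer of the query.

    Returns candidate genomes. Signatures can produce false positives,
    so candidates may need confirming, but a genome that truly contains
    the query is never dropped. Queries shorter than K match everything.
    """
    hits = (1 << len(names)) - 1
    for kmer in kmers(query):
        for r in positions(kmer):
            hits &= rows[r]
    return [name for j, name in enumerate(names) if (hits >> j) & 1]

# Invented toy data: three short "genomes" and a query sequence.
genomes = {
    "sample_a": "ACGTACGTTAGCCGATAGGCTA",
    "sample_b": "TTGACCGATTAGGCATCGATCG",
    "sample_c": "GGCATTAGCCGATAGACCTAGA",
}
query = "TAGCCGATAG"

exact = linear_scan(genomes, query)
names, rows = build_index(genomes)
candidates = query_index(names, rows, query)
```

The linear scan's cost grows with the total size of every file it has to open. The indexed query touches only a handful of rows per k-mer, however many genomes are stored, which is the kind of trade that can turn a weekend of computing into seconds.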
Read full, original post: The Problem With Big DNA