Conference Paper

Practical Space-efficient Linear Time Construction of FM-index for Large Genomes

2018 International Conference on Bioinformatics and Computational Biology
Elena Y. Harris

ABSTRACT


 The Burrows-Wheeler Transform (BWT) and Full-text

index in Minute space (FM-index) are indispensable data

structures that are used in the next generation sequencing

data analysis to efficiently map reads to a reference

genome. Recently developed algorithms SA-IS and BWTIS

allowed construction of a Suffix Array and Burrow-

Wheeler Transform, respectively, for mammalian-size

genomes in less than an hour. In practice, BWT-IS

algorithm outperforms SA-IS in terms of RAM usage.

Building an FM-index from a BWT requires LF-mapping

that is a relatively time-consuming step. Here, we present

a space-efficient linear time algorithm called BWT-ISFM

that builds an FM-index concurrently with a construction

of BWT. Our algorithm supports a genome size of up to

8Gb (giga-basepairs) in length while a publicly available

BWT-IS has a limit on a genome size of 4Gb. Moreover,

in practice, our algorithm requires only 2.3n  bytes of RAM

for a genome size of n  compared to 4n  bytes of RAM used

by SA-IS algorithms.

BICOB 2018



ISBN:
978-1-943436-11-8
PUBLISHER:
ISCA
CHIEF EDITOR:
Waheed
CONFERENCE VENUE:
Las Vegas, NV, USA
CONTACT DETAILS:
isca-hq.org
Copyright © Search Innovations. All rights reserved