Lessons from switching to on-disk storage using DelayedArray containers

Abstract

The DelayedArray container can store array-like data using different ‘backends’, including on-disk storage [1]. It provides a unified array-like interface to the data, whichever backend is used, and can serve as an assay in a SummarizedExperiment. We have been amongst the first to adopt the DelayedArray framework, adding support to the bsseq and minfi packages used for analysing whole-genome DNA methylation data. This enabled the analysis of datasets that would otherwise have been prohibitively large to analyse (hundreds of gigabytes to terabytes of data). We anticipate that the DelayedArray framework will be valuable in the analysis of ever-larger omics datasets, such as those generated by single-cell technologies. It can also make it practical to work with larger datasets on a laptop, especially one with a fast solid-state drive.

This talk will highlight important lessons learnt while refactoring bsseq and minfi to take full advantage of the DelayedArray framework. We will give an overview of key concepts, describe design patterns useful for writing performant code, and suggest workflows for common use cases.

[1]: DelayedArray Bioconductor package by Hervé Pagès (2018).

Location
Victoria University, University of Toronto, Toronto, Canada