Sentieon Genomics Tools: Boosting Performance with Memory Machine Cloud (abstract)

Denny and Songjue, 2023-09-01

Summary

The Sentieon® Genomics tools provide a complete rewrite of popular genomics pipelines (such as GATK, Picard, and MuTect) for calling variants from next-generation sequence data, with an emphasis on computational efficiency, accuracy, and consistency. Sentieon Genomics tools consistently show a five- to ten-fold improvement in performance over the equivalent open-source pipelines.

A common characteristic of variant calling pipelines is that the resource demands on CPU cores and memory access vary as the pipeline executes. MemVerge’s Memory Machine Cloud includes a checkpoint/restore feature (called WaveRider) that can automatically migrate a running container, without losing execution state, to a virtual machine of different capacity as soon as a change in resource requirements is detected.

When Sentieon Genomics tools are used in conjunction with WaveRider, performance (measured by wall clock time) improves even further because processes are never starved for resources. The costs associated with pipeline execution also decrease because virtual machines are never overprovisioned. Depending on how EC2 On-Demand instance prices scale with the number of vCPUs and memory capacity, it is possible to optimize cost and performance simultaneously by running a larger virtual machine for a shorter period when CPU and/or memory utilization is high, and a smaller virtual machine for a longer period when CPU and/or memory utilization is low.
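The cost and time trade-off described above can be illustrated with a small model. This is a minimal sketch, not a description of how WaveRider chooses instances: the phase names, the work per phase, the parallelism limits, and the per-vCPU price are all hypothetical, the price is assumed to scale linearly with vCPU count, and migration overhead is ignored.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    vcpu_hours: float   # hypothetical amount of work in the phase
    max_parallel: int   # hypothetical: most vCPUs the phase can keep busy


# Hypothetical price, assumed to scale linearly with vCPU count.
PRICE_PER_VCPU_HOUR = 0.05


def run(phases, vcpus_for_phase):
    """Return (wall clock hours, cost) for a given sizing policy."""
    total_hours = 0.0
    total_cost = 0.0
    for phase in phases:
        vcpus = vcpus_for_phase(phase)
        busy = min(vcpus, phase.max_parallel)   # idle vCPUs are still billed
        hours = phase.vcpu_hours / busy
        total_hours += hours
        total_cost += hours * vcpus * PRICE_PER_VCPU_HOUR
    return total_hours, total_cost


phases = [
    Phase("high_cpu_phase_1", vcpu_hours=64, max_parallel=32),
    Phase("low_cpu_phase", vcpu_hours=8, max_parallel=4),
    Phase("high_cpu_phase_2", vcpu_hours=32, max_parallel=32),
]

# Policy 1: one fixed 16-vCPU instance for the whole run.
fixed_hours, fixed_cost = run(phases, lambda phase: 16)

# Policy 2: resize to match what each phase can actually use.
sized_hours, sized_cost = run(phases, lambda phase: phase.max_parallel)

print(f"fixed:       {fixed_hours:.1f} h, ${fixed_cost:.2f}")
print(f"right-sized: {sized_hours:.1f} h, ${sized_cost:.2f}")
```

With these made-up numbers the fixed instance takes 8 hours and costs about $6.40, while the right-sized run takes 5 hours and costs about $5.20. The high-CPU phases finish sooner on the larger instance at roughly the same cost, and the low-CPU phase no longer pays for idle vCPUs. Real savings depend on actual instance pricing and on the time spent migrating between instances.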

The Sentieon-developed whole genome sequencing (WGS) pipeline benchmark simulates a typical bioinformatic analysis pipeline, starting with loading raw data (in the form of FASTQ files) and ending with output in the form of VCF files. The complete report compares the performance of the WGS pipeline benchmark executed without Memory Machine Cloud (MMCloud) to its performance when executed with MMCloud.

The tests show that, compared to a baseline measured by running the WGS pipeline benchmark on a single AWS EC2 instance, the combination of Sentieon Genomics tools and Memory Machine Cloud (with WaveRider) demonstrates a 40% decrease in wall clock time and a 34% reduction in cost.
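The two headline figures can be combined to estimate how the average hourly spend changed relative to the baseline. The sketch below uses only the 40% and 34% reductions stated above and assumes both are measured against the same single-instance baseline; the variable names are illustrative.

```python
# Reductions reported relative to the single-instance baseline.
time_reduction = 0.40
cost_reduction = 0.34

relative_time = 1 - time_reduction   # fraction of baseline wall clock time
relative_cost = 1 - cost_reduction   # fraction of baseline cost

# Average spend per hour, as a multiple of the baseline's hourly spend.
relative_hourly_spend = relative_cost / relative_time

print(f"relative time: {relative_time:.2f}")
print(f"relative cost: {relative_cost:.2f}")
print(f"relative hourly spend: {relative_hourly_spend:.2f}")
```

The ratio comes out at about 1.1, meaning the run spends roughly 10% more per hour on average but finishes enough sooner that the total cost still falls. This is consistent with running larger virtual machines for shorter periods.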
