Run Nextflow Workflows on MMCloud via Seqera Platform

Sateesh Peri 2024-01-021027

Nextflow and Seqera Platform: A Powerful Duo

  • Seqera Platform lets users launch Nextflow workflows on a variety of executors. Because Nextflow abstracts the executor, pipeline processes can move between environments, from local machines to clusters and cloud systems, without changes to the pipeline code.

  • Seqera Platform has established itself as the go-to solution for deploying, monitoring, and troubleshooting Nextflow pipelines in hybrid environments that blend on-premise resources with cloud-based systems. The platform's versatility makes it a vital tool for today's diverse computational requirements.

Memory Machine Cloud (MMCloud): Optimizing Resource Management

MMCloud stands out with its advanced approach to resource optimization, particularly in CPU and memory management. This capability makes MMCloud an excellent choice for handling complex, memory-heavy tasks, typical of demanding computational workloads and data-intensive genomic applications. MMCloud's efficiency surpasses traditional cloud solutions, offering a more effective platform for resource-intensive operations.

Seamless Integration for Hybrid Environments

We are thrilled to present the community with the opportunity to deploy jobs on MMCloud using the float executor via the Seqera Platform. MMCloud presents a cost-effective and dependable cloud solution for running Nextflow pipelines, marking a significant advancement in cloud-based genomic data processing and analysis.

Pre-requisites:

  • AWS Batch compute environment setup on Seqera Platform
  • MMCloud license & OpCenter setup

Steps:

NOTE: The following steps use AWS Batch to provision an on-demand instance for the Nextflow head node, then apply the MMCloud float configuration to the Nextflow run command. This switches the executor to float, so the jobs are submitted to MMCloud for execution.

  1. Log in to your Seqera Platform account.
  2. Create or use an existing AWS Batch compute environment that is configured to launch on-demand instances.
[Figure: Seqera Platform AWS Batch Configuration]
  3. From the Launchpad, add a pipeline to run or edit an existing one.
  4. Under Advanced Options, provide the additional MMCloud configuration shown below to use MMCloud's float executor.
[Figure: MMCloud Float Configuration]
  • Edit the configuration below to supply your own workDir, float credentials, and AWS credentials.

MMCloud float configuration

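plugins {
    id 'nf-float'
}

process {
    executor      = 'float'
    errorStrategy = 'retry'
}

float {
    address  = '10.102.34.687'  // placeholder: replace with your OpCenter address
    username = 'user'           // placeholder: your MMCloud username
    password = 'pass'           // placeholder: your MMCloud password
}

aws {
    accessKey = 'access'        // placeholder: your AWS access key
    secretKey = 'secret'        // placeholder: your AWS secret key
}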
  5. Include the following bash command in the Pre-run script section. This disables the AWS secrets provider, which is used by default when running on AWS Batch.
unset AWS_BATCH_JOB_ID
[Figure: Pre-run Script Configuration]
  6. Launch the workflow.
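
The MMCloud float configuration in the steps above asks for a workDir but does not show one. As a minimal sketch, the top of that configuration might be extended as follows. The bucket path is a hypothetical placeholder, and maxRetries is a standard Nextflow process directive added here on the assumption that you want more than the single retry Nextflow allows by default. On Seqera Platform the work directory is often set in the launch form or compute environment instead, in which case the workDir line is not needed.

```groovy
// Sketch only: the values below are placeholders, not working settings.

// Hypothetical S3 path for intermediate files; replace with your own bucket.
workDir = 's3://your-bucket/nextflow-work'

plugins {
    id 'nf-float'
}

process {
    executor      = 'float'
    errorStrategy = 'retry'
    // Standard Nextflow directive; the value 3 is an assumption,
    // not an MMCloud requirement.
    maxRetries    = 3
}
```

The float {} and aws {} scopes stay as shown in the original configuration.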

Enhanced Benefits of Deploying Nextflow Workflows on MMCloud

MMCloud, through its nf-float plugin, bridges Nextflow with the cloud, bringing a suite of advanced features and services that enhance the efficiency and flexibility of cloud-based workflow deployment.

WaveWatcher: Your Window to Real-Time Resource Management

WaveWatcher, a key component of MMCloud, provides real-time visibility into application resource usage. This service is essential for optimizing performance and reducing cloud costs by pinpointing and eliminating inefficiencies.

Highlights:

  • Live Resource Monitoring: Offers immediate insights into resource utilization, enabling timely adjustments.
  • Resource Optimization: Tailors resource allocation to specific task needs, maximizing efficiency.
  • Cost-Saving Insights: Identifies and helps eliminate wasteful resource usage.
  • Accessible Interface: Simplifies complex data, making it user-friendly for all skill levels.

WaveWatcher is more than a monitoring tool; it's a vital component for efficient resource management, ensuring that cloud resources are used optimally.

[Figure: WaveWatcher Graphs]

SpotSurfer: Ensuring Continuity on Spot Instances

SpotSurfer introduces a robust checkpointing and recovery service for stateful applications running on AWS Spot EC2 instances, significantly enhancing their stability and reducing cloud costs by up to 90%.

Key Advantages:

  • Reliable Stateful Application Operation: Mitigates risks associated with Spot instance interruptions.
  • Significant Cost Reduction: Offers the economic benefits of Spot instances without compromising application stability.
  • Checkpointing for Safety: Periodically saves application states for quick recovery.
  • Optimized for Spot EC2: Tailored to exploit Spot instance benefits while safeguarding data integrity.
  • By default, the VM selection policy attached to all float jobs is SpotFirst.
[Figure: SpotSurfer Policy]
  • To change the default VM policy for the entire workflow, add it to the float {} scope as below:

Launch all jobs on on-demand instances

float {
    address     = '10.102.34.687'
    username    = 'user'
    password    = 'pass'
    commonExtra = '--vmPolicy [onDemand=true]' 
}

Launch all jobs on spot instances only

float {
    address     = '10.102.34.687'
    username    = 'user'
    password    = 'pass'
    commonExtra = '--vmPolicy [spotOnly=true,retryLimit=3,retryInterval=200s]' 
}
  • To change the VM policy for a specific process within the workflow, add it to the process {} scope using the withName: selector as shown below:

Launch a specific job on on-demand instance

process {
    executor      = 'float'
    errorStrategy = 'retry'
    withName: "STARALIGN" {
        extra = '  --vmPolicy [onDemand=true]'
    }
}
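
The same override can target a group of processes rather than a single name. The sketch below assumes the pipeline tags its heavier processes with a label called process_high (a convention used by nf-core pipelines; your pipeline's labels may differ) and uses Nextflow's withLabel: selector in place of withName:.

```groovy
process {
    executor      = 'float'
    errorStrategy = 'retry'

    // 'process_high' is an assumed label; check which labels your pipeline defines.
    withLabel: 'process_high' {
        // Same nf-float 'extra' setting as in the withName: example.
        extra = '--vmPolicy [onDemand=true]'
    }
}
```

Selecting by label avoids listing every process name, at the cost of depending on how the pipeline author assigned labels.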

SpotSurfer redefines cloud efficiency, combining cost-effectiveness with operational reliability.

WaveRider: Intelligent Cloud Resource Right-Sizing

WaveRider's continuous right-sizing service dynamically balances cost and performance of cloud resources, ensuring that they align perfectly with application needs throughout runtime.

Core Features:

  • Auto-RightSizing: Dynamically adjusts resources to match real-time demands.
  • Cost Optimization: Continuously refines resource allocation to minimize unnecessary expenditure.
  • Performance Focus: Ensures that applications always have the resources they need to perform optimally.
  • Seamless Cloud Integration: Integrates effortlessly with existing cloud setups.

Default and Customizable migratePolicy for Flexible Management:

  • MMCloud’s default migratePolicy ensures balanced CPU and memory usage, with options to customize these settings for specific workflow or process requirements.
[Figure: Default Migrate Policy]
  • The default migratePolicy applied to all float jobs is as follows:

Default migratePolicy

--migratePolicy [cpu.upperBoundRatio=90,
                 cpu.lowerBoundRatio=5,
                 cpu.upperBoundDuration=120s,
                 cpu.lowerBoundDuration=300s,
                 cpu.limit=0,
                 cpu.lowerLimit=0,
                 mem.upperBoundRatio=90,
                 mem.lowerBoundRatio=5,
                 mem.upperBoundDuration=120s,
                 mem.lowerBoundDuration=300s,
                 mem.limit=0,
                 mem.lowerLimit=0,
                 stepAuto=true,
                 evadeOOM=true] 
  • To override the default migratePolicy for all processes within the workflow, apply them under the float {} scope:

Change the default migratePolicy for all jobs

float {
    address     = '10.102.34.687'
    username    = 'user'
    password    = 'pass'
    commonExtra = '--migratePolicy [cpu.upperBoundRatio=95,cpu.lowerBoundRatio=4,cpu.upperBoundDuration=150s,cpu.lowerBoundDuration=350s,cpu.limit=0,cpu.lowerLimit=0,mem.upperBoundRatio=75,mem.lowerBoundRatio=4,mem.upperBoundDuration=150s,mem.lowerBoundDuration=320s,mem.limit=0,mem.lowerLimit=0,stepAuto=true,evadeOOM=true]' 
}
  • To override the default migratePolicy for specific processes within the workflow, apply them under the process {} scope using the withName: selector as shown below:

Change the default migratePolicy for a specific job

process {
    executor      = 'float'
    errorStrategy = 'retry'
    withName: "STARALIGN" {
        extra = '  --migratePolicy [cpu.upperBoundRatio=95,cpu.lowerBoundRatio=4,cpu.upperBoundDuration=150s,cpu.lowerBoundDuration=350s,cpu.limit=0,cpu.lowerLimit=0,mem.upperBoundRatio=75,mem.lowerBoundRatio=4,mem.upperBoundDuration=150s,mem.lowerBoundDuration=320s,mem.limit=0,mem.lowerLimit=0,stepAuto=true,evadeOOM=true]'
    }
}
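
The workflow-wide vmPolicy and migratePolicy examples both assign commonExtra, and a later assignment in the config replaces an earlier one. Assuming commonExtra is passed through to the float submit command as a plain string of options, the two policies could be carried in a single value, as in this sketch. The address and credentials are the same placeholders used in the earlier examples, and the policy values are copied from those examples rather than recommended settings.

```groovy
float {
    address     = '10.102.34.687'   // placeholder, as in the earlier examples
    username    = 'user'
    password    = 'pass'
    // One string carrying both options; split with '+' only for readability.
    commonExtra = '--vmPolicy [spotOnly=true,retryLimit=3,retryInterval=200s] ' +
                  '--migratePolicy [cpu.upperBoundRatio=95,cpu.lowerBoundRatio=4,' +
                  'cpu.upperBoundDuration=150s,cpu.lowerBoundDuration=350s,' +
                  'cpu.limit=0,cpu.lowerLimit=0,' +
                  'mem.upperBoundRatio=75,mem.lowerBoundRatio=4,' +
                  'mem.upperBoundDuration=150s,mem.lowerBoundDuration=320s,' +
                  'mem.limit=0,mem.lowerLimit=0,stepAuto=true,evadeOOM=true]'
}
```

If the string concatenation is not accepted by your Nextflow version's config parser, writing the value on a single line has the same effect.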

WaveRider is essential for maximizing cloud resource efficiency, striking the right balance between cost and performance.

Together, these services enhance MMCloud's capability to host Nextflow workflows, providing a robust, efficient, and cost-effective cloud environment.

Current Limitations

  • The default file system is S3FS, a utility that lets users mount object storage buckets locally and read and write to them as they would with a local file system. It targets general-purpose use cases that are not sensitive to performance or network latency.

  • Users can enable the Fusion file system in their float config as shown below. However, the checkpoint/restore feature required by SpotSurfer and WaveRider is currently not available when using the Fusion file system.

wave {
    enabled = true
}

fusion {
    enabled                  = true
    exportStorageCredentials = true
    exportAwsAccessKeys      = true
}
  • MMCloud's JuiceFS implementation is also not available via this setup currently.