Run Nextflow Workflows on MMCloud via Seqera Platform
Nextflow and Seqera Platform: A Powerful Duo
Seqera Platform lets users launch Nextflow workflows across a variety of executors. With Nextflow's executor abstraction, pipeline processes can be moved between environments with little effort, from local machines to clusters and cloud systems.
Seqera Platform has established itself as the go-to solution for deploying, monitoring, and troubleshooting Nextflow pipelines in hybrid environments that blend on-premises resources with cloud-based systems. The platform's versatility makes it a vital tool for today's diverse computational requirements.
Memory Machine Cloud (MMCloud): Optimizing Resource Management
MMCloud stands out with its advanced approach to resource optimization, particularly in CPU and memory management. This capability makes MMCloud an excellent choice for handling complex, memory-heavy tasks, typical of demanding computational workloads and data-intensive genomic applications. MMCloud's efficiency surpasses traditional cloud solutions, offering a more effective platform for resource-intensive operations.
Seamless Integration for Hybrid Environments
We are thrilled to offer the community the ability to deploy jobs on MMCloud using the float executor via Seqera Platform. MMCloud is a cost-effective and dependable cloud solution for running Nextflow pipelines, marking a significant advancement in cloud-based genomic data processing and analysis.
Prerequisites:
- An AWS Batch compute environment set up on Seqera Platform
- An MMCloud license and a configured OpCenter
Steps:
NOTE: The following steps use AWS Batch to provision an on-demand instance for the Nextflow head node and then apply the MMCloud float configuration to the Nextflow run command. This changes the executor to float, so the jobs are submitted to MMCloud for execution.
- Log in to your Seqera Platform account.
- Create or use an existing AWS Batch compute environment that is configured to launch on-demand instances.
- From the Launchpad, add a pipeline to run or edit an existing one.
- Under Advanced Options, provide the additional MMCloud configuration shown below to use MMCloud's float executor.
- Change the configuration below to provide your workDir, float, and aws credentials.
MMCloud float configuration
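// Load the MMCloud plugin that provides the float executor
plugins {
    id 'nf-float'
}
// Submit every process to MMCloud and retry failed tasks
process {
    executor = 'float'
    errorStrategy = 'retry'
}
// Placeholder OpCenter address and login; replace with your own
float {
    address = '10.102.34.687'
    username = 'user'
    password = 'pass'
}
// Placeholder AWS credentials; replace with your own
aws {
    accessKey = 'access'
    secretKey = 'secret'
}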
- Include the following bash command in the Pre-run script section. It disables the AWS secrets provider, which is active by default when running on AWS Batch.
unset AWS_BATCH_JOB_ID
- Launch the workflow.
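For reference, the pieces from the steps above can be combined into a single configuration, including the workDir that the steps mention but the float configuration does not show. This is a minimal sketch rather than a definitive setup: the bucket path and the maxRetries value are hypothetical, and the address and credentials are the same placeholders used above. maxRetries is a standard Nextflow process directive; when it is not set, Nextflow allows only one retry per task.

```groovy
// Hypothetical S3 work directory; replace with a bucket you own.
workDir = 's3://my-bucket/nextflow-work'

plugins {
    id 'nf-float'
}

process {
    executor = 'float'
    errorStrategy = 'retry'
    // Illustrative value only; raise or lower it to suit your pipeline.
    maxRetries = 3
}

// Placeholder values copied from the configuration above.
float {
    address = '10.102.34.687'
    username = 'user'
    password = 'pass'
}

aws {
    accessKey = 'access'
    secretKey = 'secret'
}
```

Keeping workDir at the top level, outside any scope, follows the usual Nextflow configuration layout.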
Enhanced Benefits of Deploying Nextflow Workflows on MMCloud
MMCloud, through its nf-float plugin, bridges Nextflow with the cloud, bringing a suite of advanced features and services that enhance the efficiency and flexibility of cloud-based workflow deployment.
WaveWatcher: Your Window to Real-Time Resource Management
WaveWatcher, a key component of MMCloud, provides real-time visibility into application resource usage. This service is essential for optimizing performance and reducing cloud costs by pinpointing and eliminating inefficiencies.
Highlights:
- Live Resource Monitoring: Offers immediate insights into resource utilization, enabling timely adjustments.
- Resource Optimization: Tailors resource allocation to specific task needs, maximizing efficiency.
- Cost-Saving Insights: Identifies and helps eliminate wasteful resource usage.
- Accessible Interface: Simplifies complex data, making it user-friendly for all skill levels.
WaveWatcher is more than a monitoring tool; it's a vital component for efficient resource management, ensuring that cloud resources are used optimally.
SpotSurfer: Ensuring Continuity on Spot Instances
SpotSurfer introduces a robust checkpointing and recovery service for stateful applications running on AWS Spot EC2 instances, significantly enhancing their stability and reducing cloud costs by up to 90%.
Key Advantages:
- Reliable Stateful Application Operation: Mitigates risks associated with Spot instance interruptions.
- Significant Cost Reduction: Offers the economic benefits of Spot instances without compromising application stability.
- Checkpointing for Safety: Periodically saves application states for quick recovery.
- Optimized for Spot EC2: Tailored to exploit Spot instance benefits while safeguarding data integrity.
- By default, the VM selection policy attached to all float jobs is SpotFirst.
- To change the default VM policy for the entire workflow, add it to the float {} scope as below:
Launch all jobs on on-demand instances
float {
    address = '10.102.34.687'
    username = 'user'
    password = 'pass'
    commonExtra = '--vmPolicy [onDemand=true]'
}
Launch all jobs on spot instances only
float {
    address = '10.102.34.687'
    username = 'user'
    password = 'pass'
    commonExtra = '--vmPolicy [spotOnly=true,retryLimit=3,retryInterval=200s]'
}
- To change the VM policy for a specific process within the workflow, add it to the process {} scope using the withName: selector as shown below:
Launch a specific job on an on-demand instance
process {
    executor = 'float'
    errorStrategy = 'retry'
    withName: "STARALIGN" {
        extra = ' --vmPolicy [onDemand=true]'
    }
}
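The withName: selector targets one process at a time. As a sketch under stated assumptions, Nextflow's standard withLabel: selector could apply the same extra setting to every process that carries a given label. The label name on_demand is hypothetical; each process that should avoid Spot instances would need to declare it with the label directive.

```groovy
process {
    executor = 'float'
    errorStrategy = 'retry'

    // 'on_demand' is a hypothetical label; a process opts in with: label 'on_demand'
    withLabel: 'on_demand' {
        extra = '--vmPolicy [onDemand=true]'
    }
}
```

Grouping by label keeps the configuration short when several long-running processes need the same VM policy.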
SpotSurfer redefines cloud efficiency, combining cost-effectiveness with operational reliability.
WaveRider: Intelligent Cloud Resource Right-Sizing
WaveRider's continuous right-sizing service dynamically balances cost and performance of cloud resources, ensuring that they align perfectly with application needs throughout runtime.
Core Features:
- Auto-RightSizing: Dynamically adjusts resources to match real-time demands.
- Cost Optimization: Continuously refines resource allocation to minimize unnecessary expenditure.
- Performance Focus: Ensures that applications always have the resources they need to perform optimally.
- Seamless Cloud Integration: Integrates effortlessly with existing cloud setups.
Default and Customizable migratePolicy for Flexible Management:
- MMCloud's default migratePolicy ensures balanced CPU and memory usage, with options to customize these settings for specific workflow or process requirements.
- The default migratePolicy applied to all float jobs is as follows:
Default migratePolicy
--migratePolicy [cpu.upperBoundRatio=90,
cpu.lowerBoundRatio=5,
cpu.upperBoundDuration=120s,
cpu.lowerBoundDuration=300s,
cpu.limit=0,
cpu.lowerLimit=0,
mem.upperBoundRatio=90,
mem.lowerBoundRatio=5,
mem.upperBoundDuration=120s,
mem.lowerBoundDuration=300s,
mem.limit=0,
mem.lowerLimit=0,
stepAuto=true,
evadeOOM=true]
- To override the default migratePolicy for all processes within the workflow, set the new values under the float {} scope:
Change the default migratePolicy for all jobs
float {
    address = '10.102.34.687'
    username = 'user'
    password = 'pass'
    commonExtra = '--migratePolicy [cpu.upperBoundRatio=95,cpu.lowerBoundRatio=4,cpu.upperBoundDuration=150s,cpu.lowerBoundDuration=350s,cpu.limit=0,cpu.lowerLimit=0,mem.upperBoundRatio=75,mem.lowerBoundRatio=4,mem.upperBoundDuration=150s,mem.lowerBoundDuration=320s,mem.limit=0,mem.lowerLimit=0,stepAuto=true,evadeOOM=true]'
}
- To override the default migratePolicy for specific processes within the workflow, set the new values under the process {} scope using the withName: selector as shown below:
Change the default migratePolicy for a specific job
process {
    executor = 'float'
    errorStrategy = 'retry'
    withName: "STARALIGN" {
        extra = ' --migratePolicy [cpu.upperBoundRatio=95,cpu.lowerBoundRatio=4,cpu.upperBoundDuration=150s,cpu.lowerBoundDuration=350s,cpu.limit=0,cpu.lowerLimit=0,mem.upperBoundRatio=75,mem.lowerBoundRatio=4,mem.upperBoundDuration=150s,mem.lowerBoundDuration=320s,mem.limit=0,mem.lowerLimit=0,stepAuto=true,evadeOOM=true]'
    }
}
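The examples in this article set either a VM policy or a migratePolicy, never both. The sketch below sets both for the whole workflow. It assumes that commonExtra is passed through as additional command-line options on every job submission and therefore accepts more than one flag in the same string. This article does not confirm that assumption, so verify it against the nf-float documentation before relying on it. The threshold values are the customized ones from the examples above.

```groovy
float {
    address = '10.102.34.687'
    username = 'user'
    password = 'pass'
    // Assumption: several flags may share one commonExtra string, separated by a space.
    commonExtra = '--vmPolicy [onDemand=true] --migratePolicy [cpu.upperBoundRatio=95,cpu.lowerBoundRatio=4,cpu.upperBoundDuration=150s,cpu.lowerBoundDuration=350s,cpu.limit=0,cpu.lowerLimit=0,mem.upperBoundRatio=75,mem.lowerBoundRatio=4,mem.upperBoundDuration=150s,mem.lowerBoundDuration=320s,mem.limit=0,mem.lowerLimit=0,stepAuto=true,evadeOOM=true]'
}
```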
WaveRider is essential for maximizing cloud resource efficiency, striking the right balance between cost and performance.
Together, these services enhance MMCloud's capability to host Nextflow workflows, providing a robust, efficient, and cost-effective cloud environment.
Current Limitations
- The default file system is S3FS, a utility that lets users mount object storage buckets locally and read and write to them in the way they are used to. It targets general-use scenarios that are not sensitive to performance or network latency.
- Users can enable Fusion FS in their float config as shown below. However, the checkpoint/restore feature required for SpotSurfer and WaveRider is currently not available when using the Fusion file system.
wave {
    enabled = true
}
fusion {
    enabled = true
    exportStorageCredentials = true
    exportAwsAccessKeys = true
}
- MMCloud's JuiceFS implementation is also currently not available via this setup.