
Run Cromwell (AWS)

Ashley Tung

Instance Setup

  1. Launch an EC2 instance. From the EC2 console, click Launch Instance.

  2. For the AMI, search for amzn2-ami-kernel-5.10-hvm-2.0.20230628.0-x86_64-gp2

  3. For the instance type, choose t2.medium, and select a key pair.

  4. For Network settings, select the existing default security group.

  5. For Configure storage, an 8 GiB root volume is sufficient. Then click Launch Instance.

MMC CLI Setup

Assuming your OpCenter is already set up, install the float CLI:

# --no-check-certificate skips TLS certificate verification when downloading from the OpCenter
wget https://<op_center_ip_address>/float --no-check-certificate
sudo mv float /usr/local/bin/
sudo chmod +x /usr/local/bin/float

Connect to your OpCenter by logging in:

float login -a <op_center_ip_address> -u <username> -p <password>

Cromwell Setup

Install Java

curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk install java 17.0.6-tem
java -version
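If the instance has more than one Java installed, it can be worth confirming that your shell picks up the 17.x JDK installed above before running Cromwell. The following is a minimal sketch; the function name java_major is hypothetical, and it assumes the usual `java -version` output format (for example `openjdk version "17.0.6" ...`), where legacy 1.x version strings such as 1.8 mean Java 8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `java -version` output on stdin and print the major version.
java_major() {
  local version major
  # Take the quoted version string from the first line that has one, e.g. 17.0.6 or 1.8.0_392.
  version=$(sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
  major=${version%%.*}
  # Legacy numbering scheme: "1.8.x" means Java 8.
  if [ "$major" = "1" ]; then
    major=$(printf '%s\n' "$version" | cut -d. -f2)
  fi
  printf '%s\n' "$major"
}
```

Since `java -version` prints to stderr, redirect it before piping: `java -version 2>&1 | java_major` should print 17 after the installation above.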

Install Cromwell

wget https://github.com/broadinstitute/cromwell/releases/download/84/cromwell-84.jar

# Check version
java -jar cromwell-84.jar --version

Config file

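Save the configuration below as cromwell-float.conf. It contains no OpCenter address of its own: the float commands it runs use the OpCenter you logged in to with float login, so make sure you are logged in to the right OpCenter before starting Cromwell.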

# This is an example of how you can use Cromwell to interact with float.

backend {
  default = float

  providers {
    float {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes="""
                String f_cpu = "2"
                String f_memory = "4"
                String f_docker = ""
                String f_extra = ""
        """

        # If an 'exit-code-timeout-seconds' value is specified:
        # - check-alive will be run at this interval for every job
        # - if a job is found to be not alive, and no RC file appears after this interval
        # - Then it will be marked as Failed.
        # Warning: If set, Cromwell will run 'check-alive' for every job at this interval
        exit-code-timeout-seconds = 30

        submit = """
            mkdir -p ${cwd}/execution
            echo "set -e" > ${cwd}/execution/float-script.sh
            echo "cd ${cwd}/execution" >> ${cwd}/execution/float-script.sh
            tail -n +22 ${script} > ${cwd}/execution/no-header.sh
            head -n $(($(wc -l < ${cwd}/execution/no-header.sh) - 14)) ${cwd}/execution/no-header.sh >> ${cwd}/execution/float-script.sh

            float submit -i ${f_docker} -j ${cwd}/execution/float-script.sh --cpu ${f_cpu} --mem ${f_memory} ${f_extra} >  ${cwd}/execution/sbatch.out 2>&1
            cat ${cwd}/execution/sbatch.out | sed -n 's/id: \(.*\)/\1/p' > ${cwd}/execution/job_id.txt
            echo "received float job id: "
            cat ${cwd}/execution/job_id.txt

            JOB_SCRIPT_DIR=float-jobs/$(cat ${cwd}/execution/job_id.txt)
            mkdir -p $JOB_SCRIPT_DIR
            cd $JOB_SCRIPT_DIR
            
# create the check alive script
# (\$ is escaped so that \$(pwd) and \$SCRIPT_DIR expand when the script runs, not when it is written)
cat <<EOF > float-check-alive.sh
SCRIPT_DIR=\$(pwd)
cd ${cwd}/execution
float show -j \$1 --runningOnly > job-status.yaml
if [[ -s job-status.yaml ]]; then
    cat job-status.yaml
else
    float show -j \$1 | grep rc: | tr -cd '[:digit:]' > rc
    float log cat -j \$1 stdout.autosave > stdout
    float log cat -j \$1 stderr.autosave > stderr
fi
cd \$SCRIPT_DIR
EOF

# create the kill script
cat <<EOF > float-kill.sh
SCRIPT_DIR=\$(pwd)
cd ${cwd}/execution
float scancel -f -j \$1
cd \$SCRIPT_DIR
EOF

            cat ${cwd}/execution/sbatch.out
        """

        kill = """
            source float-jobs/${job_id}/float-kill.sh ${job_id}
        """

        check-alive = """
            source float-jobs/${job_id}/float-check-alive.sh ${job_id}
        """
        
        job-id-regex = "id: (\\w+)\\n"
      }
    }
  }
}
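The submit and check-alive commands above do some text processing that is easy to misread inside the config. The sketch below pulls those pipelines out into standalone functions (the names strip_wrapper, extract_job_id and extract_rc are hypothetical) so they can be tried locally. strip_wrapper drops the first 21 and last 14 lines of the Cromwell-generated script, presumably leaving just the task's own command; extract_job_id and extract_rc repeat the sed and grep | tr steps on float output. The only assumption about float's output is that it contains lines of the form `id: <job id>` and `rc: <exit code>`, which is what the config's own patterns expect.

```shell
#!/usr/bin/env bash
# Sketch of the text processing done by the submit and check-alive scripts
# in cromwell-float.conf. Function names are hypothetical.

# Drop the first 21 and the last 14 lines of a Cromwell-generated script,
# mirroring: tail -n +22 ${script}, then head -n $((lines - 14)).
strip_wrapper() {
  local script="$1" body
  body=$(mktemp)
  tail -n +22 "$script" > "$body"
  head -n $(($(wc -l < "$body") - 14)) "$body"
  rm -f "$body"
}

# Print the job ID from `float submit` output read on stdin,
# mirroring: sed -n 's/id: \(.*\)/\1/p'
extract_job_id() {
  sed -n 's/id: \(.*\)/\1/p'
}

# Print only the digits of the "rc:" line from `float show` output read on stdin,
# mirroring: grep rc: | tr -cd '[:digit:]'
extract_rc() {
  grep rc: | tr -cd '[:digit:]'
}
```

For example, `printf 'id: abc123\n' | extract_job_id` prints abc123. Cromwell itself gets the job ID by applying job-id-regex (`id: (\w+)\n`) to the submit command's stdout, which is why submit ends by printing sbatch.out again. The 21/14 line counts presumably match the wrapper script that Cromwell 84 generates; with another Cromwell version, compare them against a generated script file before relying on them.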

S3 Bucket Setup

To create an S3 bucket and access keys, follow the directions in the corresponding section of the Setup Nextflow host on AWS guide on HackMD.

Once you have created your bucket and access keys, build and install s3fs from source:

sudo yum install automake fuse fuse-devel gcc-c++ git libcurl-devel libxml2-devel make openssl-devel
git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
./autogen.sh
./configure --prefix=/usr --with-openssl
make
sudo make install

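Create a password file at ~/.passwd-s3fs containing your access key and secret key in the form

access_key:secret_key

Restrict the file's permissions to 600 (read/write for the owner only); s3fs rejects a password file that other users can access.

chmod 600 ~/.passwd-s3fs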
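As an alternative to writing the file in an editor and then running chmod, the password file can be created with restrictive permissions from the start. This is a minimal sketch; the helper name write_s3fs_passwd is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an s3fs password file (access_key:secret_key)
# that only the owner can read or write.
write_s3fs_passwd() {
  local access_key="$1" secret_key="$2" passwd_file="$3"
  # umask 077 inside a subshell, so a newly created file starts out as 600
  # without changing the umask of the calling shell.
  (
    umask 077
    printf '%s:%s\n' "$access_key" "$secret_key" > "$passwd_file"
  )
  # If the file already existed, the umask does not change its mode, so enforce 600 explicitly.
  chmod 600 "$passwd_file"
}
```

For example, `write_s3fs_passwd "$ACCESS_KEY" "$SECRET_KEY" ~/.passwd-s3fs`, where ACCESS_KEY and SECRET_KEY are placeholders for your own keys and ~/.passwd-s3fs is the path the mount command below passes to passwd_file.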

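Mount your bucket at your chosen mount point (the directory must already exist). If the mount point requires root privileges to access, run the mount command with sudo. Using allow_other as a non-root user also requires user_allow_other to be enabled in /etc/fuse.conf.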

s3fs BUCKET /MOUNTPOINT -o rw,allow_other -o multipart_size=52 -o parallel_count=30 -o passwd_file=~/.passwd-s3fs
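To confirm the bucket is actually mounted before running a workflow, you can look for an s3fs entry in /proc/mounts, where s3fs mounts are listed with the filesystem type fuse.s3fs. A minimal sketch; the helper is_s3fs_mounted is hypothetical, and its optional second argument exists only so it can be checked against a sample file instead of /proc/mounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR is currently an s3fs mount.
# /proc/mounts fields are: source, mount point, filesystem type, options, ...
is_s3fs_mounted() {
  local dir="$1" mounts="${2:-/proc/mounts}"
  awk -v d="$dir" '$2 == d && $3 == "fuse.s3fs" { found = 1 } END { exit !found }' "$mounts"
}
```

For example, `is_s3fs_mounted /MOUNTPOINT && echo "bucket mounted"`. Pass the mount point as an absolute path without a trailing slash, since that is how it appears in /proc/mounts.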

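If your workflow uses an S3 bucket, update the f_extra line in the runtime-attributes block of cromwell-float.conf so that the bucket is passed to each float job as a data volume (replace XXX with your access key and secret key, and BUCKET and /MOUNTPOINT with your own values):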

String f_extra = "--dataVolume [accesskey=XXX,secret=XXX,mode=rw]s3://BUCKET:/MOUNTPOINT"
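The --dataVolume value packs the credentials, the access mode, the bucket URL and the mount path into one string of the form [options]s3://BUCKET:/MOUNTPOINT. The sketch below assembles the f_extra line from those parts, using exactly the fields in the line above; the function name f_extra_line is hypothetical, and it only formats text without contacting AWS or float.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the f_extra runtime-attribute line for cromwell-float.conf.
# Arguments: bucket, mount path, access key, secret key, mode (defaults to rw).
f_extra_line() {
  local bucket="$1" mountpoint="$2" access_key="$3" secret_key="$4" mode="${5:-rw}"
  printf 'String f_extra = "--dataVolume [accesskey=%s,secret=%s,mode=%s]s3://%s:%s"\n' \
    "$access_key" "$secret_key" "$mode" "$bucket" "$mountpoint"
}
```

Keep in mind that the keys end up in plain text in cromwell-float.conf, so that file deserves the same care as the s3fs password file.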

Hello World (Read from Bucket)

Create hello.wdl

workflow helloWorld {
    String name
    call sayHello { input: name=name }
}

task sayHello {
    String name

    command {
        printf "[cromwell-say-hello] hello to ${name} on $(date)\n"
        sleep 30
    }
    output {
        String out = read_string(stdout())
    }
    runtime {
        f_docker: "cactus"
    }
}

Create hello.json in your bucket (through the s3fs mount, this is /MOUNTPOINT/hello.json):

{
    "helloWorld.name": "Developer"
}
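Because the bucket is mounted through s3fs, writing a file under the mount point creates the corresponding object in the bucket. A minimal sketch that writes hello.json there; the helper write_hello_inputs is hypothetical, and it takes the target directory and the name to greet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the hello.wdl inputs file into a directory,
# e.g. the s3fs mount point, so that it lands in the bucket.
write_hello_inputs() {
  local dir="$1" name="${2:-Developer}"
  # The name is inserted as-is, so it must not contain double quotes or backslashes.
  cat > "$dir/hello.json" <<EOF
{
    "helloWorld.name": "$name"
}
EOF
}
```

For example, `write_hello_inputs /MOUNTPOINT` produces the same hello.json as shown above.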

Run the workflow, replacing /MOUNTPOINT with your mount point:

java -Dconfig.file=cromwell-float.conf -jar \
 cromwell-84.jar run hello.wdl \
 --inputs /MOUNTPOINT/hello.json

A successful run ends with output similar to the snippet below. You can ignore any text that appears after it; the important part is the status “Succeeded”.

[INFO] [11/27/2023 18:15:23.336] [cromwell-system-akka.dispatchers.engine-dispatcher-30] [akka://cromwell-system/user/SingleWorkflowRunnerActor] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
  "outputs": {
    "helloWorld.sayHello.out": "[cromwell-say-hello] hello to Developer on Mon Nov 27 18:12:55 UTC 2023"
  },
  "id": "a1f0606e-367f-43a5-9381-e4ebe09ffbcf"
}
[2023-11-27 18:15:24,91] [info] Workflow polling stopped
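When running Cromwell from a script, you may want to check the final status without reading the whole log. A minimal sketch, assuming you save Cromwell's output to a file; the helper name cromwell_status is hypothetical, and it keys on the SingleWorkflowRunnerActor line shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the final workflow status (e.g. Succeeded or Failed)
# from a saved Cromwell `run` log, taking the last matching line.
cromwell_status() {
  sed -n "s/.*workflow finished with status '\([A-Za-z]*\)'.*/\1/p" "$1" | tail -n 1
}
```

For example, append `2>&1 | tee cromwell.log` to the run command, then test `[ "$(cromwell_status cromwell.log)" = "Succeeded" ]`.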

Sequence Workflow (Write to Bucket)

Create seq.wdl, replacing /MOUNTPOINT with your mount point:

workflow myWorkflow {
    call sayHello
    call writeReadFile { input: s=sayHello.out }
}

task sayHello {
    command {
        printf "[cromwell-say-hello] hello from $(whoami) on $(date)"
        sleep 30
    }
    output {
        String out = read_string(stdout())
    }
    runtime {
        f_docker: "cactus"
        f_cpu: "2"
        f_memory: "4"
    }
}

task writeReadFile {
    String s

    command {
        printf "[cromwell-write-read-file] write input to a file: ${s}\n" > /MOUNTPOINT/my_file.txt
        cat /MOUNTPOINT/my_file.txt
    }
    output {
        String out = read_string(stdout())
    }
    runtime {
        f_docker: "cactus"
    }
}

Command

java -Dconfig.file=cromwell-float.conf -jar cromwell-84.jar run seq.wdl

You should see my_file.txt created and populated in your bucket.
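Since the bucket is also mounted on the Cromwell host, you can check the result through the mount point. A minimal sketch; the helper check_my_file is hypothetical, and it succeeds only if my_file.txt exists, is non-empty, and starts with the prefix that the writeReadFile task writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that writeReadFile's output file exists in the
# mounted bucket and begins with the expected prefix, then print it.
check_my_file() {
  local file="$1/my_file.txt"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  head -n 1 "$file" | grep -q '^\[cromwell-write-read-file\] write input to a file: ' || {
    echo "unexpected content in $file" >&2
    return 1
  }
  cat "$file"
}
```

For example, run `check_my_file /MOUNTPOINT` after the workflow finishes.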