Chinese Mitten Crab - Culinary Delight, Scientific Puzzle

Cedric Druce 2023-11-27

Any way you look at it, the Chinese mitten crab (also known as the Shanghai hairy crab because of the setae on its claws) has something to offer. For the chef, it is a seasonal delicacy, full of sweetness and umami flavors. For the aquaculturist, it is an important crop and revenue source. For the biologist, it is a marvel: it can live in both freshwater and saltwater? Regenerate limbs? Really?

For the geneticist, the challenge is to trace these species-typical traits and behaviors back to the genome and transcriptome level.

Eriocheir sinensis

The Chinese mitten crab (Eriocheir sinensis) is native to China where it is found in lakes and rivers. In Europe and North America, it is considered an invasive species. The mitten crab spends most of its life in freshwater although the reproductive phase occurs in brackish water and in seawater (the species is catadromous). The change in salinity requires a complicated mechanism (osmoregulation) to maintain the ionic balance in its cells.

The first discovery of Chinese mitten crabs in North America occurred in San Francisco Bay in 1992. Since then, specimens have been found in the Chesapeake Bay, the Hudson River, and the Great Lakes. They have been found all over Europe (in the Thames, the Tagus, the Danube, and even the Venice lagoon), although finding an isolated specimen does not mean that a permanent population has established itself. Mitten crabs have a high reproductive rate and can withstand physiological stress (such as starvation and desiccation), which makes them a formidable invasive species whose burrows undermine river bank stability. It's a mystery why no Chinese mitten crabs have been seen in California since 2010.

Beyond its culinary, agricultural, and environmental impacts, the Chinese mitten crab is worthy of study because of its ability to regenerate an entire limb (as long as it can molt). Regeneration capability is stronger in invertebrates than in vertebrates, where it seems to have declined. Birds and mammals both retain some capacity for cell and tissue regeneration, so whether regeneration can be stimulated is an interesting topic for medical research.

It is no surprise, then, that Eriocheir sinensis has been studied extensively, in particular to discover the gene expression patterns that underlie its physiological and behavioral traits. The first genome sequencing, assembly, and annotation of the Chinese mitten crab was reported in 2016, and the current reference assembly was posted in 2022. The haploid chromosome number is 73 (2n=146) and the genome size is about 1.27 Gb. That chromosome number is at the high end among arthropods (invertebrates with exoskeletons), although a large percentage of the genome consists of repetitive elements. For comparison, the haploid chromosome number for humans is 23 and the haploid genome size is about 3.1 Gb.
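As a back-of-the-envelope comparison, the following Python sketch uses only the figures quoted above to compute the diploid chromosome count and the average chromosome length for each species. The names in the code are illustrative, and a simple average hides the real variation in chromosome size.

```python
# Figures quoted in the text; the dictionary layout and function names are illustrative.
GENOMES = {
    "Eriocheir sinensis": {"haploid_chromosomes": 73, "genome_size_gb": 1.27},
    "Homo sapiens": {"haploid_chromosomes": 23, "genome_size_gb": 3.1},
}


def diploid_count(haploid_chromosomes: int) -> int:
    """Diploid chromosome number (2n) from the haploid number (n)."""
    return 2 * haploid_chromosomes


def mean_chromosome_mb(genome_size_gb: float, haploid_chromosomes: int) -> float:
    """Average chromosome length in megabases for a haploid genome."""
    return genome_size_gb * 1000 / haploid_chromosomes


for species, g in GENOMES.items():
    n = g["haploid_chromosomes"]
    mean_mb = mean_chromosome_mb(g["genome_size_gb"], n)
    print(f"{species}: 2n={diploid_count(n)}, mean chromosome ~{mean_mb:.1f} Mb")
```

With these inputs, the crab's chromosomes average roughly 17 Mb each, compared with roughly 135 Mb for human chromosomes: many small chromosomes rather than a few large ones.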

Nextflow and MMCloud

So, what does this have to do with Memory Machine Cloud (MMCloud)? RNA-seq is a popular computational pipeline for analyzing gene expression and transcription activation using data generated by next-generation sequencing (NGS) machines from Illumina, PacBio, Oxford Nanopore Technologies, and others. A complicated pipeline, like RNA-seq, requires a workflow manager, such as Nextflow, to schedule and manage the individual tasks in the pipeline.

In Nextflow terminology, each task is assigned to an "executor," which is a complete environment for running that step in the analysis. By attaching the nf-float plugin to a workflow, Nextflow can use MMCloud as an "executor." From an MMCloud point of view, the execution task that Nextflow assigns to it is an independent job that it runs just like any other batch job. For the Nextflow user, the user experience remains the same while benefiting from all the MMCloud features, such as SpotSurfer, WaveRider, and WaveWatcher.

MMCloud release 2.3.3 introduced a POSIX-compliant, distributed file system that allows Nextflow pipelines to use a high-capacity storage service (such as AWS S3) as a high-performance file system. For complicated pipelines where individual tasks share a file system for writing and reading intermediate data, the performance of the file system has a dramatic impact.

Running a Nextflow pipeline on MMCloud is straightforward. To demonstrate using AWS EC2 compute instances, we ran nf-core/rnaseq on experimental Eriocheir sinensis sequence data available from the European Nucleotide Archive and the National Library of Medicine.

The nf-float plugin requires a configuration file, in which you specify parameters that determine the execution environment. One parameter is the VM creation policy. For this run, we chose "SpotFirst", which means that MMCloud tries to start a spot instance. If, after three attempts, a spot instance does not start, MMCloud starts an on-demand instance. If AWS reclaims the spot instance that a job is running on, MMCloud moves the job to a new instance and follows the same policy - try for a spot instance first and go to an on-demand instance if not successful.
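The "SpotFirst" behavior described above can be sketched in a few lines of Python. This is a toy model of the decision logic only, not MMCloud code or the nf-float configuration syntax: `place_job`, `scripted`, and the `spot_available` callback are hypothetical names introduced for illustration, and the limit of three attempts is taken from the description above.

```python
from typing import Callable, Iterable


def place_job(spot_available: Callable[[], bool], max_spot_attempts: int = 3) -> str:
    """Try for a spot instance up to max_spot_attempts times, then fall back.

    spot_available is a hypothetical stand-in for "request a spot instance
    and report whether it started".
    """
    for _ in range(max_spot_attempts):
        if spot_available():
            return "spot"
    return "on-demand"


def scripted(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Build a fake spot_available callback that replays fixed outcomes."""
    it = iter(outcomes)
    return lambda: next(it)


# Initial placement: the second spot request succeeds.
print(place_job(scripted([False, True])))          # spot

# After a reclaim, the same policy is applied again; here all three
# spot requests fail, so the job moves to an on-demand instance.
print(place_job(scripted([False, False, False])))  # on-demand
```

The point of the sketch is that a reclaim does not change the policy: every placement, initial or after a reclaim, goes through the same spot-first decision.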

RNA-Seq Results

At the end of the successful run, Nextflow displayed the following message, which shows that 170 tasks (jobs) completed with no failures.

-[nf-core/rnaseq] Pipeline completed successfully -
Completed at: 25-Nov-2023 10:42:32
Duration : 6h 14m 7s
CPU hours : 123.3
Succeeded : 170
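
The summary figures also give a rough sense of how parallel the run was. The Python sketch below parses the lines shown above and divides CPU hours by wall-clock duration; the function names are illustrative, and the result is only an average over the whole run, assuming "CPU hours" is the CPU time summed across all tasks.

```python
import re

# Summary lines copied from the Nextflow output above.
SUMMARY = """\
Completed at: 25-Nov-2023 10:42:32
Duration : 6h 14m 7s
CPU hours : 123.3
Succeeded : 170
"""


def duration_hours(text: str) -> float:
    """Wall-clock duration in hours, parsed from a 'Duration : Xh Ym Zs' line."""
    match = re.search(r"Duration\s*:\s*(\d+)h\s+(\d+)m\s+(\d+)s", text)
    if match is None:
        raise ValueError("no duration line found")
    hours, minutes, seconds = (int(x) for x in match.groups())
    return hours + minutes / 60 + seconds / 3600


def cpu_hours(text: str) -> float:
    """Total CPU hours, parsed from a 'CPU hours : N' line."""
    match = re.search(r"CPU hours\s*:\s*([\d.]+)", text)
    if match is None:
        raise ValueError("no CPU hours line found")
    return float(match.group(1))


wall = duration_hours(SUMMARY)
average_cpus = cpu_hours(SUMMARY) / wall
print(f"wall clock: {wall:.2f} h, average CPUs in use: {average_cpus:.1f}")
```

For this run the average works out to just under 20 CPUs in use over the 6 hours and 14 minutes of wall-clock time.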

Included in the output of nf-core/rnaseq is a report from MultiQC, a reporting tool that summarizes (in HTML format) the statistics and quality metrics obtained from multiple analysis modules and data sets. MultiQC is helpful in checking the results of a pipeline run for errors and anomalies.

Examples of the output from MultiQC for the Eriocheir sinensis run are shown in the figures that follow.


The Chinese mitten crab is a fascinating bundle of surprises. There is no doubt that the species will continue to receive attention - on menus, in departments of environmental conservation, and as the subject of a range of "Omics" studies (for example, transcriptomics). As researchers become aware of the power of MMCloud, many of those Nextflow pipelines will run on MMCloud in AWS, Google Cloud, or AliCloud.

For the residents of Shanghai, however, the annual appearance of the Shanghai hairy crab is always an eagerly anticipated milestone.