What is AppCapsule and what problems does it solve?
AppCapsule creates a snapshot of a running application: all the data in memory, all the application state required to restore the application to that moment in time, and the data on storage. With AppCapsule, we can move the application runtime from one machine to another.
AppCapsule helps solve three problems:
- Spot reclaim - If there is a spot eviction event, Memory Machine Cloud creates an AppCapsule and moves it to another spot instance, where the job continues. You save cost by running jobs on spot instances while keeping all your job progress, with no need to start over from the beginning. This feature is also known as SpotSurfer.
- OOM (Out Of Memory) - Memory Machine Cloud detects when an OOM condition is about to occur, creates an AppCapsule, and moves it to another instance, so the application avoids a crash and the job keeps running.
- Right-sizing - Vertical scaling, either down to avoid over-provisioning or up to give the application more resources when it needs them, for shorter wall-time and possibly lower total cost. You can trigger it manually with the float CLI command or the OpCenter Web GUI, or let Memory Machine Cloud do it automatically (a.k.a. WaveRider).
What is AppCapsule++ and what benefits does it bring?
AppCapsule++ brings three enhancements and benefits to users:
- Only saving delta - Instead of saving a full snapshot, AppCapsule++ saves only the difference (memory change) since the previous snapshot. This way, the time required for the final snapshot is minimal.
- Adaptive trigger - Instead of snapshotting at a fixed, pre-determined interval, AppCapsule++ automatically checks the memory change and takes a snapshot when the change reaches a threshold, which is calculated based on the cloud service provider’s pre-emption window. The benefit is that there is no need to guess a fixed interval for periodic snapshots.
- Asynchronous offloading - Instead of synchronous snapshotting, which pauses the application, AppCapsule++ offloads data asynchronously, which decreases the time that the application is frozen. The benefit is a lower impact on job completion wall-time.
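To make the delta and adaptive-trigger ideas concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not how Memory Machine Cloud is implemented: the class `DeltaSnapshotter`, the `write_bandwidth_gb_s` and `safety_factor` parameters, and the threshold formula (pre-emption window × assumed write bandwidth × headroom) are all hypothetical, chosen only to illustrate a threshold derived from the pre-emption window.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaSnapshotter:
    """Toy model of delta snapshots with an adaptive trigger.

    Hypothetical illustration only; not MMCloud's actual algorithm.
    """
    preemption_window_s: float        # e.g. 120 for the AWS 2-minute window
    write_bandwidth_gb_s: float       # ASSUMED sustained snapshot write rate
    safety_factor: float = 0.5        # ASSUMED headroom below the window
    dirty_gb: float = 0.0             # memory changed since the last snapshot
    deltas: list[float] = field(default_factory=list)  # sizes of saved deltas

    @property
    def threshold_gb(self) -> float:
        # Largest delta we expect to be able to write inside the window,
        # scaled down by the safety factor (assumed formula).
        return (self.preemption_window_s
                * self.write_bandwidth_gb_s
                * self.safety_factor)

    def record_writes(self, gb: float) -> bool:
        """Account for newly changed memory; snapshot if the threshold is hit."""
        self.dirty_gb += gb
        if self.dirty_gb >= self.threshold_gb:
            self.take_delta()
            return True
        return False

    def take_delta(self) -> None:
        # Save only the change since the previous snapshot, then reset.
        self.deltas.append(self.dirty_gb)
        self.dirty_gb = 0.0


snap = DeltaSnapshotter(preemption_window_s=120, write_bandwidth_gb_s=0.5)
for changed_gb in (10, 25, 5):
    snap.record_writes(changed_gb)
```

The point of the sketch is that the amount of unsaved change stays bounded by the threshold, so the final snapshot at eviction time only has to write a small delta, regardless of how much memory the application uses in total.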
Below is an overall comparison of AppCapsule and AppCapsule++:
Comparison | AppCapsule | AppCapsule++ |
---|---|---|
Snapshot content | Full snapshot | Delta snapshot |
Snapshot trigger | Periodic (done by MMCloud) or manual (done by the user’s script) | Adaptive |
Configuration | Pre-determined interval for periodic snapshot | No special configuration |
Application completion wall-time | Higher impact (synchronous) | Lower impact (asynchronous) |
Storage overhead | Relatively lower (only the final snapshot is needed to restore) | Relatively higher (all delta snapshots are needed to restore) |
Usage | --dumpMode full | --dumpMode incremental |
Benchmark comparison
Below are the results of a benchmark experiment on AWS comparing AppCapsule and AppCapsule++. Every run has a 50GB memory change, with application memory usage of 60GB, 120GB, 240GB, and 480GB, and measures how many seconds AppCapsule and AppCapsule++ each take to complete the final snapshot.
Three observations from this benchmark:
- For AppCapsule, the snapshot time increases as application memory usage grows, because AppCapsule takes a complete snapshot every time.
- For AppCapsule++, the snapshot time stays more or less the same as application memory usage grows, because AppCapsule++ takes a delta snapshot, which is triggered when the amount of change in memory reaches a certain threshold.
- Consider spot instances on AWS, which have a 2-minute pre-emption window. AppCapsule can only support relatively small workloads (in this experiment, 60GB, where the snapshot takes 109 seconds) and cannot support bigger ones (120GB needs 250 seconds to finish a snapshot, well over the 2-minute window). AppCapsule++ takes only delta snapshots, so its snapshot time tracks the memory change rather than the application memory usage, and it stayed within the window at every size tested (80 to 104 seconds).
More detailed information in table format below:
Experiment No. | Experiment variable | AppCapsule | AppCapsule++ |
---|---|---|---|
1 | Application memory usage: 60GB; application memory change: 50GB | 109s (r5.2xlarge/gp3) | 91s (r5.2xlarge/gp3) |
2 | Application memory usage: 120GB; application memory change: 50GB | 250s (r5.4xlarge/io1) | 102s (r5.4xlarge/io1) |
3 | Application memory usage: 240GB; application memory change: 50GB | 533s (r5.8xlarge/io1) | 104s (r5.8xlarge/io1) |
4 | Application memory usage: 480GB; application memory change: 50GB | 1548s (r5.16xlarge/io1) | 80s (r5.16xlarge/io1) |
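The arithmetic behind these numbers can be checked with a few lines of Python. The snapshot times are copied from the table above and the 120-second limit is the AWS 2-minute pre-emption window mentioned earlier. The helper name `summarize` is just for illustration. Note that each row uses a different instance type, so the ratio is most meaningful within a row.

```python
SPOT_WINDOW_S = 120  # AWS 2-minute pre-emption window, in seconds

# application memory usage (GB) -> (AppCapsule seconds, AppCapsule++ seconds)
RESULTS = {
    60: (109, 91),
    120: (250, 102),
    240: (533, 104),
    480: (1548, 80),
}


def summarize(results, window_s=SPOT_WINDOW_S):
    """For each run, report whether each snapshot fits the window and the time ratio."""
    rows = []
    for usage_gb, (full_s, delta_s) in sorted(results.items()):
        rows.append({
            "usage_gb": usage_gb,
            "full_fits": full_s <= window_s,    # AppCapsule finishes in time?
            "delta_fits": delta_s <= window_s,  # AppCapsule++ finishes in time?
            "ratio": full_s / delta_s,          # AppCapsule time / AppCapsule++ time
        })
    return rows


for row in summarize(RESULTS):
    print(f"{row['usage_gb']:>3}GB  full fits: {row['full_fits']!s:<5}  "
          f"delta fits: {row['delta_fits']!s:<5}  ratio: {row['ratio']:.1f}x")
```

Only the 60GB full snapshot fits inside the window, while every delta snapshot does, and the gap between the two widens as memory usage grows.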
Call to action - save your spending and time
What are the characteristics of your running jobs and environment? AppCapsule++ is available now in the Memory Machine Cloud 2.4 Goa release. Please reach out or leave a comment below; we would love to chat and help!