Over the past year, hundreds of customers have tried Memory Machine Cloud, and their feedback has been overwhelmingly positive, especially on how it lets them run stateful workloads on spot instances with confidence, knowing that Memory Machine Cloud gracefully handles spot interruptions. Many have asked, “How are you different from Spot by NetApp?” Here are my thoughts.
I have huge respect for the product and the team at Spot, who shared common investors with MemVerge before joining NetApp in 2020. The product has been in the market for many years, has helped hundreds of customers, and is now a crucial part of NetApp’s growth story. We aspire to reach the same level of success and market influence as Spot and have much to learn from them.
“The start of a hunt” by SandFlash, licensed under CC BY 2.0
Advice & Warnings vs the Transporter Beam
Spot instances look and feel like normal VM instances, except that the cloud provider can reclaim them with minimal notice. If we compare the cloud to the Serengeti, running workloads on spot instances is like zebras grazing the vast plains. While the grass is sweet, lions and other predators can appear at any moment, and there is only a short time window to react. How can we make it safe for the zebras? Spot by NetApp (Spot.io) and Memory Machine Cloud (MMCloud) take different approaches.
Spot.io’s solution is Elastigroup. Elastigroup uses historical data and heuristics to determine whether the requested machine type is available on the spot market and whether it is likely to encounter a spot reclaim. Once the spot instance has been allocated, it attempts to give the application early warning of spot reclaim events. In our Serengeti analogy, Elastigroup advises the zebra on the areas where there should be fewer predators and does its best to provide an early warning. However, there is no guarantee: predators may still show up, and if they do, the zebra is killed. Hence it only works if the zebra is part of a herd (aka a stateless cluster) that can tolerate losing a few members. At the same time, Elastigroup’s early warning does not necessarily mean the predator is coming, so the zebra could be forgoing juicy grass by reacting to a false alarm. Elastigroup Stateful Node saves the boot volume and configuration of the node so that, in the case of a spot reclaim, the same node can be recreated. However, all the progress that has been made on the node will be lost. In essence, Elastigroup helps but is not very efficient. The root of the issue is that Spot.io does not have the technology to protect the zebra when the lion attacks, i.e., to protect the workload when the cloud provider sends the eviction signal.
MMCloud relies on its AppCapsule technology to protect against spot reclaims. When the cloud signals that the spot instance will be reclaimed, MMCloud captures the state of the running application into the AppCapsule, acquires a new spot instance, and resumes the running application after restoring the state stored in the AppCapsule. Using our Serengeti analogy again, MMCloud provides a transporter beam. With it, the zebra can graze anywhere on the vast plain, and only when the predator starts its attack run does MMCloud beam the zebra to a different part of the plain, where the zebra continues its peaceful grazing, never even realizing that it has been transported to a new location.
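To make the capture, relocate, resume sequence concrete, here is a minimal Python sketch. It is only an illustration of the general checkpoint/restore idea, not how AppCapsule is implemented: AppCapsule is described above as capturing state without application changes, whereas this toy job saves its own progress. The names `EvictionNotice`, `run_job`, and `run_with_recovery` are hypothetical, and the "reclaim" is simulated rather than coming from a real cloud provider.

```python
import pickle


class EvictionNotice(Exception):
    """Hypothetical stand-in for the cloud provider's spot reclaim signal."""


def run_job(total_steps, state=None, evict_at=None):
    """Sum the integers 0..total_steps-1, one step at a time.

    `state` is saved progress to resume from (None means a fresh start).
    `evict_at` simulates a reclaim notice arriving just before that step runs.
    """
    state = state or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if evict_at is not None and state["step"] == evict_at:
            # Capture the job's progress before the instance goes away.
            capsule = pickle.dumps(state)
            raise EvictionNotice(capsule)
        state["acc"] += state["step"]
        state["step"] += 1
    return state["acc"]


def run_with_recovery(total_steps, evict_at):
    """Run the job; on a simulated reclaim, resume from the saved capsule."""
    try:
        return run_job(total_steps, evict_at=evict_at)
    except EvictionNotice as notice:
        # On the replacement instance: restore the state and carry on.
        restored = pickle.loads(notice.args[0])
        return run_job(total_steps, state=restored)


if __name__ == "__main__":
    # The interrupted run finishes with the same answer as an uninterrupted one.
    print(run_with_recovery(10, evict_at=4) == run_job(10))
```

The point of the sketch is the contrast with a plain restart: the resumed run picks up at the step where it was interrupted instead of starting again from step 0.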
The key advantages of MMCloud over Spot Elastigroup in managing spot instances are:
· Never lose application progress
· No need to change the application
· Minimize or eliminate false alarms
· Support stateful applications
These translate into better reliability and lower cost for running applications on spot instances.
Different approaches to Cloud Computing
While Memory Machine Cloud and Spot by NetApp both leverage cloud spot instances to improve cost efficiency for users, we take fundamentally different approaches to the problem. MMCloud has the lofty goal of running all workloads in a serverless fashion, as I discussed in an earlier blog. That is, our software automates the end-to-end cycle of running workloads in the cloud, to the degree that users can focus on their task and forget about the infrastructure. Spot.io’s model maintains a clear demarcation between infrastructure and application, and optimizes the pool of resources available to the applications. (An aside: it just occurred to me that Spot.io grew up prior to the wide adoption of containers, the framework we took for granted while designing MMCloud.) The complexity of running the application itself is left to the IT/Cloud admin, and Spot.io does not actively optimize the infrastructure while the application is running. In comparison to Spot.io, MMCloud:
Memory Machine Cloud and Spot by NetApp use different methods to enable applications to take advantage of low-cost spot instances. Below is a summary of the differences.
| | Memory Machine Cloud | Spot by NetApp |
| Maximize Spot Usage | | |
| Transparent recovery from Spot reclaim | | |
| Continue application execution after spot reclaim | | |
| Support for long-running stateful applications on Spot instances | | |
| Granular observability of application resource utilization | | |
| Automation of application deployment | | |
| Rightsizing of VMs during application execution | | |
Memory Machine Cloud is bringing about a paradigm shift in which application users are liberated from the chores of managing the cloud infrastructure, and in which the compute infrastructure itself can optimize resource utilization. These are early days yet, and we will focus our energy on advancing along that journey.