Who should read this blog?
If you are a Cloud Admin or a Principal Investigator and you are concerned about your team inadvertently exceeding its cloud budget, this blog is worth your time (about 3 minutes).
Real-world examples of cloud budget overruns
Here are real-world examples of cloud budget overruns and their painful consequences.
In a presentation on cloud computing in grant-funded research environments, Robert J. Robbins explains how the success of projects can be jeopardized by unexpected (and accidental) expenses. Here are two examples.
- An expense of $20,000 incurred because a researcher forgot to shut down an AWS instance after completing an analysis - the instance continued to run at $7 an hour for months.
- An expense of $72,000 incurred because of exponential auto-scaling caused by inserting small tests into product code - the company said they almost went bankrupt.
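To put the first example in perspective, here is a quick back-of-the-envelope check in Python. It uses only the two figures quoted above ($7 an hour and $20,000); the function name is ours and purely illustrative.

```python
# Back-of-the-envelope check of the forgotten-instance example.
# Both constants come from the example above; nothing else is assumed.
HOURLY_RATE_USD = 7.0
TOTAL_COST_USD = 20_000.0


def hours_to_reach(total_usd: float, hourly_rate_usd: float) -> float:
    """Return how many hours an instance must run to accumulate total_usd."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return total_usd / hourly_rate_usd


hours = hours_to_reach(TOTAL_COST_USD, HOURLY_RATE_USD)
days = hours / 24
print(f"{hours:.0f} hours, about {days:.0f} days")
```

That works out to roughly 2,857 hours, or about 119 days: a single forgotten instance running for close to four months.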
Why aren’t there solutions for this serious problem?
Limitations of existing solutions
There are solutions that monitor cloud cost and can stop running instances, but there are two critical limitations.
- Not timely - AWS Budgets, for example, can track cost and usage, but updates typically occur at 8–12 hour intervals. The cloud budget overrun may happen before the next update.
- Interim progress lost - even if you are alerted when the cloud budget is exceeded and you stop all running instances, interim progress made by unfinished jobs is lost. Both the money and the time already spent on those jobs are wasted.
Is there a single solution that overcomes both limitations?
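The "not timely" limitation can be made concrete with a small calculation: whatever is spent between two cost updates is invisible until the next update arrives. The sketch below is only an illustration. The $7 hourly rate is borrowed from the example earlier in this blog, the 8 and 12 hour intervals come from the range quoted above, and the fleet of 100 instances is a made-up number.

```python
def spend_between_updates(hourly_rate_usd: float,
                          update_interval_hours: float,
                          instance_count: int = 1) -> float:
    """Worst-case spend that accrues before the next cost update is visible."""
    return hourly_rate_usd * update_interval_hours * instance_count


# One instance at $7 an hour, checked at both ends of the 8-12 hour range.
for interval in (8, 12):
    unseen = spend_between_updates(7.0, interval)
    print(f"{interval} h interval: up to ${unseen:.2f} unseen")

# Hypothetical fleet of 100 such instances after runaway auto-scaling.
print(f"fleet: up to ${spend_between_updates(7.0, 12, 100):.2f} unseen")
```

The longer the update interval and the larger the fleet, the more can be spent before anyone is alerted.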
SurfZone: Memory Machine Cloud's solution
Memory Machine Cloud's SurfZone overcomes both limitations.
- Timely - SurfZone enforces a user's cloud spend limit at a configurable update interval. The default is one hour, and it can be set shorter or longer.
- Interim progress saved - When a user reaches the cloud budget limit, SurfZone checkpoints the current application state and puts the application into sleep mode. The application automatically resumes when additional budget is applied.
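To illustrate the checkpoint-and-resume idea, here is a toy simulation in Python. It is a sketch under our own assumptions, not SurfZone's implementation or API: the `Job` and `Budget` classes, the per-step cost model, and the function `run_until_blocked` are all hypothetical, and a real checkpoint would capture full application state rather than a step counter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    RUNNING = "running"
    SLEEPING = "sleeping"
    DONE = "done"


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0

    def can_afford(self, cost_usd: float) -> bool:
        return self.spent_usd + cost_usd <= self.limit_usd


@dataclass
class Job:
    total_steps: int
    cost_per_step_usd: float
    completed_steps: int = 0
    state: JobState = JobState.RUNNING
    checkpoint: Optional[int] = None  # stand-in for saved application state


def run_until_blocked(job: Job, budget: Budget) -> None:
    """Run the job until it finishes or the next step would exceed the budget."""
    if job.state is JobState.SLEEPING and job.checkpoint is not None:
        # Resume from the saved state instead of starting over.
        job.completed_steps = job.checkpoint
        job.state = JobState.RUNNING
    while job.completed_steps < job.total_steps:
        if not budget.can_afford(job.cost_per_step_usd):
            # Budget reached: save progress and put the job to sleep.
            job.checkpoint = job.completed_steps
            job.state = JobState.SLEEPING
            return
        budget.spent_usd += job.cost_per_step_usd
        job.completed_steps += 1
    job.state = JobState.DONE


job = Job(total_steps=10, cost_per_step_usd=1.0)
budget = Budget(limit_usd=4.0)
run_until_blocked(job, budget)   # sleeps after 4 steps, progress kept
budget.limit_usd += 6.0          # additional budget is applied
run_until_blocked(job, budget)   # resumes at step 4 and finishes
print(job.state.value, job.completed_steps, budget.spent_usd)
```

The point of the sketch is that the second run picks up where the first one stopped, so the first four steps are paid for only once.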
Typical user journey
A typical user journey resembles the following.
- Cloud Admin or Principal Investigator defines a quota (budget) group by answering the following questions.
  - What is the quota type you would like to monitor (for example, cost)?
  - What action should be taken when the usage exceeds the quota (for example, suspend or terminate the job)?
  - What action should be taken when additional quota is applied (for example, auto-resume)?
- Cloud Admin or Principal Investigator applies the quota group to a user - the user's jobs are automatically subject to the enforcement mechanisms defined by the quota group.
- Cloud Admin (or Principal Investigator) and the user are notified when the user's spending reaches the quota limit or when any quota-related action occurs.
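The journey above can be pictured as a small data model. The Python sketch below is our own illustration, not the actual SurfZone configuration format: the class names, field names, the `DO_NOTHING` option, and the `evaluate` function are all hypothetical. Only the three questions (quota type, action on exceeding the quota, action when quota is added) and the notification step come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class QuotaType(Enum):
    COST = "cost"


class OverQuotaAction(Enum):
    SUSPEND = "suspend"
    TERMINATE = "terminate"


class QuotaAddedAction(Enum):
    AUTO_RESUME = "auto-resume"
    DO_NOTHING = "do-nothing"  # hypothetical alternative, for illustration


@dataclass(frozen=True)
class QuotaGroup:
    name: str
    quota_type: QuotaType             # question 1: what to monitor
    limit: float
    on_exceeded: OverQuotaAction      # question 2: action at the limit
    on_quota_added: QuotaAddedAction  # question 3: action when quota is added


def evaluate(group: QuotaGroup, user: str, usage: float) -> List[str]:
    """Return notifications for the admin and the user, or [] below the limit."""
    if usage < group.limit:
        return []
    message = (
        f"{user} reached the {group.quota_type.value} quota of "
        f"{group.limit:.2f} in group '{group.name}'; "
        f"action: {group.on_exceeded.value}"
    )
    return [f"to admin: {message}", f"to {user}: {message}"]


group = QuotaGroup(
    name="genomics-lab",
    quota_type=QuotaType.COST,
    limit=100.0,
    on_exceeded=OverQuotaAction.SUSPEND,
    on_quota_added=QuotaAddedAction.AUTO_RESUME,
)
for note in evaluate(group, "alice", usage=100.0):
    print(note)
```

Once a group like this is applied to a user, every job that user submits is checked against the same rules, so the admin defines the policy once rather than per job.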
What’s next?
In a subsequent blog, we’ll take a deeper dive into the user interface that supports SurfZone.
What do you think of this feature? Does it work for you or your organization? If you would like to preview this feature, leave your contact information in the comments below.
Memory Machine Cloud by MemVerge