Screaming in the Cloud

Episode 31: Hey Sam, wake up. It’s 3am, and time to solve a murder mystery!

Have you ever been on call, as an IT person or otherwise? Woken up at 3 a.m. to solve a problem? Did you have to go through log files or look at a dashboard to figure out what was going on? Did you think there has got to be a better way to troubleshoot and solve problems?

Today, we’re talking to Sam Bashton, who previously ran a premier consulting partner with Amazon Web Services (AWS). Recently, he started runbook.cloud, a tool built on top of serverless technology that helps people find and troubleshoot problems within their AWS environment.

Some of the highlights of the show include:

  • Runbook.cloud looks at metrics to generate machine learning (ML) intelligence to pinpoint issues and present users with a pre-written set of solutions
  • Runbook.cloud looks at all potential problems that can be detected in the context of how the infrastructure is being used, without being annoying or useless
  • ML is used to do trend analysis and understand how a specific customer is using a service for a specific Auto Scaling group or Lambda function
  • Runbook.cloud uses all aggregate data to inform its alerts; if there’s a problem in a specific region with a specific service, the tool is careful to caveat the alert
  • Various monitoring solutions are on the market; runbook.cloud is designed for a mass market environment; it takes metrics that AWS provides for free and makes it so you don’t need to worry about them
  • Will runbook.cloud compete with or sell out to AWS? Amazon wants to build the underlying infrastructure and have other people use its APIs to build interfaces for users
  • Runbook.cloud is sold through AWS Marketplace; it’s a subscription service where you pay by the hour and the charges are added to your AWS bill
  • Amazon vs. other cloud providers: detecting problems across multiple clouds takes real work, so it doesn’t make sense to branch out to other clouds
  • Runbook.cloud was built on top of serverless technology for business and financial reasons; it’s a way to align outlay and costs because you pay for exactly what you use
  • Analysis paralysis is real; it comes down to reducing the emotional toil of making decisions to as few decision points as possible
  • Save money on Lambda; instead of using several Lambda functions concurrently, put everything into a single function using Go
  • AWS responds to customers to discover how they use its services; it comes down to what customers need

Brought to you by Corey Quinn of Screaming in the Cloud