Screaming in the Cloud
Episode 3: Turning Off Someone Else's Site as a Service
How do you encourage businesses to pick Google Cloud over Amazon and other providers? Once they have picked it, how do you help them succeed on that platform? Google Cloud is not just a toy with fun features; it is a capable Cloud service.
Today, we’re talking to Seth Vargo, a Senior Staff Developer Advocate at Google. Previously, he worked at HashiCorp in a similar advocacy role and worked very closely with Terraform, Vault, Consul, Nomad, and other tools. He left HashiCorp to join Google Cloud and talk about those tools and his experiences with Chef and Puppet, as well as communities surrounding them. He wants to share with you how to use these tools to integrate with Google Cloud and help drive product direction.
Some of the highlights of the show include:
- One of Google Cloud’s strengths is its billing. You can disable billing across an entire project and delete all billable resources. The button you click in the user interface to do that has an API behind it, so you can build a chat bot or script around it, too. The Console can also show you what anything you’ve done by pointing and clicking looks like in code form.
- You can also expose that to other people’s accounts, because turning off someone else’s Website as a service can be beneficial. You can invite anyone with a Google account, not just ‘@gmail.com’ but ‘@’ any domain, and give them admin or editor permissions across a project. They’re effectively part of your organization within the scope of that project. For example, this feature is useful for training, or if a consultant needs to see all of their different clients in one dashboard while the clients can’t see each other.
- Google is a household name. However, it’s important to recognize that advocacy is not just external; there’s an internal component to it. There are many parts of Google and many features of Google Cloud that people aren’t aware of. As an advocate, Seth’s job is to help people win.
- Besides showing people how they can be successful on Google Cloud, Seth focuses on strategic complaining. He is deeply ingrained in several DevOps and configuration management communities, which provide him with positive and negative feedback. It’s his job to take that feedback and convert it into meaningful action items for product teams to prioritize and put on roadmaps. Then, the voice of those communities is echoed in the features and products being developed internally.
- Amazon has been in the Cloud business for a long time. What took Google so long? For a long time, Google was perceived as being late to the party and not able to offer services as comprehensive and mature as Amazon’s. Now, people view Google Cloud as not being substandard, but not where serious business happens. In fact, it’s a fully featured platform, and the choice comes down to preferences and pre-existing features, not capability.
- Small and mid-size companies typically pick a Cloud provider and stick with their choice. Larger companies and enterprises, such as Fortune 50 and Fortune 500 companies, pick multiple Clouds. This is usually due to some type of legal compliance issue, or because certain Cloud providers have specific features.
- Externally, Google offers the Deployment Manager tool at cloud.google.com. It’s the equivalent of CloudFormation, and teams at Google are staffed full time to perform engineering work on it. Every API that you reach by clicking a button on cloud.google.com or by viewing the API Docs is accessible via the Deployment Manager.
- Google Cloud also partners with open source tools and corresponding companies. There are people at Google who are paid by Google who work full time on open source tools, like Terraform, Chef, and Puppet. This allows you to provision Google Cloud resources using the tools that you prefer.
- According to Seth, there are five key pillars of DevOps: 1) Reduce organizational silos and break down barriers between teams; 2) Accept failures; 3) Implement gradual change; 4) Tooling and automation; and 5) Measure everything.
- Think of DevOps as an interface in a programming language like Java: it doesn’t actually define what you do, but gives you a high-level description of what each function is supposed to implement.
- With the SRE discipline, there’s a prescribed way of carrying out those five pillars of DevOps. Specific tools and technologies used within Google, some of which are exposed publicly as part of Google Cloud, enable the kind of DevOps culture and DevOps mindset found there.
- The reason Google frames DevOps as an abstract class, in programming terms, is that there’s more than one way to solve a problem, and SRE is just one of those ways. It’s the way that has worked best for Google, and it has worked best for a number of customers that Google is working with. But there are some other ways, too. Google supports those ways and recognizes that there isn’t just one path to operational success, but many ways to reach that prosperity.
- The book, Site Reliability Engineering, describes how Google does SRE, which Google has tried to evangelize to the world because it can help people improve operations. The flip side of that is that organizations need to be cognizant of their own requirements.
- Google has always been held up, along with several other companies, as a shining beacon of how infrastructure management could be. But some say there are still problems with its infrastructure, even after 20-some years and billions invested.
- Every company has problems, some of them technical, some cultural. Google is no exception. The one key difference is the way Google handles issues from a cultural perspective. It focuses on fixing the problem and making sure it doesn’t happen again. There’s a very blameless culture.
- Conferences tend to include a lot of hand waving and storytelling. But as an industry, more war stories need to be told instead of pleasure stories. Conference organizers want to see sunshine and rainbows because that sells tickets and makes people happy. The systemic problem is how to talk about problems out in the open.
- Becoming frustrated and trying to figure out why computers do certain things is a key component of the SRE discipline referred to as Toil: work tied to systems that we either don’t understand or that don’t make sense to automate.
- Those going to Google Cloud to ‘move and improve’ tend to be a mix of those from other Cloud providers and those from on-premise data center deployments. Move and improve is where there are VMs in a data center, and they need to be moved to the Cloud.
- There are tiny differences around the Cloud-native paradigm and providers. There are some key pillars: Does it handle restarts well? Is it highly available? Can it be containerized, even though containers aren’t necessarily required for Cloud native? Does it package all of its dependencies with it? Can it run on different operating systems? All of these things are generic; they’re not specific to a Cloud provider.
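The interface analogy from the highlights above (DevOps names what has to be done, SRE is one way of doing it) can be sketched in code. This is only an illustrative sketch in Python, using an abstract base class in place of a Java interface. The method names, the `describe` helper, and the practices returned by the `SRE` class are assumptions made up for this example, not definitions from Google or from the episode.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The five pillars as an interface: it names what to do, not how."""

    @abstractmethod
    def reduce_organizational_silos(self): ...

    @abstractmethod
    def accept_failure(self): ...

    @abstractmethod
    def implement_gradual_change(self): ...

    @abstractmethod
    def leverage_tooling_and_automation(self): ...

    @abstractmethod
    def measure_everything(self): ...


class SRE(DevOps):
    """One possible implementation; the strings are illustrative placeholders."""

    def reduce_organizational_silos(self):
        return "share ownership of production between teams"

    def accept_failure(self):
        return "run blameless reviews after incidents"

    def implement_gradual_change(self):
        return "roll out small changes a little at a time"

    def leverage_tooling_and_automation(self):
        return "automate away toil where it makes sense"

    def measure_everything(self):
        return "track how the system and the work are doing"


def describe(practice):
    """Return what a given DevOps implementation does for each pillar."""
    return [
        practice.reduce_organizational_silos(),
        practice.accept_failure(),
        practice.implement_gradual_change(),
        practice.leverage_tooling_and_automation(),
        practice.measure_everything(),
    ]


if __name__ == "__main__":
    for line in describe(SRE()):
        print(line)
```

Another organization could write its own class against the same `DevOps` interface with entirely different practices, which is the point of the analogy: there is more than one valid implementation.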
Site Reliability Engineering book from O’Reilly
Quotes by Seth:
“Everything we do on Google Cloud is API First. Anytime you click a button in that Web UI, there is a corresponding API call, which means you can build automation, compliance, and testing around these various aspects.”
“The IAM and permission management in Google Cloud is incredibly powerful. It leverages the same IAM permissions that G Suite has which is hosted Gmail, Calendar, and all of those other things.”
“How do I get people who want to use Google Cloud or don’t know about Google Cloud? The ability to be successful on the platform.”
“I would definitely say that any company you work at, whether the recruiter tells you that it’s all sunshine and rainbows and there’s nothing ever wrong is a lie.”
Brought to you by Corey Quinn of Screaming in the Cloud