Site Reliability Engineering at Ignite 2018

Site Reliability Engineering (SRE) is a new role for many folks in the Microsoft ecosystem, although it has been around for some time at major companies such as Google, LinkedIn, Facebook, and Etsy. Microsoft has been part of, and has been driving, the effort to translate the SRE role into enterprise IT organizations, both for Microsoft itself and for our customers. At Ignite 2018, you will hear our thinking on this transformation to SRE from the mindset of Service Engineering.

If you are attending Ignite 2018 in person, please join me, my fellow SRE speakers, and other speakers on how to succeed with Azure in the Customer Success in Azure area. It is in the Microsoft Showcase, to the left of the Landmark as you walk in; look for it within the Applications & Infrastructure area. I will be in the booth from 2:00 pm to 5:00 pm on Monday, Sept. 24, 2018, and from 10:00 am to 1:00 pm on Wednesday, Sept. 26, 2018. Although someone who can speak to the SRE role will be in the Customer Success in Azure area at all times, please feel free to stop by during my shifts to learn about the move from IT Operations to the SRE role.

[Image: Microsoft Ignite 2018 floor plan]

While the Customer Success in Azure area is a great opportunity for those of you here in Orlando, there are also ways for people attending both virtually and in person to learn more about the SRE role. We have four sessions on the SRE role during the week, presented by some great speakers, including me. Join us to hear more about how the SRE role works and how IT pros can move into the SRE role in their careers. The sessions can be attended live in person, watched via live stream if you cannot be here, and viewed as recordings after they are complete.

Please join me, David Blank-Edelman, Kishore Jalleda, and Jason Hand to understand how this role fits not only into single-service online companies but also into the corporate IT environment.

Tuesday, September 25, 2018

BRK2272 - Introducing Site Reliability Engineering
David Blank-Edelman, Microsoft 
9:00 AM in OCCC W240 (45 min)

Within just the last fifteen years, we have seen at least two separate communities evolve from the generic idea of operations. The first, DevOps, grew up very much in public. The second, Site Reliability Engineering (SRE), germinated more within the halls of public cloud providers, but is now catching on like wildfire throughout the industry in organizations of all sizes and stripes. SRE provides these organizations with a concrete approach for preserving the stability of their production environment while maintaining the feature velocity crucial for the success of the business. Join us as we explore the basic ideas behind SRE and talk about how you can get started implementing its principles and practices in your own organization.

BRK2314 - Incident response: Where SRE and DevOps collide
Kishore Jalleda, Microsoft 
Jason Hand, Microsoft
10:45 AM in OCCC W205 (75 min)

What happens when things go wrong? The 1ES Site Reliability Engineering (SRE) team has built an effective incident response process that drives reliability and performance in their own services and services they depend on. We dive into what incident response looks like from notification or detection all the way through the post-mortem and remediation of the contributing factors.


Thursday, September 27, 2018

BRK4025 - Implementing SRE practices on Azure: SLI/SLO deep dive
David Blank-Edelman, Microsoft 
9:00 AM in OCCC W311 A-D (45 min)

One of the most useful practices many organizations embrace when they first implement Site Reliability Engineering (SRE) is the adoption of Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Once in place, they can serve as a concrete foundation for the tricky negotiation between feature velocity and operational stability crucial for achieving the desired reliability of your services, systems, and products. Join us for a technical deep dive as we explore the basics of SLIs/SLOs and the tools Microsoft Azure provides to help implement and manage them in your environment.

BRK2362 - The SRE role: An unexpected journey
Jared Shockley, Microsoft
10:45 AM in OCCC W304 E-H (75 min)

As the world of information technology advances, the corresponding roles and responsibilities also continue to evolve. As IT pros progress from IT operations through service engineering and into site reliability engineering, they will need a strategic development plan that builds on their current skill sets.

In this session, we discuss the mindset required for effective site reliability engineering, including how to most efficiently grow career skills, utilize specific tools and processes, and incorporate lessons learned from inherent failures. We also analyze the results of platform moves to modern engineering practices and systems.