Site Reliability Engineering Manager

Posted 5 Days Ago
Halifax, NS
Senior level
Healthtech • Software
ESO provides software to help improve community health and safety through the power of data
The Role
The Site Reliability Engineering Manager will lead efforts to enhance cloud infrastructure observability and resiliency. Responsibilities include managing SRE engineering teams, incident management, driving automation, supporting operations, and collaborating for optimal system performance and reliability. A focus on continuous improvement and mentoring team members is required.
Summary Generated by Built In

About ESO: 

ESO is a rapidly growing technology company that is passionate about improving community health and safety through the power of data. We provide software applications, interoperability and data management solutions to emergency medical services, fire departments and hospitals.  

We’re small enough to be nimble and fun, but big enough to be a great, stable place to work. We serve more than 10,000 customers from our offices in the US, Canada, Northern Ireland, the Czech Republic, and Denmark.

Site Reliability Engineering Manager: 

We are looking for a Site Reliability Engineering (SRE) Manager to join our Nova Scotia team.

  • In this role, you will lead efforts to enhance observability across our cloud infrastructure, driving improvements to the resiliency of our mission-critical applications. 
  • You will manage a team of engineers who monitor, investigate, and resolve real-time production incidents, ensuring quick response and resolution in line with our product SLAs and SLOs.
  • As an Incident Manager, you will take a hands-on approach to managing critical production incidents, coordinating cross-functional teams, facilitating root cause analysis, and implementing preventive measures to minimize future disruptions.
  • Your leadership will be essential in establishing and maintaining a culture of operational excellence, ensuring that production systems remain stable and performant at all times. 

 This role reports to the Director of Cloud Engineering and will require you to: 

  • Provide functional management of SRE team members. 
  • Deliver performance feedback and compensation reviews. 
  • Mentor, set goals and support career planning for your team members. 
  • Oversee daily planning, escalations, and triage meetings to support operations. 
  • Collaborate with other teams to ensure systems are operating in accordance with SLAs, SLOs and error rate budgets. 
  • Proactively develop automation solutions to support improved MTTR and self-healing in a distributed systems environment. 
  • Develop runbooks for routine production operations. 
  • Craft monitoring & observability solutions that drive meaningful action and insight. 
  • Identify and present solutions that can improve the performance and reliability of ESO’s systems. 
  • Report on core operational metrics that provide a trended view of the health of ESO’s applications. 
  • Lead and drive Incident Management events, coordinating necessary communication and documentation for quick detection, resolution and follow-up.
  • Organize and run RCAs and 5 Whys with responsible teams, creating process changes that drive system reliability and scalability.

More about what you’ll be doing 

The ideal candidate for this role will have experience in hiring, mentoring, and managing technical teams while driving execution using Agile methodologies. As a member of the engineering leadership team, you will work closely with development teams and business stakeholders throughout the SDLC to ensure ESO’s products are designed for optimal performance, scalability, security, and high availability. 

In addition to technical leadership, you should have experience in incident management, overseeing the resolution of critical production issues and driving post-incident reviews to improve systems and processes. You will play a key role in fostering a culture of continuous improvement, ensuring that incident response processes are efficient and that lessons learned are applied to enhance system resilience. 

ESO’s infrastructure primarily runs on Azure, utilizing IaaS, PaaS, and Serverless technologies within the platform. You should be well-versed in modern DevOps practices, including zero-downtime deployment strategies, and have a strong grasp of Infrastructure as Code (IaC) to ensure consistent configurations and prevent drift across environments. 

Essential Criteria: 

In this role, you will be instrumental in managing incident response, driving post-incident reviews, and ensuring that lessons learned are applied to further enhance system resilience and operational excellence. You are a creative technologist who leads by example, setting the pace and tone for building a high-performance team.

To be successful in this role, you should bring: 

  • 2+ years of experience managing a high-performing, technical team. 
  • Experience leading and driving Incident Management events, coordinating necessary communication and documentation for quick detection, resolution and follow-up. 
  • Experience organizing and running RCAs and 5 Whys with responsible teams, creating process changes that drive system reliability and scalability.
  • Strong familiarity with Agile development methodologies (Scrum, Kanban). 
  • Operational experience supporting mission-critical, customer-facing systems. 
  • Prior experience as a Software Engineer, Site Reliability Engineer (SRE), or DevOps Engineer. 
  • Hands-on experience with cloud platforms (Azure, AWS, or GCP). 
  • Experience with monitoring, observability tools, and incident management. 

Desirable Criteria:

  • Experience with Microsoft Azure Serverless and PaaS technologies.
  • Experience with Docker containers and Infrastructure as Code.
  • Experience in the healthcare or public safety sectors.
  • Familiarity with Continuous Integration (CI) and Continuous Deployment (CD) pipelines.
  • Prior experience with source control management.

Benefits & Perks

ESO offers a comprehensive suite of benefits to promote health and financial security for our employees and their families. For full-time employees, this includes:

- Competitive health plan (medical, dental, & vision insurance)
- RRSP with company match
- Telemedicine service provided by ESO
- Front-loaded vacation and sick time
- Employee Assistance Program (EAP)
- Peace-of-mind benefits such as life insurance and disability insurance
- Casual office environments and unlimited office snacks and drinks

Are you ready to Make a Difference? At ESO, we believe in bringing your true self to work every single day. If you don’t match all the qualifications on the job description, we encourage you to apply anyway! We are looking for passionate, innovative, and authentic people to help drive our mission.

All offers are contingent upon a successful background check.

ESO is committed to creating a diverse and inclusive work environment and is proud to be an equal opportunity and affirmative action employer. We invite you to consider opportunities at ESO regardless of your gender; gender identity; gender reassignment; age; religion; race; national origin; political affiliation; sexual orientation; disability; veteran status; or other non-merit factor.

Top Skills

Azure
The Company
HQ: Austin, TX
634 Employees
Hybrid Workplace
Year Founded: 2004

What We Do

ESO is a fast-paced, growing data, technology, and research company passionate about improving community health and safety through the power of data. We pioneer innovative, user-friendly software to meet the changing needs of today’s EMS agencies, fire departments, and hospitals. We’re small enough to be nimble and fun, but big enough to be a great place to work. We serve thousands of customers out of our four US offices and our Belfast, Northern Ireland office.

We believe in the power of data to improve community health and safety. That’s not just some lofty corporate vision statement — it’s something we live, breathe and see the results of every day. We approach our work as if the lives of our own families and friends depended on the results. Because a lot of the time … they do.

Why Work With Us

We believe in taking great care of our customers and our employees. We believe work ought to be both challenging and fun. (Otherwise what’s the point?) We believe it’s worthwhile to continually push for something better, to pursue excellence for the sake of excellence, and to hold each other to the same standard.
