Smile Digital Health

Cloud Site Reliability Engineer

Posted 16 Days Ago
In-Office or Remote
2 Locations
Mid level
The Cloud Site Reliability Engineer ensures reliability and scalability of services across cloud platforms, automates testing frameworks, and collaborates with teams for optimal performance and security.
The summary above was generated by AI
Working for a company like Smile Digital Health means supporting our mandate for #BetterGlobalHealth. We strive towards this goal every day, and the results can be seen in the impact of our innovative health data platform and data management solutions, which are used in over 20 countries. We were #19 on Deloitte's Technology Fast 50 Ranking for 2024! 
 
Smile Digital Health makes it easy for healthcare stakeholders to collect and exchange data with our leading FHIR-based data liberation platform.
 
At its heart, the Smile platform enables people and organizations to better manage healthcare data. We help generate and liberate structured healthcare data to ensure effective delivery across care teams and health systems, bringing #BetterGlobalHealth to patients every day!

Apply today and find plenty of reasons to SMILE!

The Cloud Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of production-grade services deployed across multiple cloud vendors and infrastructure platforms for Smile Digital Health, its clients, and partners. This role designs and automates performance testing frameworks, integrates them into CI/CD pipelines, and uses observability tools to proactively detect and resolve bottlenecks. Working closely with engineering, product, and security teams, the SRE ensures systems meet strict SLAs for performance and availability while driving continuous optimization across multiple cloud platforms.

Responsibilities:

  • Collaborate with our Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for Azure and other cloud providers.
  • Develop, implement and coordinate a multi-tenant approach to service offerings such as DB, Container platform, Authentication, Certificates, and Product Registries.
  • Design and maintain performance testing strategies, frameworks, and environments in the cloud.
  • Develop and maintain cost/utilization tracking and attribution processes for all Cloud Service Providers.
  • Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details.
  • Develop and maintain technical relationships with our core Cloud Service Providers.
  • Implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications.
  • Ensure that internal and external SLAs are met or exceeded, and that system-centric KPIs are continuously monitored and improved.
  • Create tools for automating deployment, monitoring and operations of the overall platform.
  • Participate in an on-call rotation to provide application support, incident management, and troubleshooting.
  • Provide ongoing maintenance and support of internal tools, and improve system health and reliability.
  • Assist customers with on-site deployments when needed.
  • Implement and manage observability tools (logging, metrics, tracing) for performance insights; OpenTelemetry (OTel) and Grafana Stack preferred.
  • Ongoing compliance with organizational policies, procedures and practices (such as, but not limited to, security policies) is an ongoing requirement of the employment or contractual agreement.
  • Accountable for ensuring that all working hours are accurately reported in the time tracking system on a daily or weekly basis, that the majority of (if not all) hours are tracked as billable, and that the project management tool in the time tracking system is properly and fully utilized.
  • Tracking and reporting of billable hours is a critical aspect of project management and delivery to our customers and this is a major area of accountability.
  • Comply with the privacy, security and confidentiality policies. Hold all confidential information in trust and strict confidence and ensure that it shall be used only for the purposes required to fulfill employment obligations, and shall not be used for any other purpose, or disclosed to any third party.

Requirements:

  • Demonstrated expertise in cloud service providers and best practices around implementation and configuration, preferably managing Azure on behalf of multiple teams for a company that delivers SaaS products.
  • Experience with Kubernetes, OpenShift, Kafka, and Elastic Stack.
  • Proven experience working with microservices architecture, with a strong focus on Java-based services.
  • Experience in applying chaos engineering practices to evaluate and enhance system resiliency.
  • Skilled in troubleshooting performance issues, including analyzing time consumption, allocating resources, and recommending optimizations.
  • Familiar with performance testing methodologies and tools to assess system behavior under load.
  • Proven experience with Security and Compliance (SOC 2, HIPAA, ISO 27001) best practices and how to implement controls that support high-velocity software delivery teams.
  • Proficiency in Terraform, Ansible or Chef.
  • Expertise in troubleshooting, support escalation, on-call process optimization and documenting knowledge.
  • Passionate about Infrastructure as code, automation, and developing solutions that help developers move quickly and safely.
  • Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.
  • Experience operating and maintaining production systems in a Linux and public cloud environment.
  • Prior experience working with high-performance or distributed systems, although we strive to hire at a variety of experience levels.
  • Working knowledge of industry best practices regarding information security.
  • Previous experience building or maintaining a large-scale Cloud service.
  • Proven ability to prioritize and track multiple projects in parallel.
  • Proven ability to be highly responsive and customer-focused.

Some of the benefits we offer:
* Remote Work Environment
* Flexible Time Away From Work Policy including PTO, Personal and Sick Days
* Competitive Salary and Health/Medical Benefits
* RRSP/TFSA/401K Employee Contribution
* Life and Disability
* Employee Assistance Program
* FHIR Study Program and Skillsoft Learning
* Super HAPI Fun Club

Smile's core values include respect, inclusion, embracing our differences, and celebrating shared values because our people are the foundation of our success. We are big on creating a sense of belonging and empowering each other to bring our authentic selves to work.  We are dedicated to fostering a workplace that values diversity, equity, and inclusion.
 
We welcome and encourage candidates of all backgrounds to apply. Candidates are encouraged to inform us if they wish to discuss or require accommodations during interviews or while working at Smile.

Top Skills

Ansible
Azure
Chef
Elastic Stack
Kafka
Kubernetes
OpenShift
Terraform

Smile Digital Health Toronto, Ontario, CAN Office

Toronto, Ontario, Canada
