Manulife

Site Reliability Engineer

Reposted Yesterday
In-Office
2 Locations
Mid level

We are seeking a motivated Site Reliability Engineer (SRE) to join the Manulife Bank Service Delivery Management (SDM) team. In this role, you will be responsible for ensuring the reliability, availability, and performance of our systems and services, working closely with development and operations teams to build, operate, and continuously improve scalable and resilient platforms. This role offers opportunities to grow your expertise in reliability engineering, automation, and large-scale service operations. You will treat reliability as a core feature of the platform while balancing innovation with operational stability.
 

Position Responsibilities:  
Design, implement, and maintain infrastructure, tooling, and automation to improve service reliability and scalability.
Partner with development teams to ensure applications are designed for reliability, performance, and operational excellence.
Monitor system health, performance, and availability; troubleshoot and resolve issues to minimize customer impact.
Build and enhance observability capabilities, including metrics, logs, and traces, to enable early detection of issues.
Participate in on-call rotations to support critical systems and ensure timely incident response.
Perform root cause analysis for incidents and drive corrective actions to prevent recurrence.
Define, track, and improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
Support disaster recovery and resilience initiatives, including recovery testing and continuous improvement of recovery readiness.
Promote proactive monitoring and alerting practices to reduce mean time to detection (MTTD) and mean time to resolution (MTTR).
 
Required Qualifications:
Post-secondary education in Computer Science, Engineering, or a related field, or equivalent practical experience.
2–5 years of experience in a Site Reliability Engineering, DevOps, or similar reliability-focused role.
Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and container technologies (e.g., Docker, Kubernetes).
Proficiency in scripting and automation using languages such as Python, Bash, or similar.
Experience with monitoring, logging, and observability tools (e.g., New Relic, Grafana, ADX).
Experience with Power Platform tools, particularly Power BI, for operational reporting and insights.
Familiarity with ITIL and Agile delivery methodologies.
Strong problem-solving skills, attention to detail, and ability to work effectively under pressure.
Excellent communication and collaboration skills, with the ability to work across technical and business teams.
 
When you join our team:
We’ll empower you to learn and grow the career you want.
We’ll recognize and support you in a flexible environment where well-being and inclusion are more than just words.
As part of our distributed team, we’ll support you in shaping the future you want to see.
 
 

About Manulife and John Hancock

Manulife Financial Corporation is a leading international financial services provider, helping make people's decisions easier and their lives better. To learn more about us, visit https://www.manulife.com/en/about/our-story.html.

Manulife is an Equal Opportunity Employer

At Manulife/John Hancock, we embrace our diversity. We strive to attract, develop and retain a workforce that is as diverse as the customers we serve and to foster an inclusive work environment that embraces the strength of cultures and individuals. We are committed to fair recruitment, retention, advancement and compensation, and we administer all of our practices and programs without discrimination on the basis of race, ancestry, place of origin, colour, ethnic origin, citizenship, religion or religious beliefs, creed, sex (including pregnancy and pregnancy-related conditions), sexual orientation, genetic characteristics, veteran status, gender identity, gender expression, age, marital status, family status, disability, or any other ground protected by applicable law.

It is our priority to remove barriers to provide equal access to employment. A Human Resources representative will work with applicants who request a reasonable accommodation during the application process. All information shared during the accommodation request process will be stored and used in a manner that is consistent with applicable laws and Manulife/John Hancock policies. To request a reasonable accommodation in the application process, contact [email protected].

Referenced Salary Location

Waterloo, Ontario

Working Arrangement

Hybrid

Salary range is expected to be between $86,100.00 CAD and $136,100.00 CAD.

If you are applying for this role outside of the primary location, please contact [email protected] for the salary range for your location. The actual salary will vary depending on local market conditions, geography and relevant job-related factors such as knowledge, skills, qualifications, experience, and education/training. Employees also have the opportunity to participate in incentive programs and earn incentive compensation tied to business and individual performance.

Manulife offers eligible employees a wide array of customizable benefits, including health, dental, mental health, vision, short- and long-term disability, life and AD&D insurance coverage, adoption/surrogacy and wellness benefits, and employee/family assistance plans. We also offer eligible employees various retirement savings plans (including pension and a global share ownership plan with employer matching contributions) and financial education and counseling resources. Our generous paid time off program in Canada includes holidays, vacation, personal, and sick days, and we offer the full range of statutory leaves of absence. If you are applying for this role in the U.S., please contact [email protected] for more information about U.S.-specific paid time off provisions.

Top Skills

ADX
AWS
Azure
Bash
Docker
GCP
Grafana
Kubernetes
New Relic
Power BI
Python
HQ

Manulife Toronto, Ontario, CAN Office

250 Bloor St E, Toronto, Ontario, Canada, M4W 1E6

Manulife Kitchener, Ontario, CAN Office

25 Water St S, Kitchener, ON, Canada, N2G 4Z4

Manulife Waterloo, Ontario, CAN Office

500 King St N, Waterloo, ON, Canada, N2L
