About the Role:
We are looking for a Site Reliability Engineer to ensure the reliability, scalability, and performance of our cloud-based infrastructure. You will work closely with development and operations teams to build resilient systems, automate operations, and improve observability. To deliver business value quickly, you will also adopt modern practices such as data lakes, data analytics, distributed data processing, and continuous integration and continuous deployment (CI/CD) to accelerate our data development process. You will apply your engineering skills and operational insight to establish and advocate high standards of operational excellence, and you will collaborate with diverse teams on initiatives that take the operation of our data products and services to the next level.
This position is based in our Toronto office. We follow a hybrid policy of at least three days onsite each week, with up to two days of remote work.
Key Responsibilities:
- Implement observability platforms using industry best practices
- Establish and measure Service Level Objectives (SLOs) and Error Budgets
- Create automated remediation processes to reduce manual intervention
- Manage incidents, conduct postmortems, and implement preventative solutions
- Deploy Infrastructure as Code (IaC) using Terraform, CDK, and CloudFormation
- Manage containerized applications with Docker and Kubernetes (AWS ECS/EKS preferred)
- Apply security best practices to infrastructure monitoring
- Build fault-tolerant infrastructure and support disaster recovery
- Maintain monitoring tools like Splunk and New Relic
- Collaborate across engineering teams to improve system resilience
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field
- 3+ years of experience in Site Reliability Engineering, DevOps, or Observability Engineering
- Proficiency with cloud-native infrastructure, containers, and Linux
- Experience with monitoring tools (Splunk, New Relic, Datadog)
- Strong automation skills using Python, Bash, and IaC
- AWS expertise (EC2, S3, ECS/EKS, RDS, Lambda, VPC)
- Advanced troubleshooting capabilities for infrastructure and networking issues
- Experience defining reliability metrics (SLOs, SLIs, Error Budgets)
- Knowledge of log aggregation and anomaly detection
- Familiarity with CI/CD, Git, and DevOps practices
- Excellent communication and teamwork skills
Preferred Qualifications:
- AWS Certified Solutions Architect (Associate) or AWS Certified DevOps Engineer certification
- Experience managing large-scale observability platforms
- Advanced container and Kubernetes orchestration skills
- Experience with event-driven architectures (AWS Lambda, Kafka)
- Background in automated anomaly detection and self-healing systems
- Windows administration experience
- SQL querying skills
Morningstar's hybrid work environment gives you the opportunity to work remotely and collaborate in-person each week. We've found that we're at our best when we're purposely together on a regular basis, at least three days each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you'll have tools and resources to engage meaningfully with your global colleagues.
Legal Entity: Morningstar Inc. (001_MstarInc)
Office Locations:
- Morningstar Toronto, Ontario, CAN Office: 181 University Avenue, Toronto, ON, Canada, M5H 3M7
- Morningstar Toronto, Ontario, CAN Office: 1 Toronto Street, Toronto, Ontario, Canada, M5C 2W4