Citi

Site Reliability Engineer

Reposted Yesterday
In-Office
Mississauga, ON, CAN
Senior level
The Site Reliability Engineer supports AI and DevOps platforms, resolving incidents, enhancing stability, and collaborating with engineering teams to improve platform support and operational efficiency.
The summary above was generated by AI
We are seeking a motivated team member to support our AI and DevOps Platform Support team in North America. This role is responsible for assisting in the stability, reliability, and performance of our critical AI and DevOps platforms. The team supports a wide range of services, including multiple AI applications, developer tools, and CI/CD pipeline technologies used across the organization. The ideal candidate will work closely with SRE and Support engineers to resolve incidents, address platform issues, and collaborate with engineering and development teams to enhance platform supportability. The role includes coordinating daily operational activities and contributing to short‑term planning.

Responsibilities
• Understand how application support functions within the broader technology organization and contributes to business objectives.
• Assist with vendor coordination and day‑to‑day interactions with offshore managed services.
• Support efforts to improve service levels, including participating in incident management, problem management, and knowledge‑sharing initiatives.
• Partner with development and engineering teams to support application stability and operational readiness.
• Assist in collecting capacity, performance, and latency data to support platform planning efforts.
• Support application onboarding activities using established guidelines and standards.
• Contribute to fostering a collaborative and supportive team environment that encourages skill development.
• Participate in cost‑efficiency initiatives such as Root Cause Analysis reviews, knowledge management, and performance tuning.
• Assist in preparing materials for business review meetings and help align technology activities with business needs.
• Follow established support processes and tool standards and provide input on improvement opportunities.
• Perform other duties and functions as assigned.
• Contribute to platform enhancement initiatives in partnership with engineering and support leads.
• Assist in resilience‑related activities, including incident simulations, disaster recovery exercises, and platform readiness testing.
• Support automation efforts to reduce manual tasks and improve operational efficiency.
• Help maintain observability practices, including monitoring, logging, tracing, and alerting.
• Maintain practical understanding of platform components to support troubleshooting and incident response activities.
• Assist in tracking the operational health of production platforms (including OpenShift, ECS, CI/CD) and support SLA adherence.
• Participate in monitoring and observability efforts to support proactive issue identification and analysis.

Qualifications
• 5–8 years of relevant experience in technical support, platform operations, or engineering.
• Exposure to architecture concepts with the ability to contribute to technical discussions and understand design decisions.
• Experience working with business partners, engineering teams, or technology stakeholders.
• Demonstrated experience supporting IT services, platform operations, or infrastructure components.
• Strong verbal and written communication skills, with the ability to document technical issues clearly.
• Experience supporting operational workstreams or participating in platform improvement initiatives.
• Participation in resilience‑related or stability‑focused activities preferred.
• Ability to collaborate effectively with cross‑functional teams.
• Strong organizational skills and ability to manage daily workload and task priorities.
• Working knowledge of Generative AI concepts preferred.
• Experience with CI/CD or configuration management tools preferred.
• Experience with Red Hat OpenShift or similar Kubernetes technologies preferred.
• Experience working with databases such as Postgres, Oracle, MongoDB, or Redis preferred.
• Experience with scripting or coding in Java, Python, Go, or similar languages preferred.
• Familiarity with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk, ELK) preferred.

Education
• Bachelor’s/University degree required.

------------------------------------------------------

Job Family Group:

Technology

------------------------------------------------------

Job Family:

Applications Support

------------------------------------------------------

Time Type:

Full time

------------------------------------------------------

Primary Location Full Time Salary Range:

$94,300.00 - $141,500.00

------------------------------------------------------

Most Relevant Skills

Please see the requirements listed above.

------------------------------------------------------

Other Relevant Skills

For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

Automated Processing and AI

We use automated processing, including artificial intelligence, for our legitimate business interests (or our reasonable and appropriate business purposes) to identify and align the candidate's skills and abilities with a specific job opening. Additionally, if you so choose, or consent, we can match your skills and abilities to other suitable roles at Citi.

Importantly, all our hiring processes and decisions, including determining your suitability for a role, are conducted, checked, and decided by individuals. Our automated processing and AI do not involve relying on automatic or autonomous decision-making. Please refer to any Jurisdictional Considerations, with specific provisions for your country (where relevant) for further details.

------------------------------------------------------

This job opening is for an existing job vacancy.

------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.

Top Skills

AI
CI/CD
DevOps
ECS
ELK
Go
Grafana
Java
MongoDB
OpenShift
Oracle
Postgres
Prometheus
Python
Redis
Splunk
