
Citi

Site Reliability Engineer

Posted 13 Days Ago
In-Office
Mississauga, ON, CAN
Senior level
The Site Reliability Engineer will enhance the stability and performance of AI and DevOps platforms, support operational activities, assist in incident management, and collaborate with various teams to improve platform supportability.
The summary above was generated by AI
Description
Engineer the future of global finance.
At Citi, our Tech team doesn’t just support finance – we are helping to redefine it. Every day, $5 trillion moves through our network. We do business in 180+ countries, operating at a scale few can match. From deploying advanced AI to helping shape global markets, we build systems that matter. Join a team where your work helps influence economies, your ideas can drive innovation and outcomes, and your growth is backed by mentorship, continuous learning, and flexibility with potential hybrid work opportunities. Help solve real‑world challenges that touch millions, and get the opportunity to build the future of finance with Citi Tech.
We are seeking a motivated team member to support our AI and DevOps Platform Support team in North America. This role is responsible for helping maintain the stability, reliability, and performance of our critical AI and DevOps platforms. The team supports a wide range of services, including multiple AI applications, developer tools, and CI/CD pipeline technologies used across the organization. The ideal candidate will work closely with SRE and Support engineers to resolve incidents and platform issues, and will collaborate with engineering and development teams to improve platform supportability. The role also includes coordinating daily operational activities and contributing to short‑term planning.
Responsibilities
• Understand how application support functions within the broader technology organization and contributes to business objectives.
• Assist with vendor coordination and day‑to‑day interactions with offshore managed services.
• Support efforts to improve service levels, including participating in incident management, problem management, and knowledge‑sharing initiatives.
• Partner with development and engineering teams to support application stability and operational readiness.
• Assist in collecting capacity, performance, and latency data to support platform planning efforts.
• Support application onboarding activities using established guidelines and standards.
• Contribute to fostering a collaborative and supportive team environment that encourages skill development.
• Participate in cost‑efficiency initiatives such as Root Cause Analysis reviews, knowledge management, and performance tuning.
• Assist in preparing materials for business review meetings and help align technology activities with business needs.
• Follow established support processes and tool standards and provide input on improvement opportunities.
• Perform other duties and functions as assigned.
• Contribute to platform enhancement initiatives in partnership with engineering and support leads.
• Assist in resilience‑related activities, including incident simulations, disaster recovery exercises, and platform readiness testing.
• Support automation efforts to reduce manual tasks and improve operational efficiency.
• Help maintain observability practices, including monitoring, logging, tracing, and alerting.
• Maintain practical understanding of platform components to support troubleshooting and incident response activities.
• Assist in tracking the operational health of production platforms (including OpenShift, ECS, CI/CD) and support SLA adherence.
• Participate in monitoring and observability efforts to support proactive issue identification and analysis.
Qualifications
• 5–8 years of relevant experience in technical support, platform operations, or engineering.
• Exposure to architecture concepts with the ability to contribute to technical discussions and understand design decisions.
• Experience working with business partners, engineering teams, or technology stakeholders.
• Demonstrated experience supporting IT services, platform operations, or infrastructure components.
• Strong verbal and written communication skills, with the ability to document technical issues clearly.
• Experience supporting operational workstreams or participating in platform improvement initiatives.
• Participation in resilience‑related or stability‑focused activities preferred.
• Ability to collaborate effectively with cross‑functional teams.
• Strong organizational skills and ability to manage daily workload and task priorities.
• Working knowledge of Generative AI concepts preferred.
• Experience with CI/CD or configuration management tools preferred.
• Experience with Red Hat OpenShift or similar Kubernetes technologies preferred.
• Experience working with databases such as Postgres, Oracle, MongoDB, or Redis preferred.
• Experience with scripting or coding in Java, Python, Go, or similar languages preferred.
• Familiarity with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk, ELK) preferred.
Education
• Bachelor’s/University degree required.

------------------------------------------------------

Job Family Group: Technology

------------------------------------------------------

Job Family: Applications Support

------------------------------------------------------

Time Type: Full time

------------------------------------------------------

Primary Location Full Time Salary Range: $94,300.00 - $141,500.00

------------------------------------------------------

Most Relevant Skills: Please see the requirements listed above.

------------------------------------------------------

Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

Automated Processing and AI

We use automated processing, including artificial intelligence, for our legitimate business interests (or our reasonable and appropriate business purposes) to identify and align the candidate's skills and abilities with a specific job opening. Additionally, if you so choose, or consent, we can match your skills and abilities to other suitable roles at Citi.

Importantly, all our hiring processes and decisions, including determining your suitability for a role, are conducted, checked, and decided by individuals. Our automated processing and AI do not rely on automatic or autonomous decision-making. Please refer to any Jurisdictional Considerations with provisions specific to your country (where relevant) for further details.

------------------------------------------------------

This job opening is for an existing job vacancy.

------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.

Similar Jobs

19 Days Ago
Remote or Hybrid
Canada
Mid level
Cloud • Insurance • Payments • Software • Business Intelligence • App development • Big Data Analytics
The Site Reliability Engineer will ensure system reliability and scalability, manage infrastructure, automate tasks, and collaborate cross-functionally while mentoring junior engineers and supporting production environments.
Top Skills: Ansible, ArgoCD, Bash, Datadog, GitHub Actions, GitLab, Go, HashiCorp Consul, Helm, Kubernetes, Packer, Postgres, PowerShell, Python, SQL Server, Terraform, TypeScript
2 Days Ago
Hybrid
Toronto, ON, CAN
Senior level
Artificial Intelligence • Cloud • Information Technology • Legal Tech • Productivity • Software
The Senior Site Reliability Engineer at iManage will build and maintain resilient platforms, drive reliability best practices, and support cloud infrastructure with automation and scalable solutions.
Top Skills: AKS, Azure, Bash, Chef, Docker, EFK, ELK, Go, Grafana, Java, Kubernetes, Linux, PowerShell, Prometheus, Python, Ruby, Terraform
3 Days Ago
In-Office
Toronto, ON, CAN
Mid level
Payments
As a Site Reliability Engineer, you will enhance system reliability, drive incident responses, design tools for automation, and participate in on-call rotations.
Top Skills: AWS, Azure, Bash, GCP, .NET, OpenTelemetry, PowerShell, Prometheus, Python, TypeScript

What you need to know about the Toronto Tech Scene

Home to some of the biggest names in tech, including Google, Microsoft and Amazon, Toronto has also established itself as one of the largest startup ecosystems in the world. With over 2,000 startups (more than 30 percent of the country's total), Toronto continues to attract new businesses. Whether helping entrepreneurs manage their finances, simplifying business operations by automating payroll or assisting pharmaceutical companies in launching new drugs, the city's tech scene is just getting started.
