Bank of Montreal

Site Reliability Engineer

Posted 3 Days Ago
Toronto, ON
Senior level
The Site Reliability Engineer monitors and supports critical systems, ensuring service restoration and incident management, while collaborating with development teams to enhance monitoring, performance, and reliability of software solutions. They are responsible for operational support, deployment, and implementing improvements across the infrastructure.
The summary above was generated by AI

Application Deadline:

03/27/2025

Address:

4100 Gordon Baker Road

Job Family Group:

Technology

Monitors and restores service, implements changes, and handles the day-to-day support activities required 24/7/365 to run the company's mission-critical systems, ensuring that business service levels are met and environments are managed. Monitors and ensures service restoration of infrastructure, applications (online and offline), and security, while meeting service level agreements. Provides Help/Service Desk support, coordinates and facilitates Incident Management, deploys changes to the production environment, and engages third-party providers contracted to the Bank during an incident. Provides immediate response to production program or system problems. Participates in testing cycles to ensure that infrastructure and applications can be deployed and operated. Deploys, implements, and provisions applications and infrastructure per deployment plans and infrastructure build guides.

  • Works with development teams to build solutions that use enterprise monitoring/logging, are “self-healing”, and require minimal to zero maintenance.
  • Build software and systems to manage platform infrastructure and applications
  • Improve reliability, quality, and time-to-market of our suite of software solutions
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
  • Provide primary operational support and engineering for multiple large-scale distributed software applications
  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Develop and provide operational support for full-stack software applications.
  • Possesses strong technical and/or business functional knowledge of systems, timing and dependencies.
  • Conducts independent analysis and assessment to resolve strategic issues.
  • Monitors and tracks performance, and addresses any issues.
  • Breaks down strategic problems, and analyses data and information to provide insights and recommendations.
  • Designs/engineers system-operating capacity (such as bandwidth, disk space, storage, and CPU utilization) to ensure high availability and performance of end-user applications and systems.
  • Builds effective relationships with internal/external stakeholders.
  • Leads/Conducts incident recovery and detailed root-cause analysis.
  • Manually deploys code to the production environment.
  • Facilitates or completes analysis, design and configuration of viable solutions to highly complex technology problems that would improve data center and support activities.
  • Provides end-to-end technology support, including computer, applications, network, and storage, as well as root-cause analysis.
  • Drives and/or promotes new processes, systems, technology, and operations and expanded capabilities for performance, with the flexibility to align to the unique requirements of the project teams and deliverables.
  • Proactively monitors system performance and identifies operational improvements to ensure smooth and consistent delivery to customers and business partners.
  • Supports deployment activities, managing implementation issues to resolution.
  • Provides initial triage and investigation, and ensures fast turnaround times on issue/incident resolution.
  • Monitors technical infrastructure, applications and/or business transactions through automated systems and instrumentation across the environment.
  • Provides inbound call assistance to end-users for application, technical, and IMACD needs leveraging the knowledge base and/or run books available.
  • Collaborates and engages with the appropriate areas across the bank.
  • Develops or helps to develop the knowledge assets required for the operation.
  • Promotes adherence to standards and industry best practices.
  • Develops an understanding of organizational interactions and complexity to engage with the appropriate matrix areas.
  • Identifies opportunities to strengthen the operational capability, such as: sharing expertise to promote technical development, mentoring employees, building communities of practice and networks across technology.
  • Stays abreast of industry technical and business trends through participation in professional associations, practice communities & individual learning.
  • Focus is primarily on business/group within BMO; may have broader, enterprise-wide focus.
  • Exercises judgment to identify, diagnose, and solve problems within given rules.
  • Works independently on a range of complex tasks, which may include unique situations.
  • Broader work or accountabilities may be assigned as needed.

Qualifications:

  • Master's degree (or equivalent) in computer science or a related discipline, with 7-10 years of experience
  • Fluent in OS scripting, such as UNIX shell and Bash
  • Experience with Cloud deployments and Operations (AWS & Azure)
  • Experience with observability tools (Dynatrace, CloudWatch)
  • Fluent with CI/CD and automation tools such as Ansible and JIRA
  • Fluent with Source Code Management - GitHub
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN)
  • Experience with REST APIs
  • Understanding of Information Technology operating processes used for systems to ensure effective delivery including but not limited to IT Operations mandatory operating standards for monitoring, logging, and alerting.
  • Knowledge of support and operations practice, concepts, and technology obtained through formal training and/or work experience.
  • Technical and/or business functional knowledge of systems, tools, timing, and dependencies.
  • Technical proficiency gained through education and/or business experience.
  • Verbal & written communication skills - In-depth.
  • Collaboration & team skills - In-depth.
  • Analytical and problem solving skills - In-depth.
  • Influence skills - In-depth.
  • Data driven decision making - In-depth.
Preferred skills and qualifications:

  • Previous success in site reliability engineering
  • Coding experience beyond simple scripts

Salary:

$60,000.00 - $111,700.00

Pay Type:

Salaried

The above represents BMO Financial Group’s pay range and type.

Salaries will vary based on factors such as location, skills, experience, education, and qualifications for the role, and may include a commission structure. Salaries for part-time roles will be pro-rated based on number of hours regularly worked. For commission roles, the salary listed above represents BMO Financial Group’s expected target for the first year in this position.

BMO Financial Group’s total compensation package will vary based on the pay type of the position and may include performance-based incentives, discretionary bonuses, as well as other perks and rewards. BMO also offers health insurance, tuition reimbursement, accident and life insurance, and retirement savings plans. To view more details of our benefits, please visit: https://jobs.bmo.com/global/en/Total-Rewards

About Us

At BMO we are driven by a shared Purpose: Boldly Grow the Good in business and life. It calls on us to create lasting, positive change for our customers, our communities and our people. By working together, innovating and pushing boundaries, we transform lives and businesses, and power economic growth around the world.

As a member of the BMO team you are valued, respected and heard, and you have more ways to grow and make an impact. We strive to help you make an impact from day one – for yourself and our customers. We’ll support you with the tools and resources you need to reach new milestones, as you help our customers reach theirs. From in-depth training and coaching, to manager support and network-building opportunities, we’ll help you gain valuable experience, and broaden your skillset.

To find out more visit us at https://jobs.bmo.com/ca/en.

BMO is committed to an inclusive, equitable and accessible workplace. By learning from each other’s differences, we gain strength through our people and our perspectives. Accommodations are available on request for candidates taking part in all aspects of the selection process. To request accommodation, please contact your recruiter.

Note to Recruiters: BMO does not accept unsolicited resumes from any source other than directly from a candidate. Any unsolicited resumes sent to BMO, directly or indirectly, will be considered BMO property. BMO will not pay a fee for any placement resulting from the receipt of an unsolicited resume. A recruiting agency must first have a valid, written and fully executed agency agreement contract for service to submit resumes.

Top Skills

Amazon S3
Ansible
Apache Mesos
AWS
Azure
Bash
Ceph
CloudWatch
Dynatrace
Git
HDFS
JIRA
Kubernetes
NFS
REST APIs
Unix
YARN
HQ

Bank of Montreal Toronto, Ontario, CAN Office

First Canadian Place, 100 King Street, Toronto, Ontario, Canada, M5X 1A1
