Focal Systems

Senior Site Reliability Engineer

Posted 18 Days Ago
Toronto, ON
Senior level

Location: Toronto, Canada - Remote
Salary: $170-180k CAD + stock 


Company Description

Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley-based startup that has more than doubled in size every year since inception. We are a deep-learning-first company. Our mission is to automate and optimize brick-and-mortar retail using deep learning computer vision. Focal Systems has been deployed at scale with the top retailers in the world. We are looking for smart, creative, and passionate people who want to help build a great and enduring company and deploy deep learning to the world!


Mission of the role:
To enable us to scale from 200k to 1 million cameras


Job Summary

As a Sr. DevOps/Site Reliability Engineer (SRE) at our company, you will play a pivotal role in ensuring the smooth operation and continuous improvement of our infrastructure, deployment processes, and overall system reliability.



Responsibilities

  • Set up and manage blue/green and canary deployments to ensure smooth launches without downtime.
  • Operate multiple large GCP Kubernetes clusters and fine-tune them to balance reliability against cost.
  • Manage the company's distributed services, ensuring graceful updates, comprehensive test coverage, log tracking, and 99.9% uptime.
  • Work with the Backend, Frontend, and Deep Learning teams and write infrastructure automation code for their needs.
  • Identify scalability bottlenecks through load testing and plan infrastructure architecture.
  • Create tools that provide transparency and ease of access into the company's rich datasets, which are stored across varying geographic locations and data formats.
  • Design, build, and manage a robust Continuous Integration and Continuous Deployment (CI/CD) pipeline.
  • Lead uptime improvement processes, including postmortem reviews and on-call setup.

Requirements 

  • Solid experience in an infrastructure or Site Reliability Engineer (SRE) role
  • Hands-on experience with containerization (Docker) and orchestration platforms (Kubernetes) required
  • Experience in cloud cost management
  • Great understanding of SQL, networking, distributed systems, operating systems (Debian), and software engineering practices
  • Experience with messaging systems
  • Experience with Terraform or another Infrastructure as Code automation solution
  • Experience operating relational SQL databases and Redis at terabyte scale
  • Proven experience with setting up monitoring/alerting and reliability engineering
  • Scripting skills in Python
  • Must be comfortable with 12-hour on-call rotations
  • Must have experience working at a SaaS company 

Nice to have experience:

  • GitOps 
  • Setting up automation for complex load testing scenarios
  • Tuning deep learning pipelines with Python, PyTorch, and multiprocessing
  • Backend programming with Python

Why Focal Systems

Strong Values and Mission - We are a tightly-knit team with an ambitious mission and a strong set of core values, which define our approach to business and have successfully guided us since inception.
Exceptional Team - We are a team of hard-working, fun-loving professionals from some of the most eminent universities, research labs, and tech companies of our time. We pride ourselves on recruiting exceptional individuals to help us redefine the state-of-the-art.
Outstanding Partners - We work with 10+ of the largest retailers in the world and have a world-class roster of investors, advisors and partners to support & advise us in our endeavors.


Benefits

We care deeply about the health, happiness, and wellbeing of all of our employees. We offer:

  • Competitive Salary & Attractive Stock
  • Paid Time Off 
  • Quarterly Team Retreats
  • Education grants

