Lead the Site Reliability Engineering function, ensuring platform reliability, uptime, and performance. Architect scalable infrastructure and mentor engineers.
About the Job & Shakudo
At Shakudo, we are building the world’s first operating system for data and AI. We use the term operating system in the truest sense of the word. Like iOS, Windows and Linux, Shakudo’s end-to-end OS offers ever-evolving, automatically operated, best-of-breed open-source components tailored to each business's unique needs.
The Role
We are hiring a Head of Site Reliability Engineering to lead the reliability, availability, and performance strategy of our platform. This role is ideal for someone who thrives on solving infrastructure challenges, scaling cloud-native systems, and building high-performance teams. You will work cross-functionally with engineering, product, and customer success to make Shakudo’s platform rock-solid and resilient for our customers around the world.
What You’ll Do
- Build and lead the SRE function at Shakudo, setting goals and technical direction and driving team culture
- Own uptime, reliability, and incident response for our platform
- Architect scalable infrastructure using Kubernetes, cloud-native tooling, and automation frameworks
- Lead the design of observability, monitoring, and alerting systems to proactively detect and prevent issues
- Create and enforce best practices for CI/CD, disaster recovery, and service-level objectives (SLOs)
- Partner closely with engineering and product to ensure new features are reliable and production-ready
- Mentor engineers and help instill a culture of operational excellence
What We're Looking For
- 8+ years of experience in infrastructure, DevOps, or SRE roles with increasing responsibility
- Proven experience scaling distributed systems in a high-availability, production environment
- Expertise with Kubernetes, Terraform, containerization, and at least one major cloud provider (AWS preferred)
- Strong knowledge of system design, networking, and reliability principles
- Experience with observability tools (e.g., Prometheus, Grafana, Datadog) and incident response practices
- Strong leadership and communication skills, with a hands-on, collaborative approach
Nice to Have
- Experience supporting data pipelines, ML workloads, or complex orchestration systems
- Familiarity with the data/ML tooling ecosystem (e.g., Airflow, dbt, Spark, Dremio)
- Previous experience in a startup or high-growth environment
Shakudo is an equal opportunity employer. We foster diversity and inclusivity and welcome applications from candidates of all backgrounds and experiences.
Top Skills
AWS
Datadog
Grafana
Kubernetes
Prometheus
Terraform
Shakudo Toronto, Ontario, CAN Office
21 Carscadden Dr, Toronto, Ontario, Canada, M2R 2A6
