Bobsled

Site Reliability Engineer

Posted 4 Days Ago
Remote
Hiring Remotely in Canada
Senior level
The Site Reliability Engineer will drive reliability, scalability, and operational excellence for Bobsled's data-sharing platform, maintaining multi-cloud infrastructure, building CI/CD pipelines, and improving monitoring and incident response.
The summary above was generated by AI

About Bobsled

Bobsled is building AI-powered analytics experiences that turn natural language into accurate, production-grade insights. Our mission is to enable enterprise customers to leverage the full power of AI and data agents, transforming how they access and act on their data. As we scale our AI product, we’re seeking hands-on specialists to ensure our customers’ deployments are robust, contextually tuned, and delivering measurable value.

The Role

We are looking for an experienced Site Reliability Engineer to drive the reliability, scalability, and operational excellence of Bobsled's data-sharing platform. You'll apply your expertise to complex technical and business challenges, ensuring that our infrastructure and pipelines are highly available and performant.

Please note: This role is open exclusively to candidates located in the Central Time (CT) or Eastern Time (ET) zones in the USA or Canada, as you will be working closely with European engineers.

You will play a key role in maintaining and improving Bobsled's multi-cloud environment (GCP, AWS, Azure, Cloudflare, Snowflake, Databricks). Your work will have a direct and massive impact on the way organizations share and collaborate on data across the world.

As an early hire, you will also play a pivotal role in shaping our team culture, fostering a collaborative environment, and assessing engineering candidates.

Key Responsibilities
  • Infrastructure Reliability: Design, build, and maintain highly available, scalable infrastructure using modern IaC practices such as Terraform/Pulumi.
  • Multi-Cloud Operations: Manage and optimize Bobsled's infrastructure across GCP, AWS, Azure, and other cloud providers.
  • CI/CD Pipelines: Build and maintain robust pipelines that ensure safe, reliable, and automated deployment of infrastructure and applications.
  • Monitoring & Observability: Develop comprehensive monitoring, logging, and alerting systems to ensure visibility into infrastructure and application health.
  • Incident Response: Establish and continuously improve incident response processes, ensuring rapid detection and resolution of production issues.
  • Performance Optimization: Identify and resolve performance bottlenecks, and drive capacity planning and cost optimization across our cloud environments.
  • On-Call & Reliability: Participate in on-call rotations and drive improvements to reduce toil and improve system reliability.

Preferred Qualifications
  • 8+ years of experience in SRE, DevOps, or Platform Engineering, managing distributed cloud-native systems in production.
  • Proficiency in Infrastructure as Code (IaC) tools like Terraform/Pulumi.
  • Experience with TypeScript or other modern programming languages (our stack is heavily TypeScript-based).
  • Strong background in cloud platforms (GCP, AWS, Azure) - hands-on experience with at least one is required.
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, etc.).
  • Understanding of CI/CD best practices and experience with pipeline tools like GitHub Actions.
  • Strong troubleshooting skills and experience with incident management.

Nice To Have
  • Experience with cloud security solutions, IAM, secrets management (HashiCorp Vault, GCP Secret Manager), identity-based authentication, and Zero Trust.
  • Knowledge of security compliance frameworks (SOC 2, ISO 27001).
  • Experience with Kubernetes, serverless architectures, or container security.
  • Exposure to data and data platforms such as Snowflake and Databricks, and to Spark engines like AWS EMR and GCP Dataproc.

Compensation & Benefits
  • Competitive Salary and Equity
  • Health Insurance: Medical (100% paid), dental, and vision benefits for you and your family
  • Generous PTO policy and paid parental leave
  • Fully upgraded Apple MacBook and 4K monitor (for engineering team only)
  • Home office stipend of $1,000
  • Flexible work hours in a fully remote work environment
  • Fully sponsored individual coaching for all employees to help foster a culture of personal reflection and growth (optional but encouraged)

Why Join Us?

We know that no candidate is perfectly qualified for any job. Experience comes in different forms, and many skills are transferable. More important than your resume is a clear demonstration of skill, dedication, and the ability to thrive in a collaborative environment.

If you're passionate about operational excellence and building reliable, scalable systems, we'd love to hear from you!

Top Skills

AWS
Azure
CI/CD
Docker
GCP
HashiCorp Vault
Kubernetes
OCI
Terraform
TypeScript
