Collective[i]

Senior Site Reliability Engineer (Remote, Canada)

Sorry, this job was removed at 02:29 p.m. (EST) on Tuesday, Oct 29, 2024

Remote

Hiring Remotely in Canada

Internship

At Collective[i], we value diversity of experience, knowledge, backgrounds and people who share a commitment to building a company and community on a mission to help people be more prosperous. We recruit extraordinary individuals and provide them the platform to contribute their exceptional talents and the freedom to work from wherever they choose. Our company is a wonderful place to learn and grow alongside an incredible and tenacious team.

Collective[i] was founded by three entrepreneurs with over $1B of prior exits. Their belief in the power of Artificial Intelligence to transform life as we know it and improve economic outcomes at massive scale drove the decision to invest over $100m in the company. The result is a state-of-the-art platform for prosperity that helps companies generate sales and helps people expand their professional connections. In the last decade, Collective[i] has grown into a powerful community of scientists, engineers, creative talent and more, working together to help people succeed in business.

We are seeking a skilled and motivated professional to join our team as a Senior Site Reliability Engineer. If you have hands-on experience with AWS in roles such as Site Reliability Engineer (SRE), DevOps Engineer, Cloud Administrator, Platform Engineer, Systems Analyst, or Systems Engineer—or any related role where you’ve managed cloud infrastructure—this opportunity could be a great fit for you.

Responsibilities

Manage AWS infrastructure across multiple accounts using Terraform, drawing on extensive experience in deployment and automation.
Utilize Linux and open-source tooling as the foundation of your work, being proficient across various Linux distributions, scripting languages, clustering technologies, database engines, and configuration management tools, with a preference for Ansible.
Develop and implement containerization strategies, ensuring well-crafted container builds. Must be capable of creating original containers and not just relying on third-party containers from public repositories.
Assess and apply Kubernetes knowledge selectively, understanding when and why it is appropriate to use—note, we are not a Kubernetes-focused environment.
Collaborate closely with development teams, providing support in building and optimizing distributed systems.
Maintain expertise in Git workflows, including proficiency in CI/CD automation tools such as GitHub Actions.
Implement and manage monitoring and logging solutions, with hands-on experience in tools like DataDog and OpenTelemetry.
Proactively manage system stability and reliability to minimize the need for log diving, incident response, root cause analysis, and late-night pages.

Requirements

Proficiency with AWS, Terraform, Packer, Ansible, and container technologies.
Expertise in AWS services.
Experience with other cloud providers is a plus.
Strong knowledge of Ubuntu 24.04, Bash, Python, systemd, podman, docker, and auditd.
Familiarity with GitHub, GitHub Actions, GitHub Container Registry, and Copilot.
Experience with monitoring and logging tools like DataDog, OpenTelemetry, and Graylog.
Proficiency in working with databases and platforms such as Snowflake, Okta, Postgres, MongoDB, and ElasticSearch.
Familiarity with security tools like Snyk, Tenable.io, and 1Password.
Experience with SOC 2 or other compliance standards is highly desirable.

Who you are working for - About Collective[i]:

Collective[i] is on a mission to help people and companies prosper. Backed by over 20 patents and developed by a team of world-renowned entrepreneurs, engineers, scientists, and business leaders, Collective[i] is an Economic Foundation Model (“EFM”) that studies how the world does business. Collective[i]’s advisors include a world-renowned economist, the former Vice Chair of the Federal Reserve, the founders of Comcast, Instagram, and MySQL, and former executives from Tesla, NewsCorp, USANetworks, and others.

Harnessing insights from more than a decade of data collection, our EFM has been trained on trillions of dollars of data to unearth successful buying and selling patterns. With Collective[i], any person or company can plug in their own data and receive customized insights that help them maximize economic opportunity and adapt to changing market conditions.

Founded and managed by the early teams behind LinkShare (purchased for $425m) and Overstock (NASDAQ:OSTK), Collective[i] is a private, 100% remote company.

Our core values help shape our culture: We are curious. We are direct. We deliver. We succeed together. We strive for the extraordinary. If you enjoy a challenge, thrive in an innovative environment and welcome the opportunity to work with amazing humans operating on the bleeding edge of technology, Collective[i] is the place for you.

Recent press:

Forbes: Stephen Messer: Amazon Missed The AI Boom

CNBC: Harvard professor on A.I. job risks: We need to upskill and update business models

ZDNet: Why open source is essential to allaying AI fears

Information about the founders:

Tad Martin

Stephen Messer

Heidi Messer

Similar Jobs

Cisco Meraki

Senior Site Reliability Engineer, Fleet - REMOTE within Canada

3 Days Ago

Canada

Remote

3,000 Employees

Senior level

Hardware • Information Technology • Security • Software • Cybersecurity • Conversational AI

The Senior Site Reliability Engineer will ensure the stability and scalability of Cisco Meraki's infrastructure. Responsibilities include automating maintenance processes, debugging failure scenarios, optimizing CI/CD workflows, collaborating with engineering teams, and developing automated tools for data collection and compliance.

Sporty Group

Weekend Site Reliability Engineer

14 Hours Ago

Remote

326 Employees

Senior level

Sports

As a Weekend Site Reliability Engineer, you will enhance site reliability and security by improving current infrastructure, managing cloud operations, and mentoring junior team members. Key responsibilities include monitoring cloud infrastructure, leading project planning and deployments, and liaising with external security agencies.

Cribl

Sr Staff Site Reliability Engineer (SRE), Cloud

12 Hours Ago

Canada

Remote

600 Employees

Senior level

Software

The Senior Staff Site Reliability Engineer at Cribl will enhance service delivery and reliability of production systems, design observability systems, mentor engineers, and drive operational excellence across all aspects of the organization with a focus on cloud technologies and automation.

What you need to know about the Toronto Tech Scene

Although home to some of the biggest names in tech, including Google, Microsoft and Amazon, Toronto has established itself as one of the largest startup ecosystems in the world. And with over 2,000 startups — more than 30 percent of the country's total startups — Toronto continues to attract new businesses. Be it helping entrepreneurs manage their finances, simplifying business operations by automating payroll or assisting pharmaceutical companies in launching new drugs, the city's tech scene is just getting started.