Stripe

Staff Software Engineer, Stream Infrastructure

Posted 4 Hours Ago
In-Office
Toronto, ON
Expert/Leader
Lead design, build, and operation of Kafka-centered streaming infrastructure at global scale. Drive automation, resilience, multi-region strategies, platformization, and operator tooling to improve availability, durability, and cost-efficiency.
The summary above was generated by AI
Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies - from the world’s largest enterprises to the most ambitious startups - use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

The Stream Infrastructure team builds and operates Stripe’s real-time, event-driven platform that powers asynchronous communication between services and high-throughput streaming workloads across the company. We run globally distributed systems with high reliability and performance to meet Stripe’s scaling, availability, and product needs. The team operates dozens of Apache Kafka clusters with industry-leading reliability and efficiency, and we continually reduce operational toil by investing in automation and self-service tooling for upgrades, maintenance, and day-to-day operations. The team is distributed between Seattle, Toronto and remote locations.

What you’ll do

You’ll help define and deliver the next generation of Stripe’s Kafka-first streaming infrastructure - driving industry-level innovation to meet extremely high availability targets at global scale. Partnering with infrastructure engineers, adjacent platform teams, and the product orgs that depend on Kafka every day, you’ll set a long-term technical direction that scales with Stripe’s growth while enabling reliable, efficient operations for years to come. You’ll work on the hardest problems in operating Kafka in production - availability, resilience, performance isolation, and automated recovery - so teams across Stripe can confidently build event-driven systems on top of it.

Responsibilities
  • Design, build, and operate event-driven infrastructure with Apache Kafka at the center, alongside technologies like Temporal and AWS services
  • Partner with product and platform teams across Stripe to understand requirements, unblock Kafka adoption, and improve how streaming infrastructure is used end-to-end
  • Define and implement operational best practices (e.g., shuffle sharding, cellular architecture, load shedding, automated failover) to improve resilience and reliability at scale
  • Drive fleet-level automation and standardization (“pets” to “cattle”) through self-service workflows, safer rollouts, and self-healing systems that reduce manual operations
  • Lead initiatives that raise the bar on Kafka availability and durability (e.g., multi-region strategies, disaster recovery readiness, operational readiness reviews, incident learning)
  • Evaluate and productionize Kafka ecosystem capabilities (e.g., tiered storage, direct-to-S3) to improve cost-efficiency and scalability without compromising reliability
  • Here are some examples of the team's recent work: 6 Nines and Tiered Storage in Production?

Who you are

We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements
  • This is a Staff-level role - that typically means 10+ years of experience building, operating, and evolving large-scale production systems
  • Experience as a technical lead for team(s) working on distributed systems, including scaling them in fast-moving environments
  • Hands-on experience with big data technologies such as Kafka, Pulsar, Flink, or Pinot
  • Comfortable operating with high autonomy and ownership
  • Growth mindset and a willingness to learn quickly, explore ambiguous problem spaces, and dive deep when needed
  • Strong written and verbal communication skills, including the ability to produce clear technical documentation
Preferred qualifications
  • Experience operating streaming technologies as a platform (e.g., Kafka, Pulsar, Flink, Pinot) for internal customers at scale
  • Experience building or operating control planes for managing large-scale infrastructure

Top Skills

Apache Kafka, Temporal, AWS, S3, Apache Pulsar, Apache Flink, Apache Pinot, Tiered Storage, Direct-to-S3
