Design, deliver, and operate software for observability; solve scaling issues in Metrics & Alerting; participate in on-call rotation and mentorship.
Available Locations: London or Lisbon
About the Department
Production Engineering is responsible for the world's most reliable, observable, performant, and safe network ecosystem. Our customers rely on our products and systems to safely modify, troubleshoot, and release products without external impact.
Our external customers rely on us to provide seamless and predictable incident, traffic, and policy management, resulting in the fastest and safest network services in the world.
We are accountable for the overall performance of internal- and external-facing services, guiding our product teams to optimal configurations and maximum efficiency. From the moment a packet enters the Cloudflare ecosystem, we know exactly what its expected purpose and behaviour are, and we are capable of determining and exposing anomalous behaviour.
The Cloudflare network makes it possible to solve challenges at massive scale and efficiency which would be impossible for almost any other organization.
About the Team
This role is for the internal Observability Team, responsible for the observability platform and stack that make our engineering teams productive. This includes (but is not limited to) areas like metrics, alerting, error tracking, logging, and tracing.
In this role, you can expect to:
- Design, deliver, and operate software and a platform that progresses Cloudflare's Observability competency
- Solve scaling bottlenecks in critical services in our Metrics & Alerting pipeline
- Work on highly distributed and scalable systems
- Participate in the constant cycle of knowledge sharing and mentoring
- Participate in the global on-call rotation for the services your team owns
- Research and introduce cutting-edge technologies
- Contribute to open-source
We are a small, well-funded, growing team focused on building an extraordinary company. This is a software engineering/systems engineering role and a superb opportunity to join a high-performing team, support Cloudflare's mission, and help build a better internet.
You may be a good fit for our team if you have:
- A Software Engineering background and proficiency in high-level programming languages (e.g., Go)
- Proficiency in data structures and databases such as TSDBs, columnar stores, or related technologies
- Proficiency in distributed Linux environments
- Proficiency in designing high-scale distributed systems
- Proficiency in Prometheus, Alertmanager, Thanos
- Experience working in a fast, high-growth environment
- Experience working in a 24/7/365 service environment
- Exquisite written and verbal communication skills
- Familiarity with internetworking, networking protocols across Layers 2-7 of the OSI model, and BGP
- Strong bias for action
Bonus points if you have:
- Experience with high-bandwidth transit Internetworking and routing
- Passion for code simplicity and performance
Top Skills
Alertmanager
Go
Linux
Prometheus
Thanos

