Here’s How You Make an Impact:
Build and Scale Observability-as-Code
Design and maintain Python tooling and Terraform modules that standardize Datadog configuration across services.
Eliminate manual setup by codifying monitors, dashboards, SLOs, and alerting patterns.
Improve consistency, repeatability, and reliability of observability across the organization.

Establish Reliable & Standardised Instrumentation
Define and implement observability blueprints that integrate high‑fidelity metrics, logs, and traces into the development lifecycle.
Codify best practices so teams get out-of-the-box visibility without needing deep observability expertise.
Raise the baseline for service health, debuggability, and operational readiness.

Optimize Datadog Usage and Cost
Own critical parts of the Datadog platform configuration.
Improve data quality, signal-to-noise ratio, and alert reliability.
Partner with teams to adopt telemetry effectively while managing ingestion and alerting costs.

Maintain and Evolve Platform Components
Upgrade and maintain tracers, agents, and shared observability libraries.
Ensure upgrades are automated, backwards-compatible, and minimally disruptive to product teams.
Reduce operational risk by improving rollout and validation processes.

Integrate Observability Across Infrastructure
Collaborate with Platform and Infrastructure teams to embed monitoring into systems such as Kafka, gRPC services, Kubernetes, and AWS-managed services.
Improve production visibility and reduce mean time to detect (MTTD) and resolve (MTTR) incidents.

Deliver High-Quality, Production-Ready Code
Write clean, well-tested, and maintainable Python code and Terraform modules.
Participate in architecture and design reviews; provide thoughtful feedback in code reviews.
Take ownership of projects end-to-end, from design and implementation through production rollout and support.

Mentorship & Collaboration
Assist team members to solve problems and develop their own skills.
Foster a collaborative mindset within the team.

You Thrive Here By Possessing the Following:
Degree in Computer Science or a related field.
7+ years of experience in application development, platform engineering, or developer tooling.
High proficiency in Python; solid experience with Terraform.
Hands-on experience using Datadog for metrics, logging, tracing, dashboards, monitors, and alerts.
Experience with containerized and cloud-native environments (e.g., Kubernetes, Kafka, AWS, gRPC, Lambda).
Proven ability to independently drive medium-to-large initiatives from design to delivery.
Comfortable making pragmatic tradeoffs to deliver reliable, scalable solutions.
A strong product mindset for internal tools.
Passion for reducing cognitive load, eliminating toil, and making observability easy to adopt by default.
Solid understanding of modern web applications and distributed systems.
Knowledge of how observability applies to high-throughput, highly available systems.
Clear written and verbal communication skills.
Ability to influence technical direction through design discussions, documentation, and hands-on implementation.
Comfortable partnering with product, platform, and infrastructure teams.
Wave HQ Toronto, Ontario, CAN Office
155 Queens Quay E, Toronto, Ontario, Canada, M5A 0W4

