Site Reliability Engineer

CMG (Capital Markets Gateway)

Posted 6 Hours Ago

Remote

Hiring Remotely in Canada

Mid level

Responsible for ensuring the reliability and performance of infrastructure and applications through monitoring, alerting, and incident management. The role includes developing observability solutions and optimizing system performance, while collaborating with cross-functional teams to address technical challenges.

The summary above was generated by AI

The Company

Capital Markets Gateway LLC (CMG) is a capital markets-focused fintech transforming global equity capital markets (ECM) through data, technology, and connectivity. As the preferred source for ECM analytics and the first network connecting the buy-side and sell-side for ECM workflows, we are committed to reshaping how capital markets operate. Founded in 2017 by a team of ECM practitioners, CMG has completed three successful fundraising rounds and is backed by a group of the world’s most prestigious financial institutions. The CMG platform is currently relied upon by nearly 150 buy-side firms representing $40 trillion in AUM and 22 global investment banks. For more information, please visit www.cmgx.io. 

The Role

CMG is looking for a Site Reliability Engineer (SRE) with a strong focus on monitoring, observability, and alerting to ensure the reliability, performance, and scalability of our infrastructure and applications. You will be responsible for designing, implementing, and maintaining monitoring solutions to provide visibility into system health and performance, proactively detect anomalies, and reduce incident response time. 

Our Engineering Team

The CMG engineering team consists of domain experts who work collaboratively within a culture of cross-domain knowledge sharing. We value engineers who are passionate about modern technologies and best practices.

Our engineers are encouraged to challenge the status quo and constantly seek improvement and efficiency in our codebase and platform. CMG engineers are empowered to explore solutions using bleeding-edge technologies such as AI and to bring recommendations to the table. We are in a period of making impactful engineering decisions.

As part of our process, we believe in taking the time for research and prototyping, which is critical to making the right decisions. Given the experience of our team, we have naturally adopted best practices from local development through code review and into production rollouts. Beyond the standard pull requests, test automation, code coverage tracking, containerization, and one-click deployments, we constantly review these foundational components to develop new best practices.

Responsibilities

Monitoring & Observability

Design, implement, and maintain monitoring and observability solutions using tools like Prometheus, the Grafana stack (Loki, Grafana, Tempo, Alertmanager), Datadog, and OpenTelemetry.

Define and implement SLOs, SLIs, and error budgets to measure system reliability. 

Develop and optimize dashboards, alerts, and reports for system performance and business metrics.

Alerting & Incident Management

Design actionable alerting strategies to minimize noise and reduce MTTR.

Integrate alerting systems with Jira.

Establish and refine runbooks for on-call teams to handle alerts efficiently.

Empower teams to own their observability coverage and incident response practices.

Performance Optimization

Analyze system performance metrics, identify bottlenecks, and implement optimizations to improve system efficiency, scalability, and cost-effectiveness.

Help conduct load testing and capacity planning to ensure systems can handle peak traffic loads. 

Automation and Tooling

Identify opportunities for automation and develop tools to streamline operational processes, such as fail-over, configuration management, and monitoring.

Implement monitoring and alerting systems within automations to detect and resolve issues proactively. 

Collaboration and Communication

Collaborate closely with cross-functional teams, including software engineers, operations, and infrastructure teams, to understand system requirements, provide technical guidance, and drive solutions.

Communicate effectively with stakeholders about system changes, incidents, and improvements.

Promote SRE principles and practices across the company.

Qualifications

Must be based in Latin America
English level - C1 or C2
Proven experience as a Site Reliability Engineer or similar role. 
Proficiency in logging, metrics, and tracing frameworks (Datadog, Loki, Prometheus, OpenTelemetry).
Experience with cloud platforms (Azure preferred) and infrastructure-as-code tools (e.g., Terraform). 
Strong programming and scripting skills (Python, Bash). 
Proficiency in containerization technologies and orchestration tools (Docker, Kubernetes).
Understanding of Linux-based systems, networking, and security principles related to containerized applications.
Strong problem-solving and troubleshooting skills, with a passion for identifying and resolving complex technical issues. 
Excellent communication and collaboration abilities. 
Ability to thrive in a fast-paced, constantly evolving environment. 
Experience with PostgreSQL monitoring and optimization (Optional/Nice to have).

If you're passionate about building resilient financial systems, optimizing observability at scale, and solving real-world reliability challenges in capital markets, we’d love to have you on our team! 

Our Tech Stack

Azure as an infrastructure provider. We are reviewing secondary cloud options.

Docker + Kubernetes for microservice orchestration using Istio service mesh.

PostgreSQL for the relational database, Elasticsearch for indexing, Redis for caching.

Datadog, Grafana, and OpenTelemetry for observability.

GitHub for our Version Control and CI (with our own runners).

CD: Harness and FluxCD.

Terraform and Terragrunt for IaC.

Python and Bash for scripting infrastructure.

React – We’re all in on React and maintain multiple single-page React apps.

TypeScript – 99% of our codebase is TypeScript.

Latest .NET version for our backend services.

GraphQL – Our standard for API communication is GraphQL, served by our .NET back end.

Our Values

We innovate with purpose 

We focus on outcomes vs. output 

We believe diverse and inclusive teams fuel innovation 

We are humble yet candid 

We do right by the customer

What We Offer

Equity

Unlimited PTO (15 days + bank holidays + unlimited additional paid leave)
Comprehensive benefits program managed by Globalization Partners

Premium life and income protection

Top private medical and dental insurance

Employee Assistance Program (EAP)

Pension contributions

Remote work environment

Education reimbursement

Continuous learning opportunities

Employee referral bonus

Parental leave

CMG embraces our ongoing commitment to building a culture reflecting the people, perspectives, and passions it represents. We will accept nothing less than equity, inclusion, and belonging for all. With the only constant in life being change, we will always listen, learn, and improve for the betterment of our teams, customers, and communities. CMG is proud to be an Equal Opportunity Employer.
