Staff Platform Engineer, Manufacturing AI

Lutra

Reposted Yesterday

Remote

Hiring Remotely in Canada

Mid level

The Staff Platform Engineer will design and implement scalable infrastructure for AI applications in manufacturing while ensuring reliability, operability, and cross-team collaboration.

The summary above was generated by AI

Your opportunity

Our client is a well-funded, seed-stage AI startup that builds agents for the factory floor. They develop and distribute a software-first agent layer that plugs into the cameras and machines factories already have. Their models run at the edge so agents can see, decide, and act in real time. Events and metrics flow into a dashboard that gives plant teams immediate visibility. They’re approaching a large (~$14B), underserved market with a disruptive, asset-light alternative to hardware-heavy robotics and batch analytics, and they’ve already found early traction with clients in the food & beverage, pharma/cosmetics, and materials processing verticals.

As a staff platform engineer, you’ll join an emergent platform team and help shape the form that it takes. You will become fluent in the hardware platform, networking topologies, and application stack, and draw on your longitudinal perspective to build a platform practice and guide the leadership team’s decision making. To begin, you’ll be firmly in the critical path as the primary on-call, equipped to turn incidents into monitoring signals, build playbooks from first principles, and shape a culture of streamlined root cause analysis.

You’ll be joining a flat, dynamic environment in the midst of its scale-up phase, led by an accomplished ex-DeepMind researcher specializing in reinforcement learning, deep learning, and robotics. The company closed a $13.9M CAD seed round in March of 2025 and is scaling R&D and delivery to meet accelerating demand, with headcount tracking to double by year-end.

Please note that this role may involve participation in an on-call rotation that includes evenings and weekends.

Thematic responsibilities

Infrastructure & application ownership: Design and implement scalable infrastructure architectures across on-premise (edge) and cloud environments; evolve core infrastructure platforms that support production and pre-production workflows
Pre-production environments & validation: Build and maintain sandbox, staging, and shadow-run environments that mirror production behavior; own how systems are provisioned, isolated, tested, and validated before rollout
Replay-based testing & safe version rollouts: Design infrastructure to support A/B playback testing of models and software versions, offline and replay-based workload testing, and shadow-mode execution prior to version switching
Reliability engineering, fault isolation & performance determinism: Define infrastructure standards that ensure reliable, isolated systems with predictable performance under real-world workloads
Operability & cross-team collaboration: Partner with DevOps to ensure infrastructure designs are deployable, observable, and operable; collaborate with Edge and AI teams to enable safe experimentation

Tech stack

Operating system: Linux
Backend: Python (Flask, FastAPI), TypeScript/Node.js
Orchestration & compute: Kubernetes, on-prem bare metal, VMs
Containers: Docker
Monitoring, observability & logging: Prometheus, Grafana, ELK
Cloud providers: AWS, Azure, GCP
Databases & storage: SQL, InfluxDB, MongoDB
Messaging & IoT: MQTT, HTTP/REST, RabbitMQ, Apache Kafka
Edge platforms: NVIDIA Jetson, Raspberry Pi (ARM)
GPU/acceleration: CUDA, TensorRT, ONNX, OpenVINO
ML/DL frameworks: PyTorch, TensorFlow, Keras, scikit-learn
Scientific computing: NumPy, Pandas
Computer vision: OpenCV
Cameras & vision I/O: GenICam, GigE Vision, USB3 Vision
Industrial automation: PLC integration; protocols: EtherNet/IP, Modbus, PROFINET, OPC UA

Your know-how

You have significant experience supporting the design and implementation of scaled production systems in hybrid (edge-cloud) or on-prem environments
You have strong Linux systems knowledge and experience building and operating underlying compute platforms
You have significant experience with infrastructure orchestration platforms (Kubernetes/K8s preferred) and/or virtualization platforms
You are experienced with monitoring, observability and alerting stacks and best practices
You have high comfort with, and understanding of, distributed systems and failure modes
You have enough software engineering skills to be dangerous, and specific command of Python for infrastructure automation and validation tooling
You have experience collaborating effectively within and across cross-functional delivery teams
You are a contagiously curious person with entrenched learning habits

It’s a bonus if

You have experience designing and operating scaled production environments for manufacturing, robotics, IoT and/or industrial automation applications
You have deep expertise in computer vision, robotics, or manufacturing automation
You have experience supporting GPU-based or real-time workloads
You are predisposed to mentorship and crafting a culture of continuous improvement
You have experience scaling an AI and/or B2B SaaS venture

Interested in learning more?

Please apply using the following form or send your resume or LinkedIn profile URL to [email protected] with “Staff Platform Engineer, Manufacturing AI” as the subject line. One of our talent partners will be in contact shortly.

Compensation

The base pay range for this role is CA$200,000 – CA$250,000 per year.
