NVIDIA

Senior Software Engineer, AI Inference

Posted 8 Days Ago
In-Office
Toronto, ON, CAN
Senior level
As a Senior Software Engineer, you'll work with large-scale LLM serving, improve performance on NVIDIA's inference stack, and collaborate with customers on benchmarking and optimization.
The summary above was generated by AI

Help us push the boundaries of AI inference at NVIDIA — where your systems expertise shapes both the technology and the teams building on top of it!

We're looking for a Senior Software Engineer to work at the frontier of large-scale LLM serving, partnering directly with some of the world's most technically demanding customers to unlock the full performance potential of NVIDIA's inference stack. In this role, you'll combine deep systems knowledge with hands-on customer engagement — profiling real deployments, benchmarking across GPU clusters, and turning insights into improvements that ripple across the open-source ecosystem. Do you love digging into performance problems that don't have obvious answers, and want your work to have an impact far beyond a single codebase? We'd love to talk. Unlike traditional customer-facing engineering roles, we expect you to go far deeper — contributing to vLLM, NVIDIA Dynamo, and the tooling that makes every engineer on your team more effective.

What You'll Be Doing:

  • Work directly with customer engineering teams through long-term technical partnerships, understanding their LLM serving architectures and performance goals, then designing and implementing end-to-end benchmarking campaigns across Kubernetes and Slurm environments to surface actionable insights.

  • Set up and operate vLLM serving deployments on GPU clusters, tuning configurations for throughput, latency, and efficiency — and collect Nsight Systems / Nsight Compute profiling traces to identify performance gaps relative to reference frameworks.

  • Develop detailed performance plans based on profiling findings and collaborate with NVIDIA's kernel engineering and OSS vLLM teams to drive improvements that benefit both your customers and the broader community.

  • Build internal tools, benchmarking harnesses, and automation pipelines that raise the productivity of your teammates and customers alike — with a multiplier attitude that makes everyone around you more effective.

  • Document architectures, findings, and recommendations with clarity for technical audiences, and contribute improvements back to vLLM and related open-source projects where appropriate.

What We Need to See:

  • Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or equivalent experience.

  • 5+ years of industry experience building and operating complex, production-grade software systems, with strong instincts for how systems behave at scale.

  • Hands-on experience deploying and operating LLM inference workloads — particularly with vLLM — including configuration, optimization, and debugging in real-world environments.

  • Proficiency with container orchestration (Kubernetes) and HPC scheduling (Slurm) for running GPU-accelerated workloads.

  • Solid understanding of LLM serving fundamentals: batching strategies (continuous batching, chunked prefill), KV cache management, and tensor/pipeline parallelism.

  • Familiarity with GPU performance analysis: memory hierarchy, utilization, roofline modeling, and profiling with Nsight Systems or Nsight Compute.

  • Strong written and verbal communication skills, with the ability to present technical findings clearly to both engineering teams and leadership — and to navigate ambiguous, open-ended customer problems.

Ways to Stand Out from the Crowd:

  • Experience with NVIDIA Dynamo or other disaggregated inference serving frameworks.

  • Contributions to open-source inference or ML systems projects, particularly vLLM or SGLang — please include links to relevant pull requests or artifacts.

  • Background with ML compilers or GPU kernel development (Triton, CUTLASS, TorchInductor).

  • Experience building developer tools or internal platforms that meaningfully improved team productivity.

  • Prior experience in a customer-facing or forward-deployed engineering capacity within a technical product organization.

Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 135,000 CAD - 185,000 CAD for Level 3, and 170,000 CAD - 220,000 CAD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 14, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

Top Skills

AI Inference
Kubernetes
Nsight Compute
Nsight Systems
Slurm
vLLM

NVIDIA Toronto, Ontario, CAN Office

Toronto, Ontario, Canada
