Perplexity

AI Inference Engineer

Reposted 8 Days Ago

Easy Apply

In-Office

3 Locations

Senior level

The AI Inference Engineer will develop APIs for AI inference, address performance bottlenecks, and implement model optimizations.

The summary above was generated by AI

Perplexity is an AI-powered answer engine founded in December 2022 that has quickly become one of the world’s leading AI platforms. Perplexity has raised over $1B in venture investment from some of the world’s most visionary and successful leaders, including Elad Gil, Daniel Gross, Jeff Bezos, Accel, IVP, NEA, NVIDIA, Samsung, and many more. Our objective is to build accurate, trustworthy AI that powers decision-making for people and assistive AI wherever decisions are being made. Throughout human history, change and innovation have always been driven by curious people. Today, curious people use Perplexity to answer more than 780 million queries every month, a number that’s growing rapidly for one simple reason: everyone can be curious.

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout our inference stack
Improve the reliability and observability of our systems and respond to system outages
Explore novel research and implement LLM inference optimizations

Qualifications

Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
Understanding of GPU architectures or experience with GPU kernel programming using CUDA

The cash compensation range for this role is $190,000 to $250,000.

Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.

Equity: In addition to the base salary, equity may be part of the total compensation package.
Benefits: Comprehensive health, dental, and vision insurance for you and your dependents, plus a 401(k) plan.

Top Skills

C++

CUDA

Kubernetes

ONNX

Python

PyTorch

Rust

TensorFlow

Triton

Similar Jobs

Speechify

Artificial Intelligence Engineer

10 Days Ago

Easy Apply

In-Office or Remote

Ithaca, NY, USA

Mid level

Software

Collaborate with ML researchers and product managers to deploy and operate AI Voices, improving model performance and addressing issues.

Top Skills: Docker, GCP, Infrastructure as Code, Kubernetes, ML Inference, Python

Baseten

Applied AI Inference Engineer

15 Days Ago

In-Office

Junior

Software

As an Applied AI Inference Engineer, you will develop AI applications, support customer implementation, and enhance ML projects using Python.

Top Skills: Python

Capital One

Artificial Intelligence Engineer

12 Days Ago

Hybrid

Mid level

Fintech • Machine Learning • Payments • Software • Financial Services

The Lead AI Engineer will develop and support AI software components, optimize AI performance, and lead collaborative product delivery efforts to enhance customer experiences.

Top Skills: AWS, Azure, Go, GCP, Hugging Face, Java, NeMo Guardrails, Python, PyTorch, Scala, Vector DBs

What you need to know about the Toronto Tech Scene

Although home to some of the biggest names in tech, including Google, Microsoft and Amazon, Toronto has established itself as one of the largest startup ecosystems in the world. And with over 2,000 startups — more than 30 percent of the country's total startups — Toronto continues to attract new businesses. Be it helping entrepreneurs manage their finances, simplifying business operations by automating payroll or assisting pharmaceutical companies in launching new drugs, the city's tech scene is just getting started.