
Featherless AI

Machine Learning Engineer — Multilingual Data

Posted 21 Days Ago
In-Office or Remote
Hiring Remotely in World Golf Village, FL
Mid level
Design and maintain multilingual datasets, develop data pipelines, implement quality filters, and analyze dataset biases while collaborating with researchers.
The summary above was generated by AI

We’re looking for a Machine Learning Engineer to own and scale our multilingual data pipeline—from sourcing and curation to evaluation and continuous improvement. You’ll work closely with researchers and infra engineers to ensure our models perform robustly across languages, scripts, and cultural contexts.

This role sits at the intersection of data, research, and production ML, and it is ideal for someone who cares deeply about data quality, linguistic diversity, and model generalization beyond English.

What You’ll Do
  • Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages

  • Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling

  • Implement quality filters using statistical, heuristic, and model-based methods

  • Work with researchers to define language coverage, benchmarks, and evaluation metrics

  • Analyze dataset bias, coverage gaps, and failure modes across regions and scripts

  • Support training, fine-tuning, and distillation workflows with high-quality multilingual data

  • Continuously iterate on datasets based on model performance and real-world usage
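For readers less familiar with these pipeline stages, here is a minimal Python sketch of three of them: Unicode normalization, exact deduplication, and a heuristic quality filter. It is an illustration under stated assumptions, not Featherless AI's pipeline. The function names and thresholds (`normalize`, `passes_heuristics`, `clean_corpus`, `min_chars`, `max_symbol_ratio`) are hypothetical, and only the standard library (`unicodedata`, `hashlib`) is used.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    # NFC-compose so "e" + U+0301 and precomposed "é" become the same code points.
    text = unicodedata.normalize("NFC", text)
    # Collapse all whitespace runs (including non-breaking spaces) to one space.
    return " ".join(text.split())


def passes_heuristics(text: str, min_chars: int = 20,
                      max_symbol_ratio: float = 0.3) -> bool:
    # Hypothetical thresholds: drop very short documents outright.
    if len(text) < min_chars:
        return False
    # Anything that is not a letter (L*), a combining mark (M*), or whitespace
    # counts as a "symbol" here, including digits and punctuation.
    symbols = sum(
        1 for ch in text
        if not ch.isspace() and unicodedata.category(ch)[0] not in "LM"
    )
    return symbols / len(text) <= max_symbol_ratio


def clean_corpus(docs):
    seen = set()  # SHA-256 digests of documents already kept
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if not passes_heuristics(norm):
            continue
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:  # exact duplicate after normalization
            continue
        seen.add(digest)
        kept.append(norm)
    return kept
```

Counting combining marks as "letters" is a deliberate choice: Python's `str.isalpha()` only covers the L* categories, so a filter built on it would treat Devanagari vowel signs and viramas as symbols and could reject ordinary Hindi text. Exact hashing also only catches byte-identical duplicates after normalization; near-duplicate detection (for example MinHash) is outside this sketch.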

What We’re Looking For
  • 3+ years of experience as an ML Engineer, Applied Scientist, or similar role

  • Strong experience working with multilingual or non-English datasets

  • Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)

  • Experience building scalable data pipelines (Python, Spark, Ray, or similar)

  • Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks

  • Comfort collaborating with researchers and translating research needs into production systems
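As one concrete example of the "Unicode, scripts, and language-specific quirks" mentioned above, the sketch below flags tokens that mix scripts, such as a Latin word containing a Cyrillic "а" (U+0430). It is a rough heuristic, not a standard method: it guesses a character's script from the first word of its Unicode name, because Python's standard `unicodedata` module does not expose the Unicode Script property. The helper names (`rough_script`, `script_histogram`, `mixed_script_tokens`) are hypothetical.

```python
import unicodedata
from collections import Counter


def rough_script(ch: str) -> str:
    # Heuristic: the first word of a character's Unicode name usually names its
    # script ("CYRILLIC SMALL LETTER A" -> "CYRILLIC"). Not the Script property.
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else "UNKNOWN"


def script_histogram(text: str) -> Counter:
    # NFC first, so a base letter plus a combining accent becomes one
    # precomposed letter instead of a separate "COMBINING" entry.
    text = unicodedata.normalize("NFC", text)
    # Count only letters (L*) and marks (M*); digits and punctuation are skipped.
    return Counter(
        rough_script(ch) for ch in text if unicodedata.category(ch)[0] in "LM"
    )


def mixed_script_tokens(text: str) -> list:
    # Whitespace tokenization is an assumption; it does not work for
    # languages written without spaces (Chinese, Japanese, Thai).
    return [tok for tok in text.split() if len(script_histogram(tok)) > 1]


print(mixed_script_tokens("pay with p\u0430ypal today"))
```

Legitimate text can also be "mixed" under this heuristic: ordinary Japanese combines Hiragana, Katakana, and CJK ideographs in one word, so a real filter would need per-language allow-lists. The third-party `regex` package supports Unicode script properties (for example `\p{Cyrillic}`) and is a more principled starting point than character names.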

Nice to Have
  • Experience with low-resource languages or multilingual benchmarks (e.g., FLORES, XTREME)

  • Exposure to LLM training, fine-tuning, or distillation

  • Linguistics background or experience working with native language experts

  • Contributions to open-source datasets or ML tooling

  • Experience with data quality evaluation at scale

Why Join
  • Real ownership over a core differentiator of the product

  • Work on models used globally, not just in English-speaking markets

  • Small, high-caliber team with deep ML and systems experience

  • Competitive compensation and meaningful equity at the Series A stage

Top Skills

Python
Ray
Spark

Similar Jobs

50 Minutes Ago
Easy Apply
Remote
US
Senior level
Cloud • Security • Software • Cybersecurity • Automation
Drive enterprise growth across Midwest accounts by leading complex B2B SaaS sales cycles for GitLab's AI-powered DevSecOps platform. Build strategic account plans, coordinate cross-functional teams, manage full sales lifecycle from discovery to implementation, generate pipeline and Net ARR, and provide forecasting and strategic insights to expand long-term customer partnerships.
Top Skills: GitLab, Salesforce, DevSecOps, AI, Duo Enterprise, Duo Agent Platform, SDLC, SaaS
57 Minutes Ago
Remote
United States of America
Expert/Leader
Blockchain • Fintech • Payments • Financial Services • Cryptocurrency • Web3
Lead Product Risk Management operations: partner with product and engineering to identify, assess, mitigate, and report product risks; drive processes, policies, tooling, and standards; perform risk assessments, monitoring, executive reporting, and support audits and regulators.
Top Skills: SQL
58 Minutes Ago
Remote
United States of America
Expert/Leader
Blockchain • Fintech • Payments • Financial Services • Cryptocurrency • Web3
Own regional growth strategy for USDC, scaling liquidity and adoption through partnerships, AI-driven experimentation, and automated growth platforms. Lead cross-functional teams to deliver regionally compliant initiatives and track adoption metrics.
Top Skills: AI, Agent-Driven Systems, Experimentation Platforms, Automation Platforms, Blockchain, Stablecoins, USDC

What you need to know about the Toronto Tech Scene

Although home to some of the biggest names in tech, including Google, Microsoft and Amazon, Toronto has established itself as one of the largest startup ecosystems in the world. And with over 2,000 startups — more than 30 percent of the country's total startups — Toronto continues to attract new businesses. Be it helping entrepreneurs manage their finances, simplifying business operations by automating payroll or assisting pharmaceutical companies in launching new drugs, the city's tech scene is just getting started.
