Senior ML Performance Engineer

Job role overview

  • Date posted

    May 27, 2026

  • Hiring location

    Santa Clara

Description

About Us
At Lemurian Labs, we're reimagining the foundations of computing to make AI accessible to everyone. Our mission is to remove the limits of scale, hardware, and cost that hold back innovation, so the people solving humanity's hardest problems can move faster.
We're building a new kind of software stack: a hardware-agnostic platform that makes every system — from a laptop to a supercomputer — feel like one seamless engine. Developers can write once, run anywhere, and get state-of-the-art performance across any chip, any cloud, at any scale. It's a complete rethink of how software and hardware interact — designed for the era beyond Moore's Law.
We're not looking for the comfortable or the conventional; we're looking for the bold. The engineers who crave frontier problems, who want to bend the limits of what's possible, who see infrastructure not as a constraint but as a canvas. If you want to build the foundation for the next era of AI and change what humanity can achieve in the process, join us.
About the Role
We're looking for a Senior ML Performance Engineer to architect and lead our Performance Testing Platform from the ground up. You'll be the technical authority on how we measure, validate, and optimize the performance of large language models (including Llama 3.2 70B, DeepSeek, and others) before and after compiler optimization on modern GPU architectures.
This is a high-impact role at the intersection of ML systems, GPU architecture, and performance engineering. You'll build the infrastructure that proves our compiler delivers real, measurable value — and you'll work directly with compiler and ML engineers to drive the optimizations that get us there.
What You'll Do
Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
Document best practices for performance testing and optimization of ML workloads on GPU hardware
Essential Skills and Experience
BS degree in computer science, computer engineering, electrical engineering, or equivalent practical experience
7+ years of experience in performance engineering, benchmarking, or systems engineering roles
Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
Strong programming skills in Python and C/C++
Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
Experience with ML frameworks and inference runtimes (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
Proficiency with profiling and debugging tools for GPU workloads
Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
Experience with CI/CD systems and test automation frameworks
Preferred Skills and Experience
Master's or PhD in computer science, computer engineering, electrical engineering, or equivalent practical experience
Experience with AMD Instinct GPUs (MI200/MI300 series) and the ROCm ecosystem
Knowledge of compiler optimization techniques and their impact on performance
Experience with distributed inference and multi-GPU workloads
Familiarity with ML model quantization, pruning, and other optimization techniques
Background in high-performance computing or systems-level optimization
Experience with infrastructure-as-code and container tooling (Kubernetes, Docker, Terraform)
Contributions to open-source ML or systems projects
Personal Attributes
Precision-driven: you catch the 2% regression that others miss.
Self-directed: you take ownership and don't wait for permission to solve problems.
Collaborative: you work well across teams and actively help others succeed.
Clear communicator: you can explain complex technical concepts to engineers and stakeholders alike.
Why Join Lemurian Labs
Build the performance testing infrastructure that validates the future of efficient AI.
Own a high-visibility platform that directly influences product quality and customer success.
Work with cutting-edge GPU hardware and next-generation LLMs.
Competitive compensation including equity, medical/dental/vision, retirement savings, and wellness benefits.
Lemurian Labs is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees, regardless of gender identity, race, ethnicity, sexual orientation, disability status, age, or background.
Compensation depends on experience and geographic location and will be narrowed during the interview process. Additional benefits include equity, company bonus opportunities, medical, dental, and vision coverage, a retirement savings plan, and supplemental wellness benefits.

Work mode

On-site