AI Application Team Lead
Description
NextSilicon is reimagining high-performance computing (HPC) and AI. Our accelerated compute solutions leverage intelligent adaptive algorithms to vastly accelerate supercomputers, driving them forward into a new generation. We have developed a novel software-defined hardware architecture that is achieving significant advancements in both the HPC and AI domains.
At NextSilicon, everything we do is guided by three core values:
- Professionalism: We strive for exceptional results through professionalism and unwavering dedication to quality and performance.
- Unity: Collaboration is key to success. That's why we foster a work environment where every employee can feel valued and heard.
- Impact: We're passionate about developing technologies that make a meaningful impact on industries, communities, and individuals worldwide.
We are seeking a highly skilled AI Application Team Lead to build and lead a team responsible for developing, running, and optimizing large-scale AI workloads on NextSilicon’s AI hardware platform. This role focuses on benchmarking state-of-the-art models (e.g., LLaMA, DeepSeek), executing MLPerf suites, analyzing system-level performance, and driving cross-stack optimizations across hardware, runtime, and software frameworks.
The ideal candidate combines strong technical depth in AI/ML systems, hands-on experience with LLM workloads, and leadership capability to guide a high-performance engineering team.
Requirements
- 5+ years of experience in AI/ML engineering, performance optimization, or ML systems.
- Deep understanding of LLM architectures, training & inference mechanics, and modern ML frameworks.
- Strong proficiency in the PyTorch ecosystem, with a specific focus on performance tuning via Triton, CUDA, or MLIR-based compiler frameworks.
- Hands-on expertise in profiling and optimizing kernels (GEMM, attention, softmax, token pipelines).
- Demonstrated experience running or tuning MLPerf or similar large-scale benchmarks.
- Strong Python and C++ development skills.
- Proven leadership experience: mentoring, guiding, or managing engineers.
Responsibilities
- Lead and mentor a team of AI application and performance engineers.
- Run and optimize AI workloads (LLaMA, DeepSeek, etc.) and execute MLPerf benchmarks.
- Analyze end-to-end performance and identify HW/SW bottlenecks.
- Develop optimization strategies across models, kernels, frameworks, and runtime.
- Build profiling, debugging, and validation tools for large-scale AI workloads.
- Collaborate with hardware, compiler, and device software teams to improve performance.
NextSilicon is proud to be an Equal Opportunity Employer. We do not discriminate based upon race, religion, color, age, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, genetic information, status as a protected veteran, status as an individual with physical or mental disability, or other applicable legally protected characteristics. This policy applies to all employment practices within our organization, including hiring, recruiting, promotion, termination, layoff, recall, leave of absence, compensation, benefits, training, and apprenticeship. NextSilicon makes hiring decisions based solely on qualifications, merit, and business needs at the time.
jobs@nextsilicon.com
For any questions, please contact us at questions@nextsilicon.com