
RDMA Engineer - Supercomputing

xAI

📍 San Francisco•Full Time

RDMA Engineer - Supercomputing at xAI

Location: San Francisco
Employment Type: Full-time
Salary: $180,000 - $440,000 USD

About the Role

RDMA Engineers on xAI’s Supercomputing team design and optimize low-latency, high-bandwidth networking solutions using NVIDIA’s RDMA-capable technologies to support some of the world’s largest GPU supercomputing clusters. These clusters drive AI training and inference workloads, demanding cutting-edge performance and scalability.

Focus

  • Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
  • Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
  • Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
  • Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
  • Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.

Ideal Experience

  • Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
  • Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
  • Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and tools (e.g., NVPeerMemory).
  • Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
  • Knowledge of Kubernetes networking and integrating RDMA into containerized environments.
  • Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).

Tech Stack

  • NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)
  • RDMA protocols (e.g., GPUDirect RDMA, RoCEv2)
  • Kubernetes
  • Rust and C/C++
  • MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)

Annual Salary Range

$180,000 - $440,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.


About xAI:

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

  • 🚀 Building Grok - advanced AI assistant
  • 💻 Operating world-class AI supercomputing clusters
  • 🎯 Small, highly motivated team focused on engineering excellence
  • 🏢 Flat organizational structure with hands-on leadership

Founded by Elon Musk

Contact

Apply on Greenhouse: https://job-boards.greenhouse.io/xai/jobs/4691436007

Related Opportunities

Medicine Tutor

xAI

📍 Remote•Full Time

Senior Grok Engineer

xAI

📍 San Francisco•Full Time

Senior Product Engineer - Starfleet

xAI

📍 San Francisco•Full Time