We are seeking a highly skilled Senior Distributed Systems Engineer to join our Omniverse Infrastructure team. You will play a key role in designing, building, and optimizing large-scale distributed systems and infrastructure for the Omniverse Cloud. This is an extraordinary opportunity for a highly motivated and dedicated engineer with an in-depth understanding of distributed storage, high-performance networking, compute systems, and distributed system architecture.
NVIDIA Omniverse™ Cloud is a platform-as-a-service (PaaS) that provides developers and enterprises a full-stack cloud environment to design, develop, and deploy industrial Omniverse applications. The Omniverse Infrastructure organization develops hardware and software systems to power the Omniverse Cloud.
What you will be doing:
- Architect, design, build, and optimize distributed systems.
- Drive end-to-end Omniverse platform optimization from a hardware level to the application and service levels.
- Develop infrastructure and microservices to support Omniverse users and developers in the deployment of a wide range of workloads.
- Address challenges related to compute, networking, and storage resource utilization in a heterogeneous computing environment.
- Collaborate with multiple Omniverse product teams to understand customer storage and compute requirements and build supporting infrastructure.
- Collaborate across org boundaries with a diverse set of engineers.
- Adapt and/or develop performance modeling and analysis tools to identify and optimize performance bottlenecks in Omniverse workloads and drive future system designs.
- Multitask effectively in a dynamic environment.
What we need to see:
- Master's or PhD in Computer Science or a related field (or equivalent experience).
- 5+ years of hands-on software engineering experience building large-scale distributed, fault-tolerant systems and services.
- Strong systems programming skills, including multi-threading, concurrency, caching, and batching.
- Proficiency in C, C++, and Python. Experience with cloud infrastructure platforms like AWS, Azure, and Google Cloud.
- Solid technical foundation and a deep understanding of cloud technologies, distributed systems, and microservices architecture.
- Excellent interpersonal skills and ability to work successfully with multi-functional teams, principals, and architects across organizational boundaries and geographies.
- Understanding of virtualization and containerization technologies like Docker, Kubernetes, VMware, KVM, etc.
Ways to stand out from the crowd:
- Hands-on experience in performance optimization and benchmarking on large-scale distributed systems.
- Experience in developing large-scale distributed applications and services on supercomputing and/or cloud environments.
- Experience with NVIDIA GPUs, HPC storage, networking, and cloud computing.
- In-depth understanding of storage systems, Linux file systems, and RDMA networking.
- References to your code contributions.
Join our team and contribute to the development of innovative technologies that will power the future of the Omniverse platform. Apply today!