Our client, a world leader in the semiconductor industry, is looking for an HPC Systems Engineer. Kindly have a look at the details below.
Job Title: HPC Systems Engineer
Location: Milpitas, CA - Onsite
Job Duration: 12+ months (possibility of further extension)
Responsibilities for this exciting role will include:
- Design and implementation of high-performance compute clusters
- Solid knowledge of HPC cluster systems, including scalable and robust storage, high-bandwidth interconnects, CPU/GPU architecture, and cloud-based computing architectures
- Apply strong Linux OS skills to configure a suitable operating system for the design
- Understand and gather the project specifications and performance requirements at the subsystem and system levels.
- Adhere to and drive project timelines to ensure program milestones are completed on time.
- Support the release of new products to manufacturing and the customer by providing quality procedures, scripts, and documentation to the manufacturing and customer support teams.
Qualifications/Education Desired
- Minimum of 5-10 years of validated, flavor-agnostic Linux system administration experience.
- Cross-domain HPC architecture design experience.
- Python and Bash scripting experience for task automation.
- Experience with configuration management tools.
- Experience with deep learning infrastructure management.
- Familiarity with container management tools.
- Experience designing and maintaining robust storage (distributed, redundant, low-latency).
- BSEE (MSEE/PhD highly preferred).
Nice to Have
- Linux: RHEL, SLES, Rocky, Ubuntu, kernel compilation
- Configuration Management: Jenkins, KIWI, Ansible, Chef, Salt, OpenStack
- Containers: Docker, Singularity (nice to have: Kubernetes)
- Network: VPN, VLAN, LAG, NAT, ACL (nice to have: InfiniBand, MPI, IB switch configuration)
- Storage: RAID, ZFS, Ceph, Lustre, BeeGFS
- Tools: Grafana, TensorFlow (TF), CUDA, Prometheus, Slurm
- HPC: OpenHPC
- VM: VMware, Xen, KVM, QEMU, Vagrant
- Scripting: Bash, Python
Skills
- Experienced designer of storage systems (distributed, redundant, low-latency).
- Flavor-agnostic Linux system administration experience.
- Cross-domain HPC architecture design experience.
- Scripting experience for task automation.
- Experience with configuration management tools.
- Experience with deep learning infrastructure management.
- Familiarity with container management tools.
- openSUSE KIWI experience desired
- Prometheus experience desired
- Jenkins experience desired
- Windows OS system administration desired
- Kubernetes experience a bonus
Linux, Python, HPC