AI Infrastructure Engineer

Ulta

Salary: $180K/Yr - $210K/Yr

Remote

Posted 3 weeks ago

Job Summary

We are seeking a highly skilled AI Infrastructure Engineer to design, implement, and maintain scalable infrastructure that supports artificial intelligence and machine learning workloads. The ideal candidate will have experience with cloud platforms, high-performance computing (HPC), GPU environments, containerization, and infrastructure automation. You will play a critical role in ensuring the reliability, performance, and scalability of AI systems used across the organization.

Key Responsibilities

Design, deploy, and manage AI/ML infrastructure environments for training and inference workloads.
Build and maintain high-performance computing (HPC) clusters optimized for AI applications.
Configure and manage GPU-based systems and distributed computing environments.
Optimize storage, networking, and compute resources to maximize performance and cost efficiency.
Implement infrastructure automation using Infrastructure as Code (IaC) tools.
Monitor system performance, availability, and security across AI platforms.
Collaborate with Data Scientists, ML Engineers, and Software Developers to support AI model deployment and operations.
Troubleshoot infrastructure bottlenecks and performance issues.
Ensure platform scalability, reliability, disaster recovery, and business continuity.
Manage containerized workloads using Kubernetes and Docker.
Maintain cloud-based AI environments on AWS, Azure, or Google Cloud Platform.
Establish security best practices for AI infrastructure and sensitive data.

Required Qualifications

Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
3+ years of experience in cloud infrastructure, DevOps, Site Reliability Engineering (SRE), or AI platform engineering.
Strong knowledge of Linux system administration.
Experience with cloud platforms such as AWS, Azure, or GCP.
Hands-on experience with container orchestration using Kubernetes and Docker.
Understanding of GPU computing technologies such as NVIDIA CUDA and GPU clusters.
Experience with infrastructure automation tools like Terraform, Ansible, or CloudFormation.
Knowledge of networking, storage systems, and distributed computing concepts.
Familiarity with AI/ML frameworks such as TensorFlow, PyTorch, or JAX.

Preferred Qualifications

Experience managing large-scale AI training environments.
Knowledge of MLOps practices and tools.
Experience with monitoring tools such as Prometheus, Grafana, ELK Stack, or Datadog.
Familiarity with Apache Spark, Ray, or similar distributed AI frameworks.
Relevant cloud certifications (AWS, Azure, or GCP).

Technical Skills

Linux Administration
Kubernetes & Docker
Terraform / Ansible
AWS / Azure / GCP
GPU Infrastructure Management
Python, Bash, or Go
Networking & Storage Systems
Monitoring & Observability
CI/CD Pipelines
AI/ML Platform Engineering

Salary Range

Typical Salary: $170,000 – $230,000+ per year (depending on experience, location, and expertise)

Benefits

Competitive compensation package
Performance bonuses
Health, dental, and vision insurance
Remote or hybrid work options
Professional development and certification support
Access to cutting-edge AI technologies and projects

Job Features

Job Category

AI Manager, Data Science
