Lead GPU Infrastructure Engineer (HPC / AI Infrastructure)

800299 Posted: 11/05/2026

Competitive
North America - Remote
Permanent

We’re partnering with a rapidly scaling technology business building advanced compute infrastructure for next-generation AI systems. This is an opportunity for a senior infrastructure engineer to play a key role in designing and operating large-scale GPU environments supporting highly demanding, enterprise-grade workloads across modern high-performance compute platforms.

The Company

Our client is building next-generation infrastructure at the intersection of AI, high-performance computing, and distributed systems. They’re scaling advanced GPU environments powering demanding workloads for globally recognised technology platforms and emerging digital ecosystems.

With major growth underway, the team is investing heavily in next-generation GPU infrastructure and high-performance compute environments. Infrastructure engineering sits at the core of the company’s long-term direction.

The Role

We’re looking for an experienced Infrastructure Engineer with expertise across large-scale compute, GPU, or high-performance infrastructure environments. This role offers the opportunity to own advanced infrastructure platforms spanning automation, scalability, observability, and operational resilience in a highly technical environment.

You’ll likely come from teams operating at significant scale, where reliability and performance are mission critical.

Responsibilities:

Own the lifecycle management of large-scale GPU infrastructure, from provisioning and firmware validation through to operational reliability.
Lead operations across high-density, liquid-cooled compute environments supporting next-generation AI workloads.
Build automated observability and remediation systems using Prometheus, Grafana, NVIDIA DCGM, and infrastructure automation tooling.
Drive NetBox DCIM integration, asset management, IPAM, and infrastructure compliance across complex compute environments.
Act as a senior technical lead for infrastructure operations, incident response, vendor management, and enterprise-level infrastructure support.

Requirements:

Strong experience managing large-scale GPU, HPC, or high-performance compute infrastructure.
Deep hands-on expertise with NVIDIA GPU systems, including H200, B200, or B300 environments.
Advanced knowledge of InfiniBand, NVLink, NVSwitch, and high-throughput networking architectures.
Strong Linux systems engineering background with infrastructure automation using Python or Go.
Experience with observability and monitoring tooling including Prometheus, Grafana, NVIDIA DCGM, and SNMP.
Proven experience across bare-metal provisioning, infrastructure lifecycle management, and automated/self-healing systems.

Nice to have:

Experience with liquid-cooled or high-density compute environments.
Familiarity with NVIDIA Mission Control and GPU cluster management.
Exposure to confidential compute technologies and attestation workflows.
Experience building infrastructure standards in fast-scaling environments.

The Offer

Competitive salary and benefits package.
Opportunity to build next-generation AI infrastructure.
Exposure to cutting-edge GPU and HPC environments.
Strong ownership across infrastructure and automation.
Engineering-led culture working on mission-critical systems.

For engineers passionate about large-scale infrastructure, high-performance compute, and automation, this is a rare opportunity to work on next-generation AI infrastructure.

To apply, please submit your application via the advert or contact Andrew directly at andrew@axiomrecruit.com.

Andrew Phillips, Founder

