Principal Site Reliability Engineer
Core42 · Abu Dhabi
Job description
About the role
Core42 is looking for a Principal Site Reliability Engineer to lead the design and evolution of its globally distributed infrastructure, which powers AI and private‑cloud workloads. As a senior technical leader, this person will shape platform strategy, drive automation, and ensure the reliability of high‑performance, GPU‑intensive systems.
Key responsibilities
- Define and execute the long‑term roadmap for infrastructure, CI/CD, and Kubernetes platforms.
- Design scalable, distributed systems for AI/ML and HPC workloads.
- Implement AI‑driven automation, self‑healing workflows, and predictive AIOps capabilities.
- Architect high‑performance, multi‑tenant Kubernetes environments with GPU support.
- Build observability platforms, set SLOs/SLIs, and lead root‑cause analysis.
- Act as the escalation point for complex incidents and mentor SRE/DevOps teams.
- Collaborate with product, engineering, and senior leadership to align reliability with business goals.
Required profile
- 10+ years of experience in Site Reliability Engineering, Platform Engineering, or Systems Architecture.
- Proven track record designing and operating large‑scale distributed systems.
- Deep expertise with Kubernetes (EKS, GKE, or bare‑metal) and GPU‑intensive workloads.
- Strong programming skills in Python, Go, or Rust.
- Extensive experience with Terraform, Helm, and infrastructure‑as‑code practices.
Required skills
- Kubernetes
- EKS / GKE / bare‑metal clusters
- GPU and HPC workload orchestration
- Python, Go, Rust
- Terraform
- Helm
- CI/CD pipelines
- Observability (metrics, logs, tracing)
- SLO / SLI definition
- AIOps and automation frameworks
Published 1 day ago
Expires 1 month from now