Senior Site Reliability Engineer - GCP Focused
United States - San Antonio, Texas • April 14, 2025
About the Role
We are seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a strong background in managing large-scale, data-intensive production-grade systems and infrastructure, with deep experience in cloud observability, automation, and reliability engineering at scale. A solid understanding of public cloud services, especially Google Cloud Platform (GCP), is essential.
At the core of this role is the administration and maintenance of cloud infrastructure, including on-call support, monitoring, automation, deployment, the establishment of CI/CD pipelines, and the formulation of reusable cloud infrastructure templates via infrastructure as code (IaC) methodologies. You will apply these SRE principles to design and implement scalable, automated infrastructure supporting ML model training, real-time inference APIs, and analytics workloads across platforms like Vertex AI, BigQuery, and Dataproc. You’ll work closely with ML and data teams to ensure production systems are observable, performant, and fault-tolerant, embedding reliability into every stage of the pipeline.
This role involves working in a remote environment, requiring excellent communication skills and the ability to solve complex problems independently and creatively.
Work Location: US-Remote, Canada-Remote
Key Responsibilities:
- Administer and optimize cloud-native databases and storage platforms, including Google Cloud Storage (GCS), Cloud SQL, Spanner, and Firestore.
- Support and maintain machine learning and analytics platforms, including Vertex AI, Generative AI, BigQuery, Looker, and Dataproc, ensuring scalable and reliable infrastructure for data pipelines and model workflows.
- Implement and manage cloud observability using OpenTelemetry and native GCP tools to enable real-time monitoring, distributed tracing, and incident resolution.
- Support and maintain large-scale applications, computer systems, and networks in production environments.
- Administer and troubleshoot Linux-based systems and core networking protocols such as TCP/IP, HTTP, DNS, and mail protocols, and manage components such as content delivery networks (CDNs) and load balancers.
- Manage and operate GCP services, including Kubernetes Engine (GKE), Compute Engine (GCE), Networking, Security, CI/CD pipelines, and other common Cloud technologies.
- Build and maintain cloud infrastructure using Infrastructure as Code (IaC) tools such as Terraform, Ansible, and Helm Charts.
- Develop and deploy services using Python, Golang, or Java, and implement CI/CD pipelines to ensure consistent, reliable delivery of applications and infrastructure components.
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering, including hands-on operational support and participation in on-call rotations.
- Proven track record of managing large-scale applications, distributed systems, and networked services in production.
Must Have: (Important)
- 5+ years of hands-on experience in cloud environments
- Deep understanding of Google Cloud Platform (GCP), especially GKE, GCE, networking, and security
- Strong troubleshooting and debugging skills across systems and networks
- Experience with cloud-native databases and storage, including Google Cloud Storage (GCS), Cloud SQL, Spanner, and Firestore
- Experience with machine learning and AI platforms, such as Vertex AI, Generative AI tools, BigQuery, Looker, and Dataproc
- Hands-on experience with cloud observability and monitoring, including OpenTelemetry, tracing, metrics, and distributed logging systems