Sr Big Data Engineer - Oozie and Pig (GCP)
United States - Remote • April 14, 2025
About the Role
We are seeking a Senior Big Data Engineer with deep expertise in distributed systems, batch data processing, and large-scale data pipelines. The ideal candidate has strong hands-on experience with Oozie, Pig, and the Apache Hadoop ecosystem, along with programming proficiency in Java (preferred) or Python. This role requires a deep understanding of data structures and algorithms and a proven track record of writing production-grade code and building robust data workflows. This is a fully remote position and requires an independent, self-driven engineer who thrives in complex technical environments and communicates effectively across teams.
Work Location: US-Remote, Canada-Remote
Key Responsibilities:
- Design and develop scalable batch processing systems using technologies like Hadoop, Oozie, Pig, Hive, MapReduce, and HBase, with hands-on coding in Java or Python.
- Write clean, efficient, and production-ready code with a strong focus on data structures and algorithmic problem-solving applied to real-world data engineering tasks.
- Develop, manage, and optimize complex data workflows within the Apache Hadoop ecosystem, with a strong focus on Oozie orchestration and job scheduling (an illustrative Oozie submission sketch follows this list).
- Leverage Google Cloud Platform (GCP) tools such as Dataproc, GCS, and Composer to build scalable, cloud-native big data solutions (see the Dataproc sketch after this list).
- Implement DevOps and automation best practices, including CI/CD pipelines, infrastructure as code (IaC), and performance tuning across distributed systems.
- Collaborate with cross-functional teams to ensure data pipeline reliability, code quality, and operational excellence in a remote-first environment.
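To give candidates a feel for the orchestration work involved, here is a minimal sketch of submitting and monitoring an Oozie workflow from Java via the Oozie client API (org.apache.oozie.client.OozieClient). The server URL, HDFS paths, and cluster endpoints are hypothetical placeholders, not references to an actual environment.

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        // Point at the Oozie server's REST endpoint (hypothetical host).
        OozieClient client = new OozieClient("http://oozie-host:11000/oozie");

        // Build the job configuration; APP_PATH must reference a workflow.xml
        // already deployed to HDFS (paths below are made up).
        Properties conf = client.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/etl/workflows/daily-agg");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        // Submit and start the workflow, then poll until it leaves RUNNING.
        String jobId = client.run(conf);
        while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10_000);
        }
        System.out.println("Workflow " + jobId + " finished: " + client.getJobInfo(jobId).getStatus());
    }
}
```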
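Likewise, a hedged sketch of how a Pig batch job might be submitted to a Dataproc cluster with the google-cloud-dataproc Java client; the project ID, region, cluster name, and GCS paths are made-up examples.

```java
import com.google.cloud.dataproc.v1.Job;
import com.google.cloud.dataproc.v1.JobControllerClient;
import com.google.cloud.dataproc.v1.JobControllerSettings;
import com.google.cloud.dataproc.v1.JobPlacement;
import com.google.cloud.dataproc.v1.PigJob;
import com.google.cloud.dataproc.v1.QueryList;

public class SubmitPigToDataproc {
    public static void main(String[] args) throws Exception {
        String projectId = "my-project";   // hypothetical GCP project
        String region = "us-central1";
        String cluster = "batch-cluster";  // hypothetical Dataproc cluster

        // The Dataproc job controller endpoint is regional.
        JobControllerSettings settings = JobControllerSettings.newBuilder()
            .setEndpoint(region + "-dataproc.googleapis.com:443")
            .build();

        try (JobControllerClient jobs = JobControllerClient.create(settings)) {
            // A small Pig job reading from and writing to GCS (paths are placeholders).
            PigJob pig = PigJob.newBuilder()
                .setQueryList(QueryList.newBuilder()
                    .addQueries("raw = LOAD 'gs://my-bucket/input' USING PigStorage(',');"
                              + "STORE raw INTO 'gs://my-bucket/output';"))
                .build();

            Job job = Job.newBuilder()
                .setPlacement(JobPlacement.newBuilder().setClusterName(cluster))
                .setPigJob(pig)
                .build();

            Job submitted = jobs.submitJob(projectId, region, job);
            System.out.println("Submitted Dataproc job: " + submitted.getReference().getJobId());
        }
    }
}
```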
Qualifications:
- Bachelor's degree in Computer Science, Software Engineering, or a related field of study.
- Experience with managed cloud services and an understanding of cloud-based batch processing systems are critical.
- Proficiency in Oozie, Airflow, MapReduce, and Java.
- Strong programming skills in Java (particularly with Spark), Python, Pig, and SQL.
- Expertise in public cloud services, particularly in GCP.
- Proficiency in the Apache Hadoop ecosystem, including Oozie, Pig, Hive, and MapReduce (a short embedded-Pig sketch follows this list).
- Familiarity with Bigtable and Redis.
- Experienced in applying infrastructure and DevOps principles in daily work, using continuous integration and continuous deployment (CI/CD) tooling and Infrastructure as Code (IaC) tools such as Terraform to automate and improve development and release processes.
- Proven experience in engineering batch processing systems at scale.
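As an illustration of the Pig proficiency expected, here is a small sketch that embeds a Pig Latin aggregation in Java using PigServer; the input and output paths and field names are hypothetical.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class EmbeddedPig {
    public static void main(String[] args) throws Exception {
        // Run Pig in local mode for a quick sanity check (use MAPREDUCE for cluster runs).
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Register relations just as in a .pig script; one statement per call.
        pig.registerQuery("events = LOAD 'input/events.csv' USING PigStorage(',') "
                        + "AS (user:chararray, action:chararray, ts:long);");
        pig.registerQuery("clicks = FILTER events BY action == 'click';");
        pig.registerQuery("by_user = GROUP clicks BY user;");
        pig.registerQuery("counts = FOREACH by_user GENERATE group AS user, COUNT(clicks) AS n;");

        // Materialize the final relation to disk.
        pig.store("counts", "output/click_counts");
        pig.shutdown();
    }
}
```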
Must Have (Important):
- 5+ years of experience in customer-facing software/technology or consulting.
- 5+ years of experience with “on-premises to cloud” migrations or IT transformations.
- 5+ years of experience building and operating solutions on GCP.
- Proficiency in Oozie and Pig.
- Proficiency in Java or Python.