Description

**Role Summary**

+ Build and operate large-scale healthcare data pipelines across batch workflows, metadata-driven ingestion, and data service publishing.

+ Own end-to-end engineering from source ingestion to conformed data products, with strong focus on reliability, data quality, and operational observability.

+ Partner with analytics, business, and platform teams to deliver trusted datasets for sales, claims, activity, patient, and rare disease use cases.

**Key Responsibilities**

+ Design and maintain PySpark/SQL pipelines in Databricks for landing, unified, unstitched, and published data layers.

+ Build and support Airflow DAGs for scheduling, dependencies, retries, and production operations (see the illustrative DAG sketch after this list).

+ Implement metadata/config-driven frameworks for ingestion, transformation, and rule-based processing (see the illustrative ingestion sketch after this list).

+ Develop robust data quality controls, DQ summaries, failure handling, and alerting workflows.

+ Manage batch/process audit logs, run status tracking, release flags, and operational reporting.

+ Integrate multi-source data (files, APIs, cloud storage, and relational systems) into governed Delta/Spark tables.

+ Optimize pipeline performance using partitioning, parallelization, and query tuning.

+ Collaborate on schema evolution, business-rule onboarding, and production support.
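
For orientation only, here is a minimal sketch of the config-driven ingestion and data quality pattern referenced in the bullets above, assuming a Databricks/Delta environment. Every path, table name, config field, and threshold is a hypothetical placeholder, not part of this role's actual stack.

```python
# Minimal sketch: metadata-driven ingestion with a basic data quality gate.
# All paths, table names, and config fields below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# In practice this would come from a metadata table or a versioned config file.
source_config = {
    "source_path": "s3://example-bucket/landing/claims/",  # hypothetical path
    "format": "parquet",
    "target_table": "unified.claims",                      # hypothetical table
    "not_null_columns": ["claim_id", "patient_id"],
    "max_null_fraction": 0.01,
}

df = spark.read.format(source_config["format"]).load(source_config["source_path"])

# Rule-based DQ gate: fail the batch if a key column exceeds the null threshold.
total = df.count()
for column in source_config["not_null_columns"]:
    null_fraction = df.filter(F.col(column).isNull()).count() / max(total, 1)
    if null_fraction > source_config["max_null_fraction"]:
        raise ValueError(f"DQ failure: {column} null fraction {null_fraction:.2%}")

# Publish to the governed Delta layer; mergeSchema tolerates additive schema evolution.
(df.write.format("delta")
   .mode("append")
   .option("mergeSchema", "true")
   .saveAsTable(source_config["target_table"]))
```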

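Likewise, below is a brief sketch of an Airflow DAG covering scheduling, retries, and task dependencies, assuming a recent Airflow 2.x release; the DAG id, schedule, and task callables are hypothetical.

```python
# Minimal Airflow DAG sketch: nightly schedule, retries, and linear dependencies.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    ...  # e.g. trigger the Databricks ingestion job

def run_dq_checks():
    ...  # e.g. validate the batch and write a DQ summary

def publish():
    ...  # e.g. promote conformed tables to the published layer

with DAG(
    dag_id="example_claims_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",              # nightly batch window
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    dq_task = PythonOperator(task_id="dq_checks", python_callable=run_dq_checks)
    publish_task = PythonOperator(task_id="publish", python_callable=publish)

    ingest_task >> dq_task >> publish_task
```
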
**Required Skills**

+ Bachelor’s degree in Computer Science, Information Technology, or a related field, with 2-6 years of experience.

+ Advanced Python, PySpark, and SQL (window functions, complex joins, MERGE patterns, optimization); an illustrative upsert sketch appears after this list.

+ Hands-on Databricks and Airflow experience in enterprise environments.

+ Experience with cloud data platforms (AWS), object storage, and secure secret handling.

+ Strong data quality engineering, monitoring, and troubleshooting in regulated data contexts.

+ Solid understanding of ETL orchestration, dependency management, and SLA-driven delivery.
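
Below is a small illustration of the window-function and MERGE patterns named above, assuming PySpark with Delta Lake; the table and column names are hypothetical.

```python
# Sketch: deduplicate with a window function, then upsert via MERGE (Delta Lake).
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.table("staging.claims_updates")  # hypothetical staging table

# Keep only the latest record per claim using a window function.
w = Window.partitionBy("claim_id").orderBy(F.col("updated_at").desc())
latest = (updates
          .withColumn("rn", F.row_number().over(w))
          .filter("rn = 1")
          .drop("rn"))

latest.createOrReplaceTempView("latest_updates")

# Upsert the deduplicated batch into the published layer.
spark.sql("""
    MERGE INTO published.claims AS t
    USING latest_updates AS s
    ON t.claim_id = s.claim_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```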
