Description
As a **Data Engineer** supporting **Law data strategy**, you will design, build, and maintain scalable data pipelines that integrate data from legal systems into Amgen’s **enterprise data fabric**.
You will enable high-quality, governed datasets that support **analytics, reporting, and emerging AI/ML use cases** for Legal and Compliance teams.
This role requires strong hands-on engineering skills, familiarity with modern data platforms (e.g., Databricks), and the ability to work closely with Legal stakeholders, Data Architects, and AI/Analytics teams.
**Key Responsibilities**
**Data Engineering & Pipeline Development**
+ Design, develop, and maintain data pipelines to ingest data from **legal systems, third-party tools, and enterprise platforms**
+ Build and optimize **ETL/ELT pipelines** using modern frameworks (Databricks, Spark)
+ Implement reliable, scalable, and production-ready data pipelines using engineering best practices, monitoring, and automated validation frameworks
+ Integrate structured and unstructured legal data into the **enterprise data fabric**
+ Ensure reliability, scalability, and performance of data pipelines
**Databricks & Modern Data Platform**
+ Develop pipelines using **Databricks (Delta Lake, Spark, notebooks)**
+ Implement data transformation and orchestration workflows
+ Support migration and modernization of legacy data solutions to cloud-native platforms
+ Contribute to reusable data engineering patterns and components
+ Optimize Delta Lake and Spark workloads for scalable, cost-efficient, and high-performance enterprise data processing
**Data Quality, Governance & Compliance**
+ Implement data quality checks, validation rules, and monitoring
+ Implement governance, lineage, and security controls for sensitive legal and compliance datasets
+ Ensure compliance with **data governance, privacy, and legal/regulatory requirements** (e.g., sensitive legal data handling)
+ Maintain metadata, lineage, and documentation for legal datasets
**AI & Advanced Analytics Enablement**
+ Build curated datasets that support **AI/ML models and GenAI use cases**
+ Prepare structured and unstructured datasets for AI/ML and GenAI use cases including document intelligence and semantic search applications
+ Enable feature engineering and data preparation for AI applications in Legal (e.g., document analysis, contract insights)
+ Collaborate with data scientists and AI teams to ensure data readiness and accessibility
**Collaboration & Delivery**
+ Work with Legal stakeholders to understand their data needs and translate them into technical solutions
+ Partner with Data Architects to align solutions with the enterprise data fabric strategy
+ Participate in Agile development processes (sprint planning, estimation, delivery)
+ Document pipelines, models, and technical decisions
**Basic Qualifications**
+ Master’s or Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field
+ **5–8 years** of experience in data engineering or a related technical role
**Must-Have Technical Skills**
+ Strong experience with **SQL** and relational databases
+ Programming experience in **Python (required), PySpark preferred**
+ Hands-on experience with **Databricks / Apache Spark**
+ Experience building **ETL/ELT pipelines** for large-scale datasets
+ Familiarity with **cloud platforms (AWS, Azure, or GCP)**
+ Understanding of **data modeling and data warehousing concepts**
**Preferred / Strategic Skills (Aligned to Future Data Strategy)**
+ Certification:
  + Relevant certifications in Databricks, cloud platforms (AWS/Azure/GCP), or modern data engineering technologies are a plus
+ Experience with:
  + **Delta Lake / Lakehouse architectures**
  + **Data Fabric / Data Mesh concepts**
  + **Snowflake, Redshift, or enterprise data warehouse platforms**
+ Familiarity with:
  + **Streaming data (Kafka, event-driven pipelines)**
  + **Data orchestration tools (Airflow, Databricks Workflows)**
+ Exposure to:
  + **AI/ML data pipelines and feature engineering**
  + **Unstructured data processing (documents, legal text)**
+ Understanding of:
  + **Data governance frameworks and cataloging tools**
  + **Security and privacy controls for sensitive data (legal/compliance)**
**Functional Skills**
+ Strong problem-solving and analytical thinking
+ Ability to work with large, complex datasets
+ Effective communication with both technical and non-technical stakeholders
+ Ability to operate in a fast-paced Agile environment