Description

As a **Data Engineer** supporting **Law data strategy**, you will design, build, and maintain scalable data pipelines that integrate data from legal systems into Amgen’s **enterprise data fabric**.

You will enable high-quality, governed datasets that support **analytics, reporting, and emerging AI/ML use cases** for Legal and Compliance teams.

This role requires strong hands-on engineering skills, familiarity with modern data platforms (e.g., Databricks), and the ability to work closely with Legal stakeholders, Data Architects, and AI/Analytics teams.

**Key Responsibilities**

**Data Engineering & Pipeline Development**

+ Design, develop, and maintain data pipelines to ingest data from **legal systems, third-party tools, and enterprise platforms**

+ Build and optimize **ETL/ELT pipelines** using modern frameworks (Databricks, Spark)

+ Implement reliable, scalable, and production-ready data pipelines using engineering best practices, monitoring, and automated validation frameworks

+ Integrate structured and unstructured legal data into the **enterprise data fabric**

+ Ensure reliability, scalability, and performance of data pipelines

**Databricks & Modern Data Platform**

+ Develop pipelines using **Databricks (Delta Lake, Spark, notebooks)**

+ Implement data transformation and orchestration workflows

+ Support migration and modernization of legacy data solutions to cloud-native platforms

+ Contribute to reusable data engineering patterns and components

+ Optimize Delta Lake and Spark workloads for scalable, cost-efficient, and high-performance enterprise data processing

**Data Quality, Governance & Compliance**

+ Implement data quality checks, validation rules, and monitoring

+ Implement governance, lineage, and security controls for sensitive legal and compliance datasets

+ Ensure compliance with **data governance, privacy, and legal/regulatory requirements** (e.g., sensitive legal data handling)

+ Maintain metadata, lineage, and documentation for legal datasets

**AI & Advanced Analytics Enablement**

+ Build curated datasets that support **AI/ML models and GenAI use cases**

+ Prepare structured and unstructured datasets for AI/ML and GenAI use cases, including document intelligence and semantic search applications

+ Enable feature engineering and data preparation for AI applications in Legal (e.g., document analysis, contract insights)

+ Collaborate with data scientists and AI teams to ensure data readiness and accessibility

**Collaboration & Delivery**

+ Work with Legal stakeholders to understand data needs and translate them into technical solutions

+ Partner with Data Architects to align with enterprise data fabric strategy

+ Participate in Agile development processes (sprint planning, estimation, delivery)

+ Document pipelines, models, and technical decisions

**Basic Qualifications**

+ Master’s or Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field

+ **5–8 years** of experience in data engineering or a related technical role

**Must-Have Technical Skills**

+ Strong experience with **SQL** and relational databases

+ Programming experience in **Python (required), PySpark preferred**

+ Hands-on experience with **Databricks / Apache Spark**

+ Experience building **ETL/ELT pipelines** for large-scale datasets

+ Familiarity with **cloud platforms (AWS, Azure, or GCP)**

+ Understanding of **data modeling and data warehousing concepts**

**Preferred / Strategic Skills (Aligned to Future Data Strategy)**

+ Certification:

  + Relevant certifications in Databricks, cloud platforms (AWS/Azure/GCP), or modern data engineering technologies are a plus

+ Experience with:

  + **Delta Lake / Lakehouse architectures**

  + **Data Fabric / Data Mesh concepts**

  + **Snowflake, Redshift, or enterprise data warehouse platforms**

+ Familiarity with:

  + **Streaming data (Kafka, event-driven pipelines)**

  + **Data orchestration tools (Airflow, Databricks Workflows)**

+ Exposure to:

  + **AI/ML data pipelines and feature engineering**

  + **Unstructured data processing (documents, legal text)**

+ Understanding of:

  + **Data governance frameworks and cataloging tools**

  + **Security and privacy controls for sensitive data (legal/compliance)**

**Functional Skills**

+ Strong problem-solving and analytical thinking

+ Ability to work with large, complex datasets

+ Effective communication with both technical and non-technical stakeholders

+ Ability to operate in a fast-paced Agile environment