Description
**Role Description:**
Let’s do this. Let’s change the world. We are looking for a highly motivated, expert Senior Data Engineer who can own the design and development of complex data pipelines, solutions, and frameworks, with detailed functional knowledge of R&D. The ideal candidate will be responsible for designing, developing, and optimizing data pipelines, data integration frameworks, and metadata-driven architectures that enable seamless data access and analytics. This role calls for deep expertise in big data processing, distributed computing, data modeling, and governance frameworks to support self-service analytics, AI-driven insights, and enterprise-wide data management.
**Roles & Responsibilities:**
+ Design, develop, and maintain scalable ETL/ELT pipelines to support structured, semi-structured, and unstructured data processing across the enterprise data engineering landscape, applying functional knowledge of R&D in the biotech or pharma domain.
+ Implement real-time and batch data processing solutions, integrating data from multiple sources into a unified, governed data fabric architecture.
+ Optimize big data processing frameworks using Apache Spark, Hadoop, or similar distributed computing technologies to ensure high availability and cost efficiency.
+ Work with metadata management and data lineage tracking tools to enable enterprise-wide data discovery and governance.
+ Ensure data security, compliance, and role-based access control (RBAC) across data environments.
+ Optimize query performance, indexing strategies, partitioning, and caching for large-scale data sets.
+ Develop CI/CD pipelines for automated data pipeline deployments, version control, and monitoring.
+ Implement data virtualization techniques to provide seamless access to data across multiple storage systems.
+ Collaborate with cross-functional teams, including data architects, business analysts, and DevOps teams, to align data engineering strategies with enterprise goals.
+ Stay up to date with emerging data technologies and best practices, ensuring continuous improvement of Enterprise Data Fabric architectures.
**Must-Have Skills:**
+ Hands-on experience with data engineering technologies such as Databricks, PySpark, Spark SQL, and Apache Spark, along with AWS, Python, SQL, and Scaled Agile methodologies.
+ Proficiency in workflow orchestration and performance tuning of big data processing workloads.
+ Strong understanding of AWS services
+ Experience with Data Fabric, Data Mesh, or similar enterprise-wide data architectures.
+ Ability to quickly learn, adapt and apply new technologies
+ Strong problem-solving and analytical skills
+ Excellent communication and teamwork skills
+ Experience with the Scaled Agile Framework (SAFe), Agile delivery, and DevOps practices.
**Good-to-Have Skills:**
+ Deep expertise in the biotech and pharma industries
+ Experience writing APIs to make data available to consumers
+ Experience with SQL/NoSQL databases and vector databases for large language models
+ Experience with data modeling and performance tuning for both OLAP and OLTP databases
+ Experience with software engineering best practices, including but not limited to version control (Git, Subversion, etc.), CI/CD (Jenkins, Maven, etc.), automated unit testing, and DevOps
**Education and Professional Certifications:**
+ Master’s degree and 6 to 8+ years of Computer Science, IT, or related field experience, OR
+ Bachelor’s degree and 7 to 10+ years of Computer Science, IT, or related field experience
+ AWS Certified Data Engineer preferred
+ Databricks Certification preferred
+ Scaled Agile SAFe certification preferred
**Soft Skills:**
+ Excellent analytical and troubleshooting skills.
+ Strong verbal and written communication skills
+ Ability to work effectively with global, virtual teams
+ High degree of initiative and self-motivation.
+ Ability to manage multiple priorities successfully.
+ Team-oriented, with a focus on achieving team goals.
+ Ability to learn quickly, be organized and detail oriented.
+ Strong presentation and public speaking skills.