MilliporeSigma | Jobs | Lead Data Engineer | BigDataKB.com | 07-02-22


Job Location: Bangalore/Bengaluru

In this role, you will be part of a growing, global team of data engineers who collaborate in DevOps mode to enable the Life Science business with state-of-the-art technology, leveraging data as an asset to make better-informed decisions.

The Life Science Data Engineering Team is responsible for designing, developing, testing, and supporting automated end-to-end data pipelines and applications on Life Science's data management and analytics platform (Palantir Foundry, Hadoop, and other components).

The Foundry platform comprises multiple different technology stacks, which are hosted on Amazon Web Services (AWS) infrastructure or on-premises in the organisation's own data centers. Developing pipelines and applications on Foundry requires:

  • Proficiency in SQL, Java, and/or Python (Python is required; all three are not necessary)
  • Proficiency in PySpark for distributed computation
  • Familiarity with Postgres and Elasticsearch
  • Familiarity with HTML, CSS, and JavaScript, and basic design/visual competency
  • Familiarity with common databases (e.g., MySQL, Microsoft SQL Server) and JDBC connectivity; not all types are required

This position will be project-based and may work across multiple smaller projects or a single large project utilizing an agile project methodology.

Roles & Responsibilities:

  • Develop data pipelines by ingesting various structured and unstructured data sources into Palantir Foundry
  • Participate in the end-to-end project lifecycle, from requirements analysis to go-live and operations of an application
  • Act as a business analyst when developing requirements for Foundry pipelines
  • Review code developed by other data engineers, checking it against platform-specific standards, cross-cutting concerns, coding and configuration standards, and the functional specification of the pipeline
  • Document technical work in a professional and transparent way; create high-quality technical documentation
  • Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
  • Deploy applications on Foundry platform infrastructure with clearly defined checks
  • Implement changes and bug fixes via the organisation's change management framework and according to system engineering practices (additional training will be provided)
  • Set up DevOps projects following Agile principles (e.g., Scrum)
  • Besides working on projects, act as third-level support for critical applications; analyze and resolve complex incidents/problems. Debug problems across the full Foundry stack and code based on Python, PySpark, and Java
  • Work closely with business users and data scientists/analysts to design physical data models

Who you are:

Education

  • Bachelor (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences, or related fields

Professional Experience

  • 5+ years of experience in system engineering or software development
  • 3+ years of data engineering experience, including ETL-type work with databases and Hadoop platforms

Skills

Hadoop General

  • Deep knowledge of distributed file system concepts, map-reduce principles, and distributed computing.
  • Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.
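As an illustration of the map-reduce principle named above, here is a minimal, self-contained sketch of a word count in plain Python. On a real Hadoop or Spark cluster the three phases run distributed across machines; here they run locally, and all names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative rather than part of any framework API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in a line of text.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values per key.
    return {key: sum(values) for key, values in groups.items()}

lines = ["hadoop stores data", "spark processes data"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts["data"])  # "data" appears twice across the input lines
```

The same decomposition (stateless map, keyed shuffle, per-key reduce) is what Spark generalises with its RDD and DataFrame operations.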

Data management / data structures

  • Must be proficient in technical data management tasks, i.e., writing code to read, transform, and store data
  • XML/JSON knowledge
  • Experience working with REST APIs
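A typical task combining the JSON and REST skills listed above is deserializing an API response and extracting the relevant fields. The sketch below uses only the standard library; the payload shape and the `extract_values` helper are invented for illustration, not taken from any particular API.

```python
import json

# A hypothetical JSON payload of the kind a REST endpoint might return.
payload = '{"results": [{"id": 1, "assay": "pH", "value": 7.4}, ' \
          '{"id": 2, "assay": "pH", "value": 6.9}]}'

def extract_values(raw, assay):
    """Deserialize a JSON response and pull out the values for one assay."""
    records = json.loads(raw)["results"]
    return [r["value"] for r in records if r["assay"] == assay]

print(extract_values(payload, "pH"))  # [7.4, 6.9]
```

In a pipeline, the same parsing step would sit between the HTTP call and the write into the target dataset.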

Spark

  • Experience in launching Spark jobs in client mode and cluster mode. Familiarity with Spark job property settings and their implications for performance.
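The two submission modes mentioned above differ in where the driver process runs. A minimal sketch with `spark-submit` (the script name `pipeline.py` and the specific property values are placeholders, not prescriptions):

```shell
# Cluster mode: the driver runs inside the cluster (typical for production jobs).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.sql.shuffle.partitions=200 \
  pipeline.py

# Client mode: the driver runs on the submitting machine (useful for debugging,
# since driver logs and stack traces appear directly in the local console).
spark-submit --master yarn --deploy-mode client pipeline.py
```

Properties such as `spark.sql.shuffle.partitions` and executor memory/cores are the usual levers behind the performance implications the bullet refers to.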

Application Development

  • Familiarity with HTML, CSS, and JavaScript and basic design/visual competency

SCC/Git

  • Must be experienced in the use of source code control systems such as Git

ETL

  • Experience developing ELT/ETL processes, including loading data from enterprise-sized RDBMS systems such as Oracle, DB2, and MySQL
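The extract-transform-load pattern the bullet describes can be sketched end to end with the standard library, using in-memory SQLite databases as stand-ins for the source RDBMS (Oracle, DB2, MySQL) and the target store. The table names and the currency-normalisation transform are invented for illustration.

```python
import sqlite3

# Source: stands in for an enterprise RDBMS.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL, currency TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 10.0, "EUR"), (2, 250.0, "usd"), (3, 99.5, "EUR")])

# Target: stands in for the analytics platform's backing store.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders_clean (id INTEGER, amount REAL, currency TEXT)")

# Extract -> Transform (normalise currency codes) -> Load.
rows = source.execute("SELECT id, amount, currency FROM orders").fetchall()
clean = [(row_id, amount, currency.upper()) for row_id, amount, currency in rows]
target.executemany("INSERT INTO orders_clean VALUES (?, ?, ?)", clean)
target.commit()

total = target.execute("SELECT SUM(amount) FROM orders_clean").fetchone()[0]
print(total)  # 359.5
```

Against a real RDBMS the connection objects would come from the vendor's driver (e.g., via JDBC or a Python DB-API driver), but the extract/transform/load shape is the same.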

Authorization

  • Basic understanding of user authorization (Apache Ranger preferred)

Programming

  • Must be able to code in Python or be an expert in at least one high-level language such as Java, C, or Scala.
  • Must have experience in using REST APIs

SQL

  • Must be an expert in manipulating database data using SQL, with familiarity with views, functions, stored procedures, and exception handling.
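As a small, runnable illustration of the view-based SQL manipulation mentioned above, the sketch below uses SQLite (which supports views but not stored procedures, so those are out of scope here); the table and view names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurements (batch TEXT, value REAL);
INSERT INTO measurements VALUES ('A', 1.0), ('A', 3.0), ('B', 5.0);
-- A view encapsulating a per-batch aggregation, queried like a table.
CREATE VIEW batch_avg AS
  SELECT batch, AVG(value) AS avg_value
  FROM measurements
  GROUP BY batch;
""")
rows = conn.execute("SELECT batch, avg_value FROM batch_avg ORDER BY batch").fetchall()
print(rows)  # [('A', 2.0), ('B', 5.0)]
```

Views like this keep transformation logic in the database layer, so downstream queries stay simple and consistent.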

AWS

  • General knowledge of the AWS stack (e.g., EC2, S3, EBS)

IT Process Compliance

  • SDLC experience and formalized change controls
  • Working in DevOps teams, based on Agile principles (e.g., Scrum)
  • ITIL knowledge (especially incident, problem and change management)

Languages

  • Fluent English skills

Specific information related to the position:

  • Physical presence in primary work location (Bangalore)
  • Flexible to work CEST and US EST time zones (according to team rotation plan)
  • Willingness to travel to Germany, US, and potentially other locations (as per project demand)

Apply Here
