MilliporeSigma | Jobs | Lead Data Engineer | BigDataKB.com | 07-02-22


Job Location: Bangalore/Bengaluru

In this role, you will be part of a growing, global team of data engineers who collaborate in DevOps mode to enable the Life Science business with state-of-the-art technology, leveraging data as an asset to make better-informed decisions.

The Life Science Data Engineering Team is responsible for designing, developing, testing, and supporting automated end-to-end data pipelines and applications on Life Science's data management and analytics platform (Palantir Foundry, Hadoop, and other components).

The Foundry platform comprises multiple different technology stacks, which are hosted on Amazon Web Services (AWS) infrastructure or on-premises in the organisation's own data centers. Developing pipelines and applications on Foundry requires:

  • Proficiency in SQL, Java, and/or Python (Python is required; all three are not necessary)
  • Proficiency in PySpark for distributed computation
  • Familiarity with Postgres and Elasticsearch
  • Familiarity with HTML, CSS, and JavaScript, and basic design/visual competency
  • Familiarity with common databases (e.g., MySQL, Microsoft SQL Server) and JDBC connectivity; not all types are required

This position will be project-based and may work across multiple smaller projects or a single large project utilizing an agile project methodology.

Roles & Responsibilities:

  • Develop data pipelines by ingesting various structured and unstructured data sources into Palantir Foundry
  • Participate in the end-to-end project lifecycle, from requirements analysis to go-live and operations of an application
  • Act as a business analyst when developing requirements for Foundry pipelines
  • Review code developed by other data engineers, checking it against platform-specific standards, cross-cutting concerns, coding and configuration standards, and the functional specification of the pipeline
  • Document technical work in a professional and transparent way; create high-quality technical documentation
  • Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
  • Deploy applications on Foundry platform infrastructure with clearly defined checks
  • Implement changes and bug fixes via the organisation's change management framework and according to system engineering practices (additional training will be provided)
  • Set up DevOps projects following Agile principles (e.g., Scrum)
  • Besides working on projects, act as third-level support for critical applications; analyze and resolve complex incidents/problems. Debug problems across the full Foundry stack and code based on Python, PySpark, and Java
  • Work closely with business users and data scientists/analysts to design physical data models

Who you are:

Education

  • Bachelor (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences, or related fields

Professional Experience

  • 5+ years of experience in system engineering or software development
  • 3+ years of data engineering experience, including ETL-type work with databases and Hadoop platforms

Skills

Hadoop General

  • Deep knowledge of distributed file system concepts, map-reduce principles, and distributed computing.
  • Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.
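As an illustration of the map-reduce principle named above, here is a minimal, self-contained sketch of a word count in plain Python. On a real Hadoop or Spark cluster the three phases run distributed across machines; here they run locally, and all names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative rather than part of any framework API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in a line of text.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values per key.
    return {key: sum(values) for key, values in groups.items()}

lines = ["hadoop stores data", "spark processes data"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts["data"])  # "data" appears twice across the input lines
```

The same decomposition (stateless map, keyed shuffle, per-key reduce) is what Spark generalises with its RDD and DataFrame operations.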

Data management / data structures

  • Must be proficient in technical data management tasks, i.e., writing code to read, transform, and store data
  • XML/JSON knowledge
  • Experience working with REST APIs
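A typical task combining the JSON and REST skills listed above is deserializing an API response and extracting the relevant fields. The sketch below uses only the standard library; the payload shape and the `extract_values` helper are invented for illustration, not taken from any particular API.

```python
import json

# A hypothetical JSON payload of the kind a REST endpoint might return.
payload = '{"results": [{"id": 1, "assay": "pH", "value": 7.4}, ' \
          '{"id": 2, "assay": "pH", "value": 6.9}]}'

def extract_values(raw, assay):
    """Deserialize a JSON response and pull out the values for one assay."""
    records = json.loads(raw)["results"]
    return [r["value"] for r in records if r["assay"] == assay]

print(extract_values(payload, "pH"))  # [7.4, 6.9]
```

In a pipeline, the same parsing step would sit between the HTTP call and the write into the target dataset.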

Spark

  • Experience in launching Spark jobs in client mode and cluster mode. Familiarity with Spark job property settings and their implications for performance.
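The two submission modes mentioned above differ in where the driver process runs. A minimal sketch with `spark-submit` (the script name `pipeline.py` and the specific property values are placeholders, not prescriptions):

```shell
# Cluster mode: the driver runs inside the cluster (typical for production jobs).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.sql.shuffle.partitions=200 \
  pipeline.py

# Client mode: the driver runs on the submitting machine (useful for debugging,
# since driver logs and stack traces appear directly in the local console).
spark-submit --master yarn --deploy-mode client pipeline.py
```

Properties such as `spark.sql.shuffle.partitions` and executor memory/cores are the usual levers behind the performance implications the bullet refers to.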

Application Development

  • Familiarity with HTML, CSS, and JavaScript and basic design/visual competency

SCC/Git

  • Must be experienced in the use of source code control systems such as Git

ETL

  • Experience developing ELT/ETL processes, including loading data from enterprise-sized RDBMS systems such as Oracle, DB2, and MySQL
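The extract-transform-load pattern the bullet describes can be sketched end to end with the standard library, using in-memory SQLite databases as stand-ins for the source RDBMS (Oracle, DB2, MySQL) and the target store. The table names and the currency-normalisation transform are invented for illustration.

```python
import sqlite3

# Source: stands in for an enterprise RDBMS.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL, currency TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 10.0, "EUR"), (2, 250.0, "usd"), (3, 99.5, "EUR")])

# Target: stands in for the analytics platform's backing store.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders_clean (id INTEGER, amount REAL, currency TEXT)")

# Extract -> Transform (normalise currency codes) -> Load.
rows = source.execute("SELECT id, amount, currency FROM orders").fetchall()
clean = [(row_id, amount, currency.upper()) for row_id, amount, currency in rows]
target.executemany("INSERT INTO orders_clean VALUES (?, ?, ?)", clean)
target.commit()

total = target.execute("SELECT SUM(amount) FROM orders_clean").fetchone()[0]
print(total)  # 359.5
```

Against a real RDBMS the connection objects would come from the vendor's driver (e.g., via JDBC or a Python DB-API driver), but the extract/transform/load shape is the same.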

Authorization

  • Basic understanding of user authorization (Apache Ranger preferred)

Programming

  • Must be able to code in Python or be an expert in at least one high-level language such as Java, C, or Scala.
  • Must have experience in using REST APIs

SQL

  • Must be an expert in manipulating database data using SQL, with familiarity with views, functions, stored procedures, and exception handling.
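As a small, runnable illustration of the view-based SQL manipulation mentioned above, the sketch below uses SQLite (which supports views but not stored procedures, so those are out of scope here); the table and view names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurements (batch TEXT, value REAL);
INSERT INTO measurements VALUES ('A', 1.0), ('A', 3.0), ('B', 5.0);
-- A view encapsulating a per-batch aggregation, queried like a table.
CREATE VIEW batch_avg AS
  SELECT batch, AVG(value) AS avg_value
  FROM measurements
  GROUP BY batch;
""")
rows = conn.execute("SELECT batch, avg_value FROM batch_avg ORDER BY batch").fetchall()
print(rows)  # [('A', 2.0), ('B', 5.0)]
```

Views like this keep transformation logic in the database layer, so downstream queries stay simple and consistent.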

AWS

  • General knowledge of the AWS stack (e.g., EC2, S3, EBS)

IT Process Compliance

  • SDLC experience and formalized change controls
  • Working in DevOps teams, based on Agile principles (e.g., Scrum)
  • ITIL knowledge (especially incident, problem and change management)

Languages

  • Fluent English skills

Specific information related to the position:

  • Physical presence in primary work location (Bangalore)
  • Flexible to work CEST and US EST time zones (according to team rotation plan)
  • Willingness to travel to Germany, US, and potentially other locations (as per project demand)

Apply Here
