MilliporeSigma | Jobs | Lead Data Engineer | BigDataKB.com | 07-02-22


    Job Location: Bangalore/Bengaluru

    In this role, you will be part of a growing, global team of data engineers who collaborate in DevOps mode to enable the Life Science business with state-of-the-art technology, leveraging data as an asset to make better-informed decisions.

    The Life Science Data Engineering Team is responsible for designing, developing, testing, and supporting automated end-to-end data pipelines and applications on Life Science's data management and analytics platform (Palantir Foundry, Hadoop, and other components).

    The Foundry platform comprises multiple technology stacks, hosted either on Amazon Web Services (AWS) infrastructure or on-premises in the organisation's own data centers. Developing pipelines and applications on Foundry requires:

    • Proficiency in SQL / Java / Python (Python is required; all three are not necessary)
    • Proficiency in PySpark for distributed computation
    • Familiarity with Postgres and ElasticSearch
    • Familiarity with HTML, CSS, and JavaScript and basic design/visual competency
    • Familiarity with common databases and access methods (e.g., MySQL, Microsoft SQL Server, JDBC); not all types are required

    This position will be project based and may work across multiple smaller projects or a single large project utilizing an agile project methodology.

    Roles & Responsibilities:

    • Develop data pipelines by ingesting various data sources, structured and unstructured, into Palantir Foundry
    • Participate in end-to-end project lifecycle, from requirements analysis to go-live and operations of an application
    • Act as a business analyst, developing requirements for Foundry pipelines
    • Review code developed by other data engineers, checking it against platform-specific standards, cross-cutting concerns, coding and configuration standards, and the functional specification of the pipeline
    • Document technical work in a professional and transparent way, creating high-quality technical documentation
    • Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
    • Deploy applications on Foundry platform infrastructure with clearly defined checks
    • Implement changes and bug fixes via the organisation's change management framework and according to system engineering practices (additional training will be provided)
    • Set up DevOps projects following Agile principles (e.g., Scrum)
    • Besides working on projects, act as third-level support for critical applications; analyze and resolve complex incidents/problems, debugging issues across the full Foundry stack and code based on Python, PySpark, and Java
    • Work closely with business users and data scientists/analysts to design physical data models

    Who you are:

    Education

    • Bachelor's (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences, or a related field

    Professional Experience

    • 5+ years of experience in system engineering or software development
    • 3+ years of experience in data engineering, including ETL-type work with databases and Hadoop platforms

    Skills

    Hadoop General

    • Deep knowledge of distributed file system concepts, map-reduce principles, and distributed computing.
    • Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.
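    The map-reduce principles mentioned above can be sketched in plain Python. This is an illustrative, single-process sketch (the names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, not a real Hadoop API); a real cluster distributes each phase across machines.

```python
from collections import defaultdict
from itertools import chain

# Map phase: each "mapper" turns one input line into (word, 1) pairs.
def map_phase(line):
    return [(word.lower(), 1) for word in line.split()]

# Shuffle phase: group intermediate pairs by key, as the framework
# would do when routing pairs to reducers.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate the grouped values for each key.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
# counts["the"] == 2: the word appears once in each input line
```

    The same word-count shape carries over directly to Spark, where `flatMap`/`reduceByKey` replace the hand-written phases.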

    Data management / data structures

    • Must be proficient in technical data management tasks, i.e., writing code to read, transform, and store data
    • XML/JSON knowledge
    • Experience working with REST APIs
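    A minimal read–transform–store sketch with JSON, using only the standard library. The record shape and field names (`sample_id`, `measurements`) are invented for illustration, e.g. as a payload a REST API might return.

```python
import json

# Hypothetical raw record, e.g. a response body from a REST API.
raw = '{"sample_id": "S-001", "measurements": [{"assay": "pH", "value": 7.2}]}'

# Read: parse JSON into Python structures.
record = json.loads(raw)

# Transform: flatten the nested measurements into tabular rows.
rows = [
    {"sample_id": record["sample_id"], "assay": m["assay"], "value": m["value"]}
    for m in record["measurements"]
]

# Store: serialize as JSON Lines, a common interchange format in pipelines.
output = "\n".join(json.dumps(row) for row in rows)
```

    XML follows the same pattern with `xml.etree.ElementTree` in place of `json`.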

    Spark

    • Experience launching Spark jobs in both client mode and cluster mode. Familiarity with Spark job property settings and their implications for performance.
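    For illustration, the two deploy modes and property settings look like this with `spark-submit`; the script name `my_pipeline.py` and the property values are assumptions, to be tuned per workload.

```shell
# Client mode: the driver runs on the submitting machine
# (convenient for interactive debugging; driver logs appear locally).
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.sql.shuffle.partitions=200 \
  my_pipeline.py

# Cluster mode: the driver runs inside the cluster
# (typical for production; the client can disconnect after submission).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  my_pipeline.py
```

    Properties such as executor memory/cores and shuffle partition count are the usual first levers when a job is slow or memory-bound.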

    Application Development

    • Familiarity with HTML, CSS, and JavaScript and basic design/visual competency

    SCC/Git

    • Must be experienced in the use of source code control systems such as Git

    ETL

    • Experience developing ELT/ETL processes, including loading data from enterprise-sized RDBMSs such as Oracle, DB2, and MySQL

    Authorization

    • Basic understanding of user authorization (Apache Ranger preferred)

    Programming

    • Must be able to code in Python or be an expert in at least one high-level language such as Java, C, or Scala
    • Must have experience in using REST APIs

    SQL

    • Must be an expert in manipulating database data using SQL. Familiarity with views, functions, stored procedures and exception handling.
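    A self-contained sketch of SQL data manipulation, including a view, using Python's built-in `sqlite3`. Table and column names are illustrative; stored procedures and exception handling are engine-specific (e.g., PL/SQL on Oracle) and are not shown here, since SQLite does not support them.

```python
import sqlite3

# In-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "EU", 80.0), (3, "US", 200.0)],
)

# A view encapsulates an aggregation and is queried like a table.
conn.execute("""
    CREATE VIEW region_totals AS
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
""")

totals = dict(conn.execute("SELECT region, total FROM region_totals"))
# totals == {"EU": 200.0, "US": 200.0}
```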

    AWS

    • General knowledge of the AWS stack (EC2, S3, EBS, etc.)

    IT Process Compliance

    • SDLC experience and formalized change controls
    • Working in DevOps teams, based on Agile principles (e.g., Scrum)
    • ITIL knowledge (especially incident, problem and change management)

    Languages

    • Fluent English skills

    Specific information related to the position:

    • Physical presence in primary work location (Bangalore)
    • Flexible to work CEST and US EST time zones (according to team rotation plan)
    • Willingness to travel to Germany, US, and potentially other locations (as per project demand)

    Apply Here
