Job Location: Kolkata
The key responsibility of this role is to design and implement complex data pipelines on the organization's platform. These may run in the cloud using cloud-native technologies or on-premises using open-source big data environments.
• Structured design thinking to modularize data pipelines, identify reuse opportunities, and bring efficient methods to bear on the problem
• Understanding of the design choices involved in building high-volume, scalable pipelines
• Familiarity with data pipeline concepts from analogous tools in the big data ecosystem
• Willingness to learn the organization's platform tooling and deploy pipelines to it efficiently
• Discipline to apply standard software development principles to day-to-day tasks, with a readiness to document and test
• Ability to work as an independent team member, applying judgment to plan and execute tasks
• 2-6 years of overall experience in data engineering or data analytics
• Strong expertise in advanced SQL for data analysis and implementing complex computations
• Experience developing and maintaining ETL pipelines, streaming jobs, and data extraction using Apache Spark or a similar data processing framework
• Solid understanding of data warehousing, data ingestion, data modelling, and BI and reporting concepts
• Ability to troubleshoot and tune the performance of Apache Spark jobs operating on large datasets
• Experience with big data distributions such as Cloudera, Azure HDInsight, Google Cloud Dataproc, or AWS EMR
• Experience working in hybrid cloud environments such as AWS, Google Cloud, or Azure
• Understanding of banking products, regulatory datasets, and credit risk models
• A firm statistical background and exposure to tools such as SAS, Python, and R
• Programming background with good knowledge of either Python or Java
• Good interpersonal and communication skills, including the ability to inspire, mentor, and coach junior data engineers