Job Location: Bangalore
Build, deploy, and maintain state-of-the-art pipelines for our machine learning models. Monitor the production pipelines and support the models running in them. Identify, report, and fix job failures promptly, working closely with the Hadoop team. Optimize our production pipelines into practical, scalable, time- and memory-efficient, reliable applications.
Must be familiar with Airflow and CDP 3.0 Hadoop applications/operations. Strong coding skills in Python and SQL. Able to work with the technology operations center, take on-call shifts, and work through weekends when needed.
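For context, here is a minimal sketch of the kind of Airflow pipeline this role owns (written against the Airflow 2.x API). The DAG name, schedule, and task commands are hypothetical; the point is orchestration with retries and failure notifications, per the responsibilities above.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical defaults illustrating retries and failure notifications.
default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
    "email_on_failure": True,
}

with DAG(
    dag_id="ml_scoring_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    ingest = BashOperator(task_id="ingest", bash_command="echo ingest")
    validate = BashOperator(task_id="validate", bash_command="echo validate")
    score = BashOperator(task_id="score", bash_command="echo score")

    # Linear dependency chain: ingest, then validate, then score.
    ingest >> validate >> score
```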
1. Degree in Computer Science, Engineering, Mathematics, Data Science, Analytics, Information Systems, or a related quantitative field.
2. 3+ years of hands-on experience building robust, reliable, and scalable machine learning pipelines from the ground up, as well as transitioning them from MVP to production (ingestion, scheduling, security, notifications, validation, backups, optimization).
3. Experience with CI/CD pipelines such as Jenkins.
4. Experience with distributed systems (Spark, Hive, HDFS, Hadoop, HBase, Druid, Cloudera, Kafka) along with Airflow to orchestrate data pipelines.
5. Experience with process optimization and fine-tuning of existing applications in a distributed computing environment.
6. Ability to write quality code in Python/PySpark, backed by unit tests and documentation (see the test sketch after this list).
7. Experience configuring/building data quality frameworks (PyDeequ, Great Expectations, or similar tools) to increase reliability; a minimal example follows this list.
8. Ability to define, execute, and operate monitoring and alerting over critical SLAs (Prometheus/Nagios, Grafana); see the metrics sketch after this list.
9. Out-of-the-box thinker, willing to dive deep to provide efficient solutions to problems.
10. Agile development skills and experience.
11. DevOps exposure.
12. Data Science knowledge.
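On point 6, a minimal sketch of a PySpark unit test using the standard unittest module; the dedupe_orders transform and its sample data are hypothetical.

```python
import unittest

from pyspark.sql import SparkSession


def dedupe_orders(df):
    """Hypothetical transform under test: drop duplicate order rows."""
    return df.dropDuplicates(["order_id"])


class DedupeOrdersTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # A local single-threaded session keeps the test fast and self-contained.
        cls.spark = (
            SparkSession.builder.master("local[1]").appName("tests").getOrCreate()
        )

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_duplicates_removed(self):
        df = self.spark.createDataFrame(
            [(1, "a"), (1, "a"), (2, "b")], ["order_id", "item"]
        )
        self.assertEqual(dedupe_orders(df).count(), 2)


if __name__ == "__main__":
    unittest.main()
```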
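On point 7, a minimal PyDeequ sketch; it assumes the Deequ JAR is on the Spark classpath and the SPARK_VERSION environment variable is set, and the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationResult, VerificationSuite

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 10.0), (2, 25.5)], ["order_id", "amount"])

# Constraints expressing basic reliability expectations on the data.
check = (
    Check(spark, CheckLevel.Error, "order checks")
    .isComplete("order_id")   # no nulls
    .isUnique("order_id")     # no duplicates
    .isNonNegative("amount")  # sanity bound
)

result = VerificationSuite(spark).onData(df).addCheck(check).run()
VerificationResult.checkResultsAsDataFrame(spark, result).show()
```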
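On point 8, a minimal sketch of exposing an SLA metric with the prometheus_client library; the metric name and port are hypothetical. A Prometheus alert rule (and a Grafana panel) would then be defined over this metric, firing when the last success is older than the SLA allows.

```python
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical SLA metric: Unix time of the last successful pipeline run.
LAST_SUCCESS = Gauge(
    "pipeline_last_success_timestamp_seconds",
    "Unix time of the last successful pipeline run",
)

if __name__ == "__main__":
    start_http_server(8000)  # serve /metrics for Prometheus to scrape
    while True:
        # In a real pipeline this would be set at the end of a successful run;
        # an alert fires when now() minus this value exceeds the SLA.
        LAST_SUCCESS.set_to_current_time()
        time.sleep(60)
```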