Domain Crunchers | Hiring | Senior Data Engineer – Python/Java | BigDataKB.com | 44831

Job Location: Delhi

Technical Skills:

Languages – Python, SQL, Java, HCL, HTML/CSS/JavaScript, Bash

Database Technology – Spark, Sybase IQ, Db2, Snowflake, Redshift, Hive, Presto, Oracle PL/SQL

Tools – AWS, Terraform, Kubernetes, Docker, Jupyter, IntelliJ, vim, Git, SVN, Apache, nginx, Splunk, SSH

– Should primarily have worked on a Data Lake: a petabyte-scale data warehouse with unique requirements, used across hundreds of teams for many time-sensitive, critical applications.

– Derived a variety of SLOs and health indicators for the lake, and optimized it to bring ingestion time under 15 minutes for more than 90% of users.
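The kind of SLO check described above can be sketched in a few lines of Python. This is a minimal, illustrative example; the data, threshold variable names, and the exact SLO definition are hypothetical, not taken from the posting:

```python
# Hypothetical per-user ingestion times in minutes (illustrative data only).
ingestion_minutes = [3.2, 7.5, 11.0, 14.9, 4.1, 9.8, 12.4, 6.6, 13.1, 16.5]

# Assumed SLO: at least 90% of users see ingestion complete in under 15 minutes.
SLO_TARGET_MIN = 15.0
SLO_FRACTION = 0.90

# Fraction of users whose ingestion finished within the target.
within_slo = sum(1 for t in ingestion_minutes if t < SLO_TARGET_MIN) / len(ingestion_minutes)
print(f"fraction within SLO: {within_slo:.2f}")
print("SLO met" if within_slo >= SLO_FRACTION else "SLO violated")
```

A real monitor would pull these timings from telemetry rather than a literal list, but the comparison against a target fraction is the core of any availability-style SLO.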

– Designed an event-driven near real-time SLO monitor for the lake that processes millions of events a minute.
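An event-driven monitor at that volume typically keeps sliding-window counters rather than storing raw events. A minimal sketch, using only the standard library; the class name, window size, and simulated event stream are all hypothetical:

```python
from collections import deque

class SlidingWindowRate:
    """Track event throughput over a fixed time window (a simplified
    stand-in for the counters an event-driven SLO monitor might keep)."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.events = deque()  # timestamps of events inside the window

    def record(self, ts: float) -> None:
        self.events.append(ts)
        # Evict timestamps that have fallen out of the window.
        while self.events and self.events[0] <= ts - self.window:
            self.events.popleft()

    def rate_per_minute(self) -> float:
        return len(self.events) * 60.0 / self.window

# Simulate a burst of events at one-millisecond spacing.
mon = SlidingWindowRate(window_seconds=60.0)
for i in range(1200):
    mon.record(i * 0.001)
print(mon.rate_per_minute())
```

At "millions of events a minute" a production system would shard these counters across consumers (e.g. Kafka partitions), but the windowed-eviction idea is the same.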

– Crafted Terraform AWS configurations from scratch to deploy key lake components to the cloud.

– Developed and maintained a Jupyter notebook ecosystem on Kubernetes to support the SRE team.

– Wrote Jupyter notebooks to analyze telemetry metrics, develop insights, and establish SLOs. Notebooks typically pulled in data using SQL or PySpark, processed it further in pandas, and visualized results with matplotlib.
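The notebooks described used SQL/PySpark and pandas; the rollup step they perform can be illustrated with the standard library alone. The telemetry rows and grouping key below are hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical telemetry rows: (hour_of_day, latency_ms) — illustrative only.
telemetry = [
    (0, 120.0), (0, 80.0), (1, 200.0), (1, 100.0), (1, 150.0), (2, 90.0),
]

# Group latencies by hour and compute the mean: the kind of rollup a
# notebook would do in SQL/PySpark before handing results to pandas.
by_hour = defaultdict(list)
for hour, latency in telemetry:
    by_hour[hour].append(latency)

hourly_mean = {hour: mean(vals) for hour, vals in sorted(by_hour.items())}
print(hourly_mean)  # {0: 100.0, 1: 150.0, 2: 90.0}
```

In a notebook this dictionary would typically become a pandas DataFrame and then a matplotlib line chart of latency over time.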

– Designed an automation framework for Jupyter notebooks to schedule, cache, serve, and email them to clients.

– Implemented and maintained Prometheus metrics for high-level monitoring of the lake. These metrics were surfaced in Grafana for visualization and routed to PagerDuty for alerting.
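Prometheus scrapes samples in a simple text exposition format; in practice one would use the official `prometheus_client` library, but the format itself is easy to show. The metric name and labels below are hypothetical:

```python
def prometheus_line(name: str, labels: dict, value: float) -> str:
    """Render one sample in the Prometheus text exposition format
    (a minimal sketch; the real prometheus_client library handles
    this, plus escaping, HELP/TYPE lines, and the HTTP endpoint)."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

# Hypothetical metric name and labels for a lake ingestion-latency gauge.
line = prometheus_line(
    "lake_ingestion_latency_minutes",
    {"stage": "landing", "region": "us-east-1"},
    12.4,
)
print(line)
```

Grafana then queries these samples from Prometheus with PromQL, and Alertmanager (or a similar router) forwards firing alerts to PagerDuty.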

– Developed on Facebook’s Hadoop system through Hive and Presto, using Facebook’s internal ETL framework.

– Maintained solutions with third parties for ad data ingestion and delivery, including coordination of data definitions and validation checks during the ETL process.
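Validation checks against agreed data definitions usually amount to per-row rule evaluation before load. A minimal sketch; the field names, rules, and sample rows here are hypothetical, not the actual third-party schema:

```python
# Hypothetical data definition for ingested ad-data rows: required fields
# plus a sanity rule, enforced before rows are loaded downstream.
REQUIRED_FIELDS = ("campaign_id", "impressions", "spend")

def validate_row(row: dict) -> list:
    errors = []
    for field in REQUIRED_FIELDS:
        if row.get(field) is None:
            errors.append(f"missing {field}")
    if isinstance(row.get("impressions"), int) and row["impressions"] < 0:
        errors.append("impressions must be non-negative")
    return errors

rows = [
    {"campaign_id": "c1", "impressions": 1000, "spend": 12.5},
    {"campaign_id": "c2", "impressions": -5, "spend": 3.0},
    {"campaign_id": None, "impressions": 10, "spend": 1.0},
]
# Collect failing rows by index with their error lists.
bad = {i: errs for i, row in enumerate(rows) if (errs := validate_row(row))}
print(bad)  # rows 1 and 2 fail
```

In an ETL pipeline, failing rows would typically be quarantined to a side table and reported back to the third party rather than silently dropped.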

– Created APIs using Hack (Facebook's PHP dialect) for upload endpoints.

– Developed dashboards for sales lift data normalized across third parties using Tableau and internal tools.

– Maintained ETL processes: fixed bugs and data-quality issues, optimized CPU and storage usage, and added columns to tables, mainly core ad-metrics data sets with wide impact across the company.

– Developed Facebook status tables, a dataset exceeding 150 TB and 1.2 trillion rows, derived from Facebook's graph structure and curated into an easily digestible Hive table used by research teams for insights, sentiment analysis, and machine learning applications.
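Curating graph-shaped data into a flat table boils down to exploding adjacency structures into one row per edge. A toy sketch of that shape; the tiny graph and column names below are purely illustrative, not Facebook's actual schema:

```python
# Hypothetical graph fragment: author -> list of status IDs (illustrative only).
graph = {
    "user_1": ["status_a", "status_b"],
    "user_2": ["status_c"],
}

# Flatten to one row per (author, status) edge — the general shape of
# curating graph data into a flat, queryable Hive table.
rows = [
    {"author_id": author, "status_id": status}
    for author, statuses in graph.items()
    for status in statuses
]
print(len(rows))  # 3 flat rows from 2 graph nodes
```

At trillion-row scale this explosion runs as a distributed job (Hive or Presto over partitioned storage), but the per-edge flattening logic is the same.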

Apply Here
