Job Location: Mumbai, Hyderabad/Secunderabad, Gurgaon/Gurugram, Bangalore/Bengaluru
Work you’ll do:
– Translate functional requirements into technical design
– Recommend design alternatives for data ingestion, processing and provisioning layers
– Design and develop data ingestion programs to process large data sets in batch mode using Hive, Pig, and Sqoop
– Design and develop data integration programs using commercial ETL tools such as Informatica, DataStage, and SnapLogic
– Design and develop data integration programs using open-source and open-standard ETL tools such as Talend and Pentaho Kettle
– Develop data ingestion programs to ingest real-time data from live sources using Apache Kafka, Spark Streaming, and related technologies (a minimal sketch follows this list)
– Work in large teams developing and delivering solutions that support large-scale data management platforms, following an Agile methodology
– Monitor data ingestion processes end to end and optimize overall data processing lead times
– Develop test scenarios and test scripts to validate data loaded into the Hadoop platform
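For illustration, here is a minimal sketch of the kind of real-time ingestion pipeline described above: consuming a hypothetical Kafka topic ("clicks" on broker "broker1:9092", both placeholder names) with Spark Streaming's spark-streaming-kafka-0-10 connector and landing each micro-batch on HDFS for downstream Hive processing. Actual projects will differ in topics, serialization, and sinks.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object ClickstreamIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClickstreamIngest")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Placeholder broker, group id, and topic -- substitute real values per project
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clicks"), kafkaParams)
    )

    // Land raw message payloads on HDFS, one directory per micro-batch,
    // where a Hive external table or downstream batch job can pick them up
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/clicks/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```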
Qualifications:
Required:
– 3–6 years of technology consulting experience
– Education: Bachelor’s or Master’s degree in Computer Science, MCA, M.Sc., or MBA
– A minimum of 2 years of experience designing and developing Big Data solutions on the Hadoop technology platform
– Ability to translate business and technical requirements into technical designs
– Good knowledge of end-to-end project delivery methodology for implementing Big Data projects
– Deep technical understanding of distributed computing and broader awareness of different Hadoop distributions
– Experience designing solution architectures for different use cases with Hadoop and ecosystem tools
– Strong grasp of UNIX operating system concepts and shell scripting
– Hands-on experience using Hadoop (preferably Hadoop 2 with YARN), MapReduce, R, Pig, Hive, Sqoop, and HBase
– Exposure to search tools such as Elasticsearch and Lucene
– Extensive experience in object-oriented programming in Java, including optimizing memory usage and JVM configuration in distributed programming environments
– Exposure to metadata management techniques within Hadoop technology architecture
– Ability to operate independently with a clear focus on schedule and outcomes
– Proficient with algorithm development, including statistical and probabilistic analysis, clustering, recommendation systems, natural language processing, and performance analysis
– Understanding of machine learning frameworks such as Apache Mahout, and of data mining algorithms such as Bayesian classification and clustering
– Experience building APIs that provision data to downstream systems, leveraging different frameworks
Preferred:
– Production experience with Apache Spark using Spark SQL and Spark Streaming, or with Apache Storm (see the Spark SQL sketch after this list)
– Exposure to different NoSQL databases within the Hadoop ecosystem
– Exposure to public, private, and hybrid cloud platforms such as AWS, Azure, and Google Cloud.
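As an illustration of the Spark SQL experience listed above, a minimal sketch that aggregates a hypothetical Hive table (clicks_raw, e.g. one defined over the files landed by the streaming sketch earlier) into a daily summary table; table and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object DailyClickAggregates {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL query tables registered in the Hive metastore
    val spark = SparkSession.builder()
      .appName("DailyClickAggregates")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate raw events by date (dt is a placeholder partition column)
    val daily = spark.sql(
      """SELECT dt, COUNT(*) AS events
        |FROM clicks_raw
        |GROUP BY dt
        |ORDER BY dt""".stripMargin)

    // Persist the result as a managed Hive table for downstream consumers
    daily.write.mode("overwrite").saveAsTable("clicks_daily")
    spark.stop()
  }
}
```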

