The Judge Group | Hiring | Data Engineer | Houston, TX | BigDataKB.com | 2022-09-29

Job Location: Houston, TX

AWS Engineer/Big Data Developer

Responsibilities

• Improve our Kubernetes architecture to enable more efficient allocation of nodes
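
As a loose illustration of this kind of work, the sketch below uses the official Kubernetes Python client to list each node's allocatable CPU and memory, the usual starting point for packing analysis. It assumes only that a local kubeconfig exists; nothing in it is specific to this employer's cluster.

    # Sketch: list allocatable CPU/memory per node to spot inefficient packing.
    # Assumes a kubeconfig is available locally (use load_incluster_config() in a pod).
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    for node in v1.list_node().items:
        alloc = node.status.allocatable
        print(f"{node.metadata.name}: cpu={alloc['cpu']}, memory={alloc['memory']}")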

• Improve the performance and ease of use of our Spark and Airflow platforms

• Harden the security of our cloud infrastructure

• Build internal software systems and self-service tools that enable the creation of secure and compliant AWS resources

• Build our centralized logging and metrics-collection infrastructure
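
As one hedged illustration of the metrics side, a pipeline might publish a custom metric to CloudWatch with boto3; the namespace and metric name below are invented for the example, not taken from the posting.

    # Sketch: publish a custom metric to CloudWatch; names are illustrative only.
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_data(
        Namespace="DataPlatform/Pipelines",       # hypothetical namespace
        MetricData=[{
            "MetricName": "RecordsProcessed",     # hypothetical metric
            "Value": 12345,
            "Unit": "Count",
        }],
    )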

• Open-source some of our tools

• Build CI/CD for applications and data pipelines

• Strong technical leadership and communication skills

• Ability to partner cross-functionally with product, engineering, legal and compliance, and IT

• Evaluate, adopt, and advocate for bleeding-edge tools that solve current infrastructure problems

• Advocate engineering best practices

• Ability to architect solutions and systems, evaluate their tradeoffs, and provide guidance on architectural decisions

• Deep expertise with AWS services, including core compute, networking, and storage services

• Understanding of common application architecture components such as frontend, APIs, databases, caching, queuing, and search

• Experience with Python

• Experience running Kubernetes in production

• Proactive in identifying problems and developing solutions

• Ability to prioritize and triage unplanned work and ensure the team stays organized

• Proficiency with Python 3, comfortable reviewing code and providing feedback on pull requests

• Understanding of Spark architecture for use in batch processing and ad-hoc analytics
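
For context, the batch side of such a platform typically follows a read-transform-write pattern, roughly as in the PySpark sketch below; the bucket paths and column names are placeholders.

    # Sketch: a simple batch aggregation; paths and columns are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-batch-example").getOrCreate()

    events = spark.read.parquet("s3a://example-bucket/events/")  # placeholder path
    daily = (
        events
        .groupBy("event_date", "event_type")                     # placeholder columns
        .agg(F.count("*").alias("event_count"))
    )
    daily.write.mode("overwrite").parquet("s3a://example-bucket/daily-counts/")
    spark.stop()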

• Understanding of Airflow architecture

• Kubernetes architecture: most of our infrastructure, including Airflow and Spark, is running in k8s
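
To make the Airflow piece concrete, below is a minimal Airflow 2.x DAG sketch of the submit-a-Spark-job pattern; the DAG id, schedule, and script path are assumptions for illustration, not details from the posting.

    # Sketch: a two-task daily DAG; the spark-submit target is a placeholder.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_batch_example",        # hypothetical DAG id
        start_date=datetime(2022, 9, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extracting")
        run_spark = BashOperator(
            task_id="run_spark",
            bash_command="spark-submit /opt/jobs/daily_batch.py",  # placeholder path
        )
        extract >> run_spark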

• AWS infrastructure, including EC2, RDS, EKS, S3, VPCs, and ASGs
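
A small boto3 sketch of the inventory scripting these services invite; the region and the running-state filter are illustrative choices.

    # Sketch: enumerate running EC2 instances and S3 buckets; region is illustrative.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    result = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    for reservation in result["Reservations"]:
        for instance in reservation["Instances"]:
            print(instance["InstanceId"], instance["InstanceType"])

    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])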

• Build the base storage, compute, and networking infrastructure used organization-wide

• Build shared infrastructure services and platforms such as our Kubernetes (EKS) platform and CI/CD platform (Jenkins, GitHub Actions)

• Ensure cloud governance and security best practices are followed

• Partner with our product teams to build out highly available, scalable, and secure architectures

Technical Skills

  • Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • Languages: Java, Python, PySpark, UNIX shell scripts
  • Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL
  • Frameworks: MVC, Struts, Spring, Hibernate
  • Operating Systems: Red Hat Linux, Ubuntu Linux, and Windows
  • Web Technologies: HTML, DHTML, XML
  • Web/Application Servers: Apache Tomcat, WebLogic, JBoss
  • Databases: SQL Server, MySQL
  • IDEs: Eclipse, IntelliJ IDEA
  • Scheduling Tools: ESP Job Scheduler, AutoSys, Windows Scheduler
  • RDBMS Utilities: Toad, SQL*Plus, SQL*Loader

Professional Summary

• Specialist in production deployment of Hadoop clusters in the cloud.

• Specialist in AWS services such as S3, EC2, Virtual Private Cloud (VPC), IAM, EBS, EMR, AMIs, and CloudWatch.

• Process large data sets, estimate cluster capacity, and create roadmaps for Hadoop cluster deployment.
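
The capacity part reduces to simple arithmetic, as in the sketch below; every figure is a hypothetical input, not a recommendation.

    # Sketch: rough HDFS capacity estimate; all numbers are hypothetical inputs.
    import math

    raw_data_tb = 100        # expected data volume
    replication = 3          # HDFS replication factor
    overhead = 1.25          # headroom for temp/intermediate data (assumption)
    disk_per_node_tb = 48    # usable disk per DataNode (assumption)

    required_tb = raw_data_tb * replication * overhead
    nodes = math.ceil(required_tb / disk_per_node_tb)
    print(f"~{required_tb:.0f} TB raw capacity -> at least {nodes} DataNodes")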

• Followed Cloudera’s best practices for preparing and maintaining Apache Hadoop in production.

• Perform tuning at three levels (cluster, OS, and YARN), along with cluster monitoring and troubleshooting.

• Install, configure, maintain, and monitor HDFS, YARN, Kafka (including Kafka with StreamSets), Hive, Sqoop, Flume, Oozie, and Spark.

• Deploy Hive in production in remote metastore mode.

• Real-time analysis with Spark.

• Enable high availability in the production cluster for the NameNode, ResourceManager, HiveServer2, and Hive Metastore.

• Experienced in deploying Kafka and ZooKeeper, as well as Kafka with StreamSets.
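
A common way to verify such a deployment is a produce/consume smoke test; the sketch below uses the kafka-python client, with the broker address and topic name as placeholders.

    # Sketch: produce and consume one message to verify a Kafka deployment.
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")  # placeholder broker
    producer.send("healthcheck", b"ping")                         # placeholder topic
    producer.flush()

    consumer = KafkaConsumer(
        "healthcheck",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,
    )
    for message in consumer:
        print(message.value)
        break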

• Troubleshoot, perform root-cause analysis (RCA) on, and debug Hadoop ecosystem runtime issues.

• Experience in commissioning and decommissioning nodes and in running the HDFS balancer.

• Evaluate Hadoop infrastructure requirements and design solutions per business needs.

• Import and export data between RDBMSs and HDFS using Sqoop.
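
Scripted, such a transfer might look like the sketch below, which wraps a Sqoop import in Python; the flags are standard Sqoop options, but the JDBC URL, table, and target directory are placeholders.

    # Sketch: wrap a Sqoop import in Python; connection details are placeholders.
    import subprocess

    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:mysql://db.example.com/sales",  # placeholder JDBC URL
        "--table", "orders",                               # placeholder table
        "--username", "etl_user",                          # placeholder user
        "--password-file", "/user/etl/.password",          # avoids passwords on the CLI
        "--target-dir", "/user/etl/orders",                # placeholder HDFS path
        "--num-mappers", "4",
    ], check=True)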

• Experienced in Backup and Disaster Recovery (BDR) for production clusters, performing cluster-to-cluster and cluster-to-S3 backups per business needs.

• Create production-ready clusters within minutes using Amazon EMR.
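
With boto3 this is essentially one API call, roughly as sketched below; the cluster name, release label, instance types, and counts are illustrative only.

    # Sketch: launch a minimal EMR cluster; all names and sizes are illustrative.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")
    response = emr.run_job_flow(
        Name="example-cluster",          # hypothetical name
        ReleaseLabel="emr-6.7.0",        # example release label
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        JobFlowRole="EMR_EC2_DefaultRole",  # default EMR roles must already exist
        ServiceRole="EMR_DefaultRole",
    )
    print(response["JobFlowId"])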

• Configured various XML files such as core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml, along with ZooKeeper configuration.

• Participated in the Cloudera 5 to Cloudera 6 upgrade.

• Performed the Cloudera 6 and Hadoop 3 upgrades and benchmarking.

• Worked on the Cloudera Director platform for managing various clusters.

• Follow ITIL rules and best practices for delivering IT services.

• Implemented Security and Data Governance for Hadoop Cluster.

• Worked on Hadoop Security.

• Work with business intelligence tools like BusinessObjects and data visualization tools like Tableau.

Apply Here
