Job Location: Houston, TX
AWS Engineer/Big Data Developer
Responsibilities
• Improve our Kubernetes architecture to enable more efficient allocation of nodes
• Improve the performance and ease of use of our Spark and Airflow platforms
• Harden the security of our cloud infrastructure
• Build internal software systems and self-service tools that enable the creation of secure, compliant AWS resources (see the provisioning sketch after this list)
• Build our centralized logging and metrics collection infrastructure
• Open-source some of our tools
• Build and maintain CI/CD for applications and data pipelines
• Strong technical leadership and communication skills
• Ability to partner cross-functionally with product, engineering, legal and compliance, and IT
• Evaluate, adopt, and advocate for bleeding-edge tools that solve current infrastructure problems
• Advocate for engineering best practices
• Ability to architect solutions and systems, evaluate their tradeoffs, and provide guidance on architectural decisions
• Deep expertise with AWS services, including core compute, networking, and storage services
• Understanding of common application architecture components such as frontends, APIs, databases, caching, queuing, and search
• Experience with Python
• Experience running Kubernetes in production
• Proactive in identifying problems and developing solutions
• Ability to prioritize and triage unplanned work and ensure the team stays organized
• Proficiency with Python 3, comfortable reviewing code and providing feedback on pull requests
• Understanding of Spark architecture for use in batch processing and ad-hoc analytics
• Understanding of Airflow architecture
• Kubernetes architecture: most of our infrastructure, including Airflow and Spark, runs in k8s
• AWS infrastructure, including EC2, RDS, EKS, S3, VPCs, and ASGs
• Build the base storage, compute, and networking infrastructure used organization wide
• Build shared infrastructure services and platforms, such as our Kubernetes (EKS) platform and CI/CD platform (Jenkins, GitHub Actions)
• Ensure cloud governance and security best practices are followed
• Partner with our product teams to build out highly available, scalable, and secure architectures
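As an illustration of the self-service tooling mentioned above, here is a minimal sketch in Python using boto3. It assumes boto3 is installed and AWS credentials are configured; the function name, bucket settings, and region default are hypothetical examples, not a prescribed implementation.

    import boto3

    def create_compliant_bucket(name: str, region: str = "us-east-1") -> None:
        """Create an S3 bucket with default encryption and public access blocked.

        The bucket name, region, and settings below are illustrative only.
        """
        s3 = boto3.client("s3", region_name=region)

        # Create the bucket (us-east-1 does not accept a LocationConstraint).
        if region == "us-east-1":
            s3.create_bucket(Bucket=name)
        else:
            s3.create_bucket(
                Bucket=name,
                CreateBucketConfiguration={"LocationConstraint": region},
            )

        # Enforce default server-side encryption for all new objects.
        s3.put_bucket_encryption(
            Bucket=name,
            ServerSideEncryptionConfiguration={
                "Rules": [
                    {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                ]
            },
        )

        # Block all forms of public access to keep the bucket compliant.
        s3.put_public_access_block(
            Bucket=name,
            PublicAccessBlockConfiguration={
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        )

A real self-service tool would typically wrap functions like this behind an internal API or CLI and add organization-specific tagging and policy checks.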
Technical Skills
- Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala
- NoSQL Databases: HBase, Cassandra, MongoDB
- Languages: Java, Python, PySpark, UNIX shell scripts
- Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL
- Frameworks: MVC, Struts, Spring, Hibernate
- Operating Systems: Red Hat Linux, Ubuntu Linux, and Windows
- Web Technologies: HTML, DHTML, XML
- Web/Application Servers: Apache Tomcat, WebLogic, JBoss
- Databases: SQL Server, MySQL
- IDE: Eclipse, IntelliJ IDEA
- Scheduling Tools: ESP Job Scheduler, AutoSys, Windows Scheduler
- RDBMS Utilities: Toad, SQL*Plus, SQL*Loader
Professional Summary
• Specialist in production deployment of Hadoop clusters in the cloud.
• Specialist in AWS services such as S3, EC2, Virtual Private Cloud (VPC), IAM, EBS, EMR, AMI, and CloudWatch.
• Process large data sets, estimate cluster capacity, and create roadmaps for Hadoop cluster deployment.
• Followed Cloudera's best practices for preparing and maintaining Apache Hadoop in production.
• Perform tuning at three levels (cluster, OS, and YARN), plus cluster monitoring and troubleshooting.
• Install, configure, maintain, and monitor HDFS, YARN, Kafka (including Kafka with StreamSets), Hive, Sqoop, Flume, Oozie, and Spark.
• Deploy Hive to production in remote mode.
• Real-time analysis with Spark (a minimal streaming sketch follows this summary).
• Enable high availability in the production cluster for the NameNode, ResourceManager, HiveServer2, and Hive Metastore.
• Experienced in deploying Kafka and ZooKeeper, as well as Kafka with StreamSets.
• Troubleshoot, perform root cause analysis (RCA), and debug Hadoop ecosystem runtime issues.
• Experience in commissioning and decommissioning nodes and running the node balancer.
• Evaluate Hadoop infrastructure requirements and design solutions per business needs.
• Import and export data between RDBMS and HDFS using Sqoop.
• Experienced in backup and disaster recovery (BDR) for the production cluster; perform cluster-to-cluster and cluster-to-S3 backups per business needs.
• Create production-ready clusters within minutes using Amazon EMR.
• Configured XML files such as core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml, along with ZooKeeper configuration.
• Participated in a Cloudera 5 to Cloudera 6 upgrade.
• Performed a Cloudera 6 and Hadoop 3 upgrade and benchmarking.
• Worked with the Cloudera Director platform for managing various clusters.
• Follow ITIL rules and best practices for delivering IT services.
• Implemented security and data governance for the Hadoop cluster.
• Work with Business Intelligence tools like Business Objects and Data Visualization tools like Tableau.
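As a rough illustration of real-time analysis with Spark, here is a minimal PySpark Structured Streaming sketch. It assumes Spark 3.x with the spark-sql-kafka connector available; the broker address, topic name, and window size are placeholders for illustration only.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count, window

    spark = (
        SparkSession.builder
        .appName("kafka-event-counts")  # hypothetical job name
        .getOrCreate()
    )

    # Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # Count events per Kafka message key in 5-minute windows.
    counts = (
        events
        .selectExpr("CAST(key AS STRING) AS key", "timestamp")
        .groupBy(window(col("timestamp"), "5 minutes"), col("key"))
        .agg(count("*").alias("events"))
    )

    # Write the running counts to the console; a production job would write to
    # a durable sink such as HDFS, S3, or a downstream Kafka topic instead.
    query = (
        counts.writeStream
        .outputMode("complete")
        .format("console")
        .start()
    )
    query.awaitTermination()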