The Judge Group | Hiring | Data Engineer | Houston, TX | BigDataKB.com | 2022-09-29

Job Location: Houston, TX

AWS Engineer/Big Data Developer

Responsibilities

• Improve our Kubernetes architecture to enable more efficient allocation of nodes

• Improve the performance and ease of use of our Spark and Airflow platforms

• Harden the security of our cloud infrastructure

• Build internal software systems and self-service tools that enable the creation of secure and compliant AWS resources

• Build our centralized logging and metrics collection infrastructure

• Open source some of our tools

• Build CI/CD for applications and data pipelines

• Strong technical leadership and communication skills

• Ability to partner cross-functionally with product, engineering, legal and compliance, and IT

• Evaluate, adopt, and advocate bleeding-edge tools that solve current infrastructure problems

• Advocate engineering best practices

• Ability to architect solutions and systems, evaluate their tradeoffs, and provide guidance on architectural decisions

• Deep expertise with AWS services, including core compute, networking, and storage services

• Understanding of common application architecture components such as frontends, APIs, databases, caching, queuing, and search

• Experience with Python

• Experience running Kubernetes in production

• Proactive in identifying problems and developing solutions

• Ability to prioritize and triage unplanned work and keep the team organized

• Proficiency with Python 3; comfortable reviewing code and providing feedback on pull requests

• Understanding of Spark architecture for use in batch processing and ad-hoc analytics

• Understanding of Airflow architecture

• Kubernetes architecture: most of our infrastructure, including Airflow and Spark, runs in k8s

• AWS infrastructure including EC2, RDS, EKS, S3, VPCs, and ASGs

• Build the base storage, compute, and networking infrastructure used organization-wide

• Build shared infrastructure services and platforms, such as our Kubernetes (EKS) platform and CI/CD platform (Jenkins, GitHub Actions)

• Ensure cloud governance and security best practices are followed

• Partner with our product teams to build out highly available, scalable, and secure architectures
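Several bullets above concern running Airflow and Spark on Kubernetes and allocating nodes efficiently. As an illustrative sketch only (the names, image tag, and sizes below are hypothetical placeholders, not details from this posting), declaring resource requests and limits is what lets the Kubernetes scheduler bin-pack pods onto nodes and lets an autoscaler size node groups:

```yaml
# Hypothetical Deployment fragment. Requests drive scheduling decisions
# (what the pod reserves on a node); limits are hard caps enforced at runtime.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-scheduler        # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow-scheduler
  template:
    metadata:
      labels:
        app: airflow-scheduler
    spec:
      containers:
        - name: scheduler
          image: apache/airflow:2.7.3   # example tag, not from the posting
          resources:
            requests:
              cpu: "500m"
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 2Gi
```

Setting requests close to actual usage is the usual lever for "more efficient allocation of nodes": over-requesting strands capacity, under-requesting risks eviction under memory pressure.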

Technical Skills

  • Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • Languages: Java, Python, PySpark, UNIX shell scripts
  • Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL
  • Frameworks: MVC, Struts, Spring, Hibernate
  • Operating Systems: Red Hat Linux, Ubuntu Linux, Windows
  • Web Technologies: HTML, DHTML, XML
  • Web/Application Servers: Apache Tomcat, WebLogic, JBoss
  • Databases: SQL Server, MySQL
  • IDEs: Eclipse, IntelliJ IDEA
  • Scheduling Tools: ESP Job Scheduler, Autosys, Windows Scheduler
  • RDBMS Utilities: Toad, SQL*Plus, SQL*Loader

Professional Summary

• Specialist in production deployment of Hadoop clusters in the cloud.

• Specialist in AWS services such as S3, EC2, Virtual Private Cloud (VPC), IAM, EBS, EMR, AMI, and CloudWatch.

• Process large data sets, estimate cluster capacity, and create roadmaps for Hadoop cluster deployment.

• Followed Cloudera's best practices for preparing and maintaining Apache Hadoop in production.

• Perform tuning at three levels (cluster, OS, and YARN), along with cluster monitoring and troubleshooting.

• Install, configure, maintain, and monitor HDFS, YARN, Kafka (including Kafka with StreamSets), Hive, Sqoop, Flume, Oozie, and Spark.

• Deployed Hive in production in remote metastore mode.

• Real-time analysis with Spark.

• Enabled high availability in production clusters for the NameNode, ResourceManager, HiveServer2, and Hive Metastore.

• Experienced in deploying Kafka and ZooKeeper, as well as Kafka with StreamSets.

• Troubleshooting, root-cause analysis (RCA), and debugging of Hadoop ecosystem runtime issues.

• Experience with commissioning and decommissioning nodes and running the HDFS balancer.

• Evaluate Hadoop infrastructure requirements and design solutions per business needs.

• Import and export data between RDBMSs and HDFS using Sqoop.

• Experienced with Backup and Disaster Recovery (BDR) for production clusters, performing cluster-to-cluster and cluster-to-S3 backups per business needs.

• Create production-ready clusters within minutes using Amazon EMR.

• Configured XML files such as core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml, along with ZooKeeper.

• Participated in the Cloudera 5 to Cloudera 6 upgrade.

• Performed a Cloudera 6 and Hadoop 3 upgrade and benchmarking.

• Worked on the Cloudera Director platform for managing multiple clusters.

• Follow ITIL rules and best practices for delivering IT services.

• Implemented security and data governance for the Hadoop cluster.

• Worked on Hadoop security.

• Work with business intelligence tools such as BusinessObjects and data visualization tools such as Tableau.
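The Sqoop bullet above (moving data between an RDBMS and HDFS) typically looks like the following invocation. This is a hedged sketch: the JDBC URL, credentials, table names, and HDFS paths are placeholders, not details from this posting, and running it requires a configured Hadoop/Sqoop client.

```shell
# Hypothetical Sqoop import: copy an RDBMS table into HDFS.
# Connection string, user, table, and paths are placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# The reverse direction (HDFS back to the RDBMS) uses sqoop export:
sqoop export \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders_summary \
  --export-dir /data/marts/orders_summary
```

`--num-mappers` controls how many parallel map tasks split the transfer; Sqoop needs a splittable key (a primary key or an explicit `--split-by` column) when it is greater than 1.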

Apply Here
