Job Location: Houston, TX
AWS Engineer/Big Data Developer
Responsibilities
• Improve our Kubernetes architecture to enable more efficient allocation of nodes
• Improve the performance and ease of use of our Spark and Airflow platforms
• Harden the security of our cloud infrastructure
• Build internal software systems and self-service tools that enable the creation of secure, compliant AWS resources (see the provisioning sketch after this list)
• Build our centralized logging and metrics collection infrastructure
• Open-source some of our tools
• Build and maintain CI/CD for applications and data pipelines
• Strong technical leadership and communication skills
• Ability to partner cross-functionally with product, engineering, legal and compliance, and IT
• Evaluate, adopt, and advocate for bleeding-edge tools that solve current infrastructure problems
• Advocate for engineering best practices
• Ability to architect solutions and systems, evaluate their tradeoffs, and provide guidance on architectural decisions
• Deep expertise with AWS services, including core compute, networking, and storage services
• Understanding of common application architecture components such as frontends, APIs, databases, caching, queuing, and search
• Experience with Python
• Experience running Kubernetes in production
• Proactive in identifying problems and developing solutions
• Ability to prioritize and triage unplanned work and ensure the team stays organized
• Proficiency with Python 3, comfortable reviewing code and providing feedback on pull requests
• Understanding of Spark architecture for use in batch processing and ad-hoc analytics
• Understanding of Airflow architecture
• Kubernetes architecture: most of our infrastructure, including Airflow and Spark, runs in k8s
• AWS infrastructure, including EC2, RDS, EKS, S3, VPCs, and ASGs
• Build the base storage, compute, and networking infrastructure used organization wide
• Build shared infrastructure services and platforms, such as our Kubernetes (EKS) platform and CI/CD platform (Jenkins, GitHub Actions)
• Ensure cloud governance and security best practices are followed
• Partner with our product teams to build out highly available, scalable, and secure architectures
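As an illustration of the self-service tooling mentioned above, here is a minimal sketch in Python using boto3. It assumes boto3 is installed and AWS credentials are configured; the function name, bucket settings, and region default are hypothetical examples, not a prescribed implementation.

    import boto3

    def create_compliant_bucket(name: str, region: str = "us-east-1") -> None:
        """Create an S3 bucket with default encryption and public access blocked.

        The bucket name, region, and settings below are illustrative only.
        """
        s3 = boto3.client("s3", region_name=region)

        # Create the bucket (us-east-1 does not accept a LocationConstraint).
        if region == "us-east-1":
            s3.create_bucket(Bucket=name)
        else:
            s3.create_bucket(
                Bucket=name,
                CreateBucketConfiguration={"LocationConstraint": region},
            )

        # Enforce default server-side encryption for all new objects.
        s3.put_bucket_encryption(
            Bucket=name,
            ServerSideEncryptionConfiguration={
                "Rules": [
                    {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                ]
            },
        )

        # Block all forms of public access to keep the bucket compliant.
        s3.put_public_access_block(
            Bucket=name,
            PublicAccessBlockConfiguration={
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        )

A real self-service tool would typically wrap functions like this behind an internal API or CLI and add organization-specific tagging and policy checks.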
Technical Skills
- Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala
- NoSQL Databases: HBase, Cassandra, MongoDB
- Languages: Java, Python, PySpark, UNIX shell scripts
- Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL
- Frameworks: MVC, Struts, Spring, Hibernate
- Operating Systems: Red Hat Linux, Ubuntu Linux, and Windows
- Web Technologies: HTML, DHTML, XML
- Web/Application Servers: Apache Tomcat, WebLogic, JBoss
- Databases: SQL Server, MySQL
- IDE: Eclipse, IntelliJ IDEA
- Scheduling Tools: ESP Job Scheduler, AutoSys, Windows Scheduler
- RDBMS Utilities: Toad, SQL*Plus, SQL*Loader
Professional Summary
• Specialist in production deployment of Hadoop clusters in the cloud.
• Specialist in AWS services such as S3, EC2, Virtual Private Cloud (VPC), IAM, EBS, EMR, AMI, and CloudWatch.
• Process large data sets, estimate cluster capacity, and create roadmaps for Hadoop cluster deployment.
• Followed Cloudera's best practices for preparing and maintaining Apache Hadoop in production.
• Perform tuning at three levels (cluster, OS, and YARN), plus cluster monitoring and troubleshooting.
• Install, configure, maintain, and monitor HDFS, YARN, Kafka (including Kafka with StreamSets), Hive, Sqoop, Flume, Oozie, and Spark.
• Deploy Hive to production in remote mode.
• Real-time analysis with Spark (a minimal streaming sketch follows this summary).
• Enable high availability in the production cluster for the NameNode, ResourceManager, HiveServer2, and Hive Metastore.
• Experienced in deploying Kafka and ZooKeeper, as well as Kafka with StreamSets.
• Troubleshoot, perform root cause analysis (RCA), and debug Hadoop ecosystem runtime issues.
• Experience in commissioning and decommissioning nodes and running the node balancer.
• Evaluate Hadoop infrastructure requirements and design solutions per business needs.
• Import and export data between RDBMS and HDFS using Sqoop.
• Experienced in backup and disaster recovery (BDR) for the production cluster; perform cluster-to-cluster and cluster-to-S3 backups per business needs.
• Create production-ready clusters within minutes using Amazon EMR.
• Configured XML files such as core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml, along with ZooKeeper configuration.
• Participated in a Cloudera 5 to Cloudera 6 upgrade.
• Performed a Cloudera 6 and Hadoop 3 upgrade and benchmarking.
• Worked with the Cloudera Director platform for managing various clusters.
• Follow ITIL rules and best practices for delivering IT services.
• Implemented security and data governance for the Hadoop cluster.
• Work with Business Intelligence tools like Business Objects and Data Visualization tools like Tableau.
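As a rough illustration of real-time analysis with Spark, here is a minimal PySpark Structured Streaming sketch. It assumes Spark 3.x with the spark-sql-kafka connector available; the broker address, topic name, and window size are placeholders for illustration only.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count, window

    spark = (
        SparkSession.builder
        .appName("kafka-event-counts")  # hypothetical job name
        .getOrCreate()
    )

    # Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # Count events per Kafka message key in 5-minute windows.
    counts = (
        events
        .selectExpr("CAST(key AS STRING) AS key", "timestamp")
        .groupBy(window(col("timestamp"), "5 minutes"), col("key"))
        .agg(count("*").alias("events"))
    )

    # Write the running counts to the console; a production job would write to
    # a durable sink such as HDFS, S3, or a downstream Kafka topic instead.
    query = (
        counts.writeStream
        .outputMode("complete")
        .format("console")
        .start()
    )
    query.awaitTermination()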