Job Location: Houston, TX
AWS Engineer/Big Data Developer
Responsibilities
• Improve our Kubernetes architecture to enable more efficient allocation of nodes
• Improve the performance and ease of use of our Spark and Airflow platforms
• Harden the security of our cloud infrastructure
• Build internal software systems and self-service tools that enable the creation of secure and compliant AWS resources
• Build our centralized logging and metrics collection infrastructure
• Open-source some of our tools
• Build CI/CD for applications and data pipelines
• Strong technical leadership and communication skills
• Ability to partner cross-functionally with product, engineering, legal and compliance, and IT
• Evaluate, adopt, and advocate for bleeding-edge tools that solve current infrastructure problems
• Advocate for engineering best practices
• Ability to architect solutions and systems, evaluate their tradeoffs, and provide guidance on architectural decisions
• Deep expertise with AWS services, including core Compute, Networking, and Storage services
• Understanding of common application architecture components such as frontends, APIs, databases, caching, queuing, and search
• Experience with Python
• Experience running Kubernetes in production
• Proactive in identifying problems and developing solutions
• Ability to prioritize and triage unplanned work and keep the team organized
• Proficiency with Python 3; comfortable reviewing code and providing feedback on pull requests
• Understanding of Spark architecture for use in batch processing and ad-hoc analytics
• Understanding of Airflow architecture
• Kubernetes architecture: most of our infrastructure, including Airflow and Spark, runs in k8s
• AWS infrastructure, including EC2, RDS, EKS, S3, VPCs, and ASGs
• Build the base storage, compute, and networking infrastructure used organization-wide
• Build shared infrastructure services and platforms, such as our Kubernetes (EKS) platform and CI/CD platform (Jenkins, GitHub Actions)
• Ensure cloud governance and security best practices are followed
• Partner with our product teams to build highly available, scalable, and secure architectures
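The shared EKS platform described above is often bootstrapped from a declarative cluster definition. As a minimal sketch only: the cluster name, region, and node-group sizing below are illustrative assumptions, not values from this posting.

```yaml
# Hypothetical eksctl cluster definition (all names and sizes are assumptions)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: data-platform        # assumed cluster name
  region: us-east-1          # assumed region
nodeGroups:
  - name: spark-workers      # assumed node group for Spark/Airflow workloads on k8s
    instanceType: m5.2xlarge
    desiredCapacity: 3
    minSize: 1
    maxSize: 10              # autoscaling bounds support efficient node allocation
```

A definition like this would be applied with `eksctl create cluster -f cluster.yaml`; keeping it in version control lets the CI/CD platform manage cluster changes the same way as application code.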
Technical Skills
- Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala
- NoSQL Databases: HBase, Cassandra, MongoDB
- Languages: Java, Python, PySpark, UNIX shell scripts
- Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL
- Frameworks: MVC, Struts, Spring, Hibernate
- Operating Systems: Red Hat Linux, Ubuntu Linux, and Windows
- Web Technologies: HTML, DHTML, XML
- Web/Application Servers: Apache Tomcat, WebLogic, JBoss
- Databases: SQL Server, MySQL
- IDEs: Eclipse, IntelliJ IDEA
- Scheduling Tools: ESP Job Scheduler, Autosys, Windows Scheduler
- RDBMS Utilities: Toad, SQL*Plus, SQL*Loader
Professional Summary
• Specialist in production deployment of Hadoop clusters in the cloud.
• Specialist in AWS services such as S3, EC2, Virtual Private Cloud (VPC), IAM, EBS, EMR, AMI, and CloudWatch.
• Process large data sets, estimate cluster capacity, and create roadmaps for Hadoop cluster deployment.
• Followed Cloudera's best practices for preparing and maintaining Apache Hadoop in production.
• Perform tuning at three levels (cluster, OS, and YARN), along with cluster monitoring and troubleshooting.
• Installation, configuration, maintenance, and monitoring of HDFS, YARN, Kafka, Kafka with StreamSets, Hive, Sqoop, Flume, Oozie, and Spark.
• Hive production deployment in remote metastore mode.
• Real-time analysis with Spark.
• Enabled high availability in production clusters for the NameNode, ResourceManager, HiveServer2, and Hive Metastore.
• Experienced in deploying Kafka with ZooKeeper, as well as Kafka with StreamSets.
• Troubleshooting, root cause analysis (RCA), and debugging of Hadoop ecosystem runtime issues.
• Experience in commissioning and decommissioning nodes and in node balancing.
• Evaluate Hadoop infrastructure requirements and design solutions per business needs.
• Perform import and export of data between RDBMS and HDFS using Sqoop.
• Experienced in BDR (Backup and Disaster Recovery) for production clusters; perform cluster-to-cluster and cluster-to-S3 backups per business needs.
• Create production-ready clusters within minutes using Amazon EMR.
• Configured various XML files such as core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml, along with ZooKeeper.
• Participated in the Cloudera 5 to Cloudera 6 upgrade.
• Performed Cloudera 6 and Hadoop 3 upgrades and benchmarking.
• Worked on the Cloudera Director platform for managing multiple clusters.
• Follow ITIL guidelines and best practices for delivering IT services.
• Implemented security and data governance for the Hadoop cluster.
• Worked on Hadoop security.
• Work with Business Intelligence tools such as BusinessObjects and data visualization tools such as Tableau.
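The NameNode high-availability and XML-configuration work listed above typically comes together in hdfs-site.xml. As a minimal sketch: the property names are standard Hadoop HA settings, but the nameservice ID and host names below are assumptions for illustration.

```xml
<!-- hdfs-site.xml sketch: NameNode HA (nameservice and host names are assumptions) -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>                      <!-- assumed logical nameservice ID -->
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>                        <!-- two NameNodes: active + standby -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master1.example.com:8020</value>       <!-- assumed host -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>master2.example.com:8020</value>       <!-- assumed host -->
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>                           <!-- ZKFC-driven failover via ZooKeeper -->
  </property>
</configuration>
```

With automatic failover enabled, a ZooKeeper Failover Controller promotes the standby NameNode if the active one fails, which is what makes the HA setup described above practical in production.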