Job Location: Stamford, CT
JOB SUMMARY
Responsible for maintaining scalable, reliable, consistent, and repeatable systems that support data operations for Reporting, Analytics, Applications, and Data Science by gathering and processing raw data at scale. Profiles data to measure quality, integrity, accuracy, and completeness, and delivers solutions by developing, testing, and implementing code and scripts. Develops data set processes for data modeling, mining, and consumption.
MAJOR DUTIES AND RESPONSIBILITIES
Actively and consistently support all efforts to simplify and enhance the customer experience.
Create and maintain scalable, reliable, consistent and repeatable systems that support data operations for Reporting, Analytics, Applications, and Data Science.
Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.).
Use ETL/ELT processes to maintain, improve, clean, and manipulate data.
Profile data to measure quality, integrity, accuracy, and completeness.
Develop and implement tools, scripts, queries, and applications for ETL/ELT and data operations.
Design, build, and automate machine learning data pipelines.
Deliver solutions by developing, testing, and implementing code and scripts.
Produce reports and uphold data delivery schedules.
Manage the life cycle of multiple data sources.
Work closely with stakeholders on the data demand side (analysts and data scientists).
Increase speed to delivery by implementing workload/workflow automation solutions.
Perform other duties as assigned.
REQUIRED QUALIFICATIONS
Required Skills/Abilities and Knowledge
Ability to read, write, speak and understand English
Ability to use a wide variety of open source technologies and cloud services
Extensive coding/scripting experience using Python, R, and shell scripts
Extensive experience with SQL, Tableau, ML pipeline techniques, and ETL techniques
Extensive background in Linux/Unix/CentOS installation and administration; Windows experience preferred
Extensive knowledge of data storage, including when to use a file system, a relational database, or a NoSQL variant
Extensive experience with Spark and Hadoop/Hive
Extensive familiarity with JavaScript APIs, REST APIs, or data extract APIs
Extensive experience receiving, converting, and cleansing big data
Extensive experience with data virtualization concepts and software (Denodo, Teiid, JBoss)
Extensive experience with data workflow/data prep platforms, such as Informatica, Pentaho, or Talend
Ability to identify and resolve end-to-end performance, network, server, and platform issues
Strong attention to detail and the ability to effectively prioritize and execute multiple tasks
Required Education
Bachelor’s degree in an engineering discipline or computer science
Required Related Work Experience and Number of Years
5-7 years of hands-on experience with RDBMS, SQL, scripting, and coding
3+ years of Linux/Unix/CentOS system administration
PREFERRED QUALIFICATIONS
Preferred Skills/Abilities and Knowledge
Expert knowledge of best practices and IT operations in an always-up, always-available service
Ability to create proof of concept experiments for analytics, machine learning, or visualization tools that include hypothesis, test plans, and outcome analysis
Extensive experience with visualization or BI tools, such as Tableau
Preferred Related Work Experience and Number of Years
Leadership experience in advanced operational analytics
WORKING CONDITIONS
Office environment
Travel depending upon project needs
EGN319 319777 319777BR

