Job Location: Bangalore/Bengaluru
In this role, the candidate will work closely with Data Engineers, Performance/Monitoring Engineers, and Infrastructure/Application teams to deliver AIOps solutions at scale. This includes strengthening the foundational elements of AIOps, such as monitoring, data pipelines, event correlation and automation.
Ideally, the candidate has a strong understanding of the IT Operations domain, exposure to a Global Operations Center (i.e. NOC), and experience solving problems during major IT outages, along with expertise in implementing enterprise-level proactive monitoring and automated solutions to eliminate IT outages.
The goal of the AIOps team is to provide a holistic view of the organization’s entire application and IT environment by pulling together data across siloed IT stacks and tools so we can prevent and rapidly resolve complex IT issues.
- Leverage AIOps capabilities to build solutions and prevent IT outages at scale
- Lead collaboration with developers, architects and teams across the IT organization to design new components of monitoring/AIOps platforms and modify existing ones
- Design and develop enterprise-scale AIOps solutions using modern monitoring tools and AIOps platforms
- Perform functional analysis and provide work estimates for proposed changes/implementations
- Learn and apply new strategies and industry practices related to AIOps
- Work within a dynamic agile development team, upskilling when needed
- Foster a high-performing environment that enables teams to improve over time at doing things correctly, quickly and consistently
Desired Candidate Profile
- Bachelor’s/Master’s degree in Computer Science or equivalent with a GPA of 6 or above
- 8-10 years of experience in IT Operations
- Solid foundation in automation and orchestration across systems, processes and workflows (e.g. OpenShift containers, Ansible, Puppet, Chef, orchestration platforms)
- Experience with modern monitoring tools in the IT Ops domain (ITIM, NPMD, APM)
- Experience with IT Event Management tools (e.g. ServiceNow ITOM, Moogsoft, BMC TrueSight, Micro Focus)
- Proficiency in tools for logging, alerting and monitoring on-prem/cloud infrastructure, services and performance
- Understanding of big data, algorithms, and machine learning (e.g. anomaly detection, predictive analysis, automated RCA)
- Strong knowledge of version control systems (e.g. GitLab, GitHub)
- Unix/Linux knowledge is required. Experience with Core Java, JavaScript, Python and shell scripting
- Good understanding of SDLC including analysis, design, implementation, testing, monitoring
- Intermediate database knowledge, including an understanding of RDBMS concepts such as nested queries and record maintenance
- Understanding of APIs and other modern-stack application and cloud technologies
- Knowledge of the IT Operations space and the Site Reliability Engineer role is an added advantage
- Excellent written and verbal communication skills
- Experience with Agile methodologies and project execution

