Job Title: Production/Incident & Application Support Manager
Location: Thane
Reports to: Sr VP Delivery Head
Department: Engineering; Full-Time
About us: At Netcore, innovation isn't just a buzzword; it's the core of everything we do. As the pioneering force behind the first and leading AI/ML-powered Customer Engagement and Experience Platform (CEE), we're dedicated to revolutionizing how B2C brands interact with their customers. Our state-of-the-art SaaS products are designed to foster personalized engagement throughout the entire customer journey, creating remarkable digital experiences for businesses of all sizes.

Engineering at Netcore: Dive into a world where your work directly impacts engagement, conversions, revenue, and customer retention. Our engineering team tackles complex challenges that come with scaling high-performance systems. We thrive on versatility and speed, employing advanced tech stacks such as Kafka, Storm, RabbitMQ, Celery, RedisQ, and GoLang, all hosted robustly on AWS and GCP clouds. At Netcore, you're not just solving technical problems; you're setting industry benchmarks.
Job Summary: We are seeking a seasoned leader for our SRE & Application Support division, overseeing the reliability, scalability, and efficient operation of our martech tools built on open-source frameworks. This role will play a key part in maintaining the operational stability of our products on Netcore Cloud's infrastructure, ensuring 24/7 availability, and driving incident management. The ideal candidate will combine strong leadership abilities with a deep understanding of site reliability, automation, performance monitoring, and application support, delivering world-class service to our clients and partners.
Key Responsibilities:

SRE Leadership & Strategy:
- Lead the Site Reliability Engineering (SRE) team to design and implement robust systems ensuring uptime, scalability, and security.
- Develop and maintain strategies for high availability, disaster recovery, and capacity planning of all martech tools.
- Advocate and apply the principles of automation to eliminate repetitive tasks and improve efficiency.
- Establish and refine Service Level Objectives (SLOs) and Service Level Agreements (SLAs) in collaboration with product and engineering teams.

Application Support:
- Oversee and lead the Application Support Team responsible for maintaining the health and performance of customer-facing applications built on the NetcoreCloud platform.
- Develop processes and debugging procedures to ensure quick resolution of technical issues, and serve as an escalation point for critical incidents.
- Ensure all incidents are triaged and handled efficiently, with proper root cause analysis and follow-up post-mortems for critical incidents.
- Manage the implementation of monitoring tools and log management systems to detect, alert on, and respond to potential issues proactively.

Collaboration and Cross-Functional Leadership:
- Work closely with Sales, CSM, Customer Support, development, QA, and DevOps teams.
- Collaborate with stakeholders to drive a culture of continuous improvement by identifying and eliminating potential risks and issues in the system.
- Be involved in PI (Program Increment) planning to align with product roadmaps, making sure reliability is factored into new feature development.

Team Management & Development:
- Recruit, mentor, and manage the SRE and Application Support Team, fostering a high-performance and collaborative environment.
- Conduct regular performance reviews, provide feedback, and support professional development within the team.

Innovation and Open-Source Contribution:
- Lead initiatives to improve the open-source frameworks utilized in the martech stack, contributing to the open-source community as needed.
- Stay current with emerging technologies, tools, and best practices in site reliability, automation, and application support.
Requirements:

Experience:
- 8+ years of experience in SRE, DevOps, or Application Support roles, with at least 3 years in a leadership position.
- Proven track record of managing systems on open-source frameworks and cloud platforms such as NetcoreCloud or similar.
- Demonstrated expertise in incident management, post-mortem analysis, and improving mean time to recovery (MTTR).
- Strong experience with monitoring tools (Prometheus, Grafana, or similar), logging frameworks, and automation tools (Terraform, Ansible).

Technical Skills:
- Hands-on experience with Linux/Unix environments and cloud services (AWS, GCP, NetcoreCloud).
- Proficiency in scripting and coding (Python, PHP, Golang, Java, or similar languages) for automation purposes.
- Solid understanding of CI/CD pipelines, version control (Git), and alerting and application monitoring tools.

Leadership & Soft Skills:
- Proven leadership skills, with experience in team building, mentorship, and fostering a culture of accountability.
- Strong interpersonal and communication skills, with the ability to interface effectively with technical and non-technical stakeholders.
- Ability to manage multiple projects simultaneously, prioritize tasks, and work under pressure to meet deadlines.
Preferred Qualifications:
- Experience in the martech or digital marketing domain, or with large-scale, customer-facing SaaS applications.
- Certification in SRE, DevOps, or cloud platforms (AWS, GCP).
- Strong application debugging skills and a good understanding of product features.
Why Join Us?
- Be a part of an innovative and forward-thinking organization that values technology and continuous improvement.
- Work with cutting-edge open-source frameworks and cloud technologies on a SaaS product.
- Leadership opportunities with a direct impact on our customers and product success.
Let's start a conversation and make magic happen together!
Website - https://netcorecloud.com/