What you get to do in this role
ServiceNow’s load-balancing (ADC) team is responsible for delivering application traffic to every ServiceNow customer worldwide. We built our own load-balancing solution using well-known open-source technologies and combined it with the power of the ServiceNow platform to transform customer requirements into host configurations and deploy them at scale. We are continuously observing, measuring, testing, optimizing, and redeploying to keep our platform current, add new features, and maintain a high level of performance as we continue to grow rapidly.
- Maintain software-defined, declarative infrastructure at scale.
- Manage large-scale infrastructure with code.
- Implement new open-source and commercial tools, technologies, and methodologies.
- Collaborate with peer teams who have built world-class networking and orchestration solutions.
- Resolve both urgent and non-urgent operational issues in the load-balancing infrastructure.
- Take a lead role in responding to and mitigating outage-causing events or issues.
- Engage deeply in the sustainment function, proactively analyzing network parameters such as capacity and availability so that issues are fixed before they cause an outage.
- Review, consult on, and prepare for planned changes to the production environment.
- Participate in a rotating “on-call” schedule, including weekends, with other members of the team.
- Perform software upgrades and security patching.
- Partner with the Site Reliability Engineering (SRE) team to provide mentorship and input on operational process improvements.
- Provide feedback to infrastructure architects on design issues or improvements, and contribute input to the design process for new initiatives.
- Create and maintain technical documentation of the infrastructure and processes.
- Contribute to processes and automation that help build a low-touch, continuous-deployment infrastructure in which inputs at one end of the deployment pipeline reach production within minutes with zero impact and a 100% success rate, so that we can deploy or update at scale with a few clicks.