
Technical Duty Officer

Walmart


Location:
Sunnyvale, CA 94086
Date:
02/01/2018
2018-02-01, 2018-03-04

Job Details

Req ID: 978790BR

Company Summary: Walmart Global eCommerce is comprised of Walmart.com, VUDU, SamsClub.com, and our technical powerhouse @WalmartLabs. Here, innovators incubate next-gen e-commerce solutions in real time. We integrate online, physical, and mobile shopping experiences for billions of customers around the globe. How do we do it? We continuously build and invest in new technology, including open source tools and big data innovations. Data scientists, front-end and back-end engineers, product managers, and web and UX/UI teams collaborate alongside e-commerce experts to envision, prototype, and bring revolutionary ideas to life in a dynamic, flexible and fun work culture.

Job Title: Technical Duty Officer

Position Summary: As a Technical Duty Officer (TDO) within the Global Technical Engineering Operations (GTEO) CRC team, you will work with other CRC, SRE, DevOps and Engineering practitioners to proactively maintain mission-critical infrastructure, cloud platforms, micro-services, tools, and processes that will ensure the highest levels of availability and reliability of all our websites.



You're right for the job if you are comfortable leading major incident response in a technical team of engineers laser-focused on restoring service across complex distributed architectures. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization and organization. You will work directly with our SRE, Engineering and DevOps teams to support our next-generation, always-up, cloud-based e-commerce platform.



The Technical Duty Officer (TDO) is responsible for the availability and performance of our global sites. The TDO will take command and control of Major Incidents, focusing on restoration by identifying and coordinating with appropriate resources through all the phases of triage, restoration and validation. Technically, you will understand the full end-to-end stack and use this knowledge to detect incidents and lead a team through incident response. Excellent judgement is crucial, as you will provide final approval on site changes and hold critical switches for functionality of the site. You will ensure all documentation surrounding Major Incidents is accurate and communication with the leadership team is clear and complete. Your ability to continuously challenge yourself and develop a strong network with peers and stakeholders cross-functionally will see you excel in this role. Our goal is to protect the customer experience and deliver outstanding levels of availability.

City: SUNNYVALE
State: CA

Minimum Qualifications:

- Control incident management processes and procedures.

- Calm under pressure when controlling major incident response.

- Excellent end-to-end technical understanding of core infrastructure, cloud services, platforms and micro-services.

- Ability to understand and capture key data from various sources, systems and people.

- Ability to understand traffic flows and key dependencies between services.

- Ability to triage effectively: detect issues and distinguish symptom vs. cause.

- Act as a technical leader and coach within the CRC.

- Analyze trends to proactively prevent incidents.

- Focus on leading immediate restoration vs root cause.

- Develop alternative actions for incident resolution, and develop procedures and documentation to support them.

- Create and maintain procedural documentation.

- Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline).

- Absorb knowledge of and understand complex distributed systems, and share this knowledge with your peer group and beyond.

- Develop automation and self-healing with DevOps, Engineering and SRE partners.

- Strong focus on collecting and inferring metrics.

- Excellent communication skills.

- Ability to control multiple incidents at any given time.

- Scripting and software development to automate and help enhance existing solutions.

- Define system architecture and tactical solutions.

- Provide data for and actively participate in root cause analysis, partnering with the Problem Management function.

- Demonstrate excellent judgement in decision making.

- Understanding and consideration of business strategies and priorities.

- Coach and help develop CRC engineers.



Additional responsibilities may include:



- Adhere to CRC onboarding process when accepting new systems into service.

- Share knowledge globally between CRC teams.

- Analyze systems and make recommendations to prevent possible incidents.

- Strive for continuous improvement and make recommendations based on CRC process.

- Other duties and responsibilities as assigned.



Qualifications:

- 5+ years in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.

- Bachelor's Degree in Computer Science or a related field, or relevant work experience.

- Strong and demonstrable incident management skills with relevant experience in an enterprise organization.

- Experience and exposure working in a 24/7 operations support environment.

- Expert verbal and written communication skills.

- Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative and drive.

- Experience investigating, analyzing and troubleshooting large-scale enterprise systems.

- Networking knowledge and understanding of network concepts, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.

- Experience administering Unix/Linux in a production environment.

- Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way.

- Experience working with and developing enterprise monitoring/tooling solutions like Grafana, Kibana, Splunk, Graphite, Nagios, New Relic, Graylog and HPOM.

- Working knowledge of one or more cloud technologies such as AWS, Azure, or OpenStack.

Category: Software Development and Engineering
Division: Global eCommerce
Employment Type: Full Time
Requisition Template: eCommerce