
Principal Site Reliability Engineer



Job Details

JobId: 801041BR
JobTitle: Principal Site Reliability Engineer
State: CA
Description:
- Design, write and build tools to improve the reliability, latency, availability and scalability of Walmart e-commerce products.
  o Engender reliability and availability, starting with metrics and measurements.
  o Enable scaling by providing tools, developing training and/or augmenting processes.
  o Build tools and automation to prevent the recurrence of problems in mission-critical products/services.
- Augment existing instrumentation to build a cohesive picture of the characteristics of our systems, with special attention to points of failure.
- Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
- Develop a deep understanding of the various services and applications that come together to deliver Walmart e-commerce products.
- Design new monitoring tools and smart alerts that help discover failures and issues in a timely fashion, and work with engineers to identify root causes and fix issues.
- Influence, design and create new architectures, standards and methods for large-scale enterprise systems.
- Perform root-cause analysis of complex scaling and performance problems involving multiple parties, networks, hardware and software.
- Participate in the on-call rotation.
- Secure the system against issues, whether real, perceived or notional.
- Place a strong focus on collecting and inferring metrics.
- Experience with configuration management tools such as Ansible, SaltStack, Chef and Puppet.
- Build and drive the automation systems that maintain system health.
- Eliminate single points of failure, and regularly test disaster recovery and high availability (HA).