Our Cloud Center of Excellence is looking for a Technical Product Manager to join a site reliability team.
As a TPM, you will own the "Reliability SLO" of a technology platform and be responsible for coordinating the work of a reliability team and for the engagement with development teams. Our definition of Reliability is an aggregation of the four golden signals (latency, error rate, uptime, and cost) as well as security.
Qualified candidates will possess an engineering background with at least some experience in DevSecOps functions and/or AWS Cloud engineering activities. They will also have hands-on experience gathering business requirements, analyzing workflows and processes, prioritizing team backlogs, writing functional specifications in user story form, assessing risk, and reporting on key business indicators. Strong communication, organizational, and task-tracking skills are key.
Responsibilities:
Organization Enablement:
Perform Team Health Checks with recurring feedback
Define communication strategy and execution across portfolios
Perform change impact analysis
Enable Service adoption and sustainability measurements
Governance:
Perform Financial Reporting & Analysis of hosting charges
Oversee operational reporting of events, incidents, issues, and SLAs
Establish and report on business insights / KPIs
Develop and execute strategy for industry certification compliance (SOC 2, NIST, 1EdTech) across the various products inside the platform
Business Product Management:
Establish demand management and improve business agility
Perform functional decomposition on complex problems
Prioritize work activities through a combination of stakeholder input, business value, and cost to achieve
Curate a roadmap by establishing a technical vision in collaboration with stakeholders
Program Management:
Refine and advocate for agile delivery management through the role of scrum master by leading ceremonies to maximize team productivity, help resolve blockers and dependencies, and enable sizing of work and task breakdown
Establish charters through the identification of business product opportunities and collaborate with software developers to assess the feasibility of software solutions
Collaborate with software developers, TPMs and business product managers to establish development, testing and deployment plans
Draft agile themes, epics, and stories, and maintain a backlog with high-quality stories, acceptance criteria, and clear priorities
Manage the schedule; identify, communicate, and resolve blockers to it; and ensure that delivery timelines and scope are well understood by team members, stakeholders, and dependent teams
Perform risk assessments throughout product development
Manage defect and security issue triage in partnership with product managers, business partners, and customer support teams
Resiliency Engineering:
Collaborate with dev teams to identify failure points and blast radius of systems
Validate effectiveness of monitoring and observability configurations
Coordinate failure injection testing
Observe and document steady-state production levels and growth patterns
Plan and forecast for seasonal growth, communicate trend lines to leadership, and enhance infrastructure scaling plans to accommodate 2x planned load
Coordinate improvements of existing software and infrastructure to meet resiliency goals
Cloud Engineering:
Hands-on design, analysis, development and troubleshooting of highly-distributed large-scale production systems and event-driven, cloud-based services
Ensure repeatability, traceability, and transparency of our infrastructure automation (infrastructure-as-code, monitoring-as-code)
Keep learning the AWS ecosystem, and participate in game day scenarios and professional conferences
Collaborate with development teams to design solutions for enterprise applications using our software stack
Actively monitor AWS costs and use the optimizer to maximize ROI while maintaining Service Level Objectives