Location: NYC / NJ (Metropark, Iselin, NJ), 3 days a week in office
Duration: 1 year
Job Description:
- 10 years of Software Engineering and Architecture experience, with at least 5 years of SRE-focused experience in Production Support, Application Support, and DevOps implementation.
- Demonstrated experience enabling SRE principles and practices with technical and operations teams at different SRE maturity levels across the Engineering and Operations space.
- Demonstrated experience influencing design committees and process teams to establish standards that improve approaches and maturity across IT teams.
- Work closely with Infrastructure services and product teams to develop reliable solutions that improve availability, scalability, and performance targets.
- Experience across the SDLC, from architecture and software design, SLA/SLO definitions, tech debt reviews, CI/CD releases, and monitoring KPIs to DevOps principles.
- Experience in production systems: analyzing performance and error metrics, leading triage and troubleshooting exercises, and tracking incident management targets (MTTx).
- Strong experience in infrastructure and application technology components and designs: assessing problem areas (logs/events), supporting analysis (metrics/traces), and recommending solutions.
- Hands-on experience coding and developing automation solutions leveraging API-based integrations and configuration using Ansible and Terraform for IaaS solutions.
- Experience working in microservices and containerized platforms, supporting them through monitoring, alerting, and troubleshooting as part of service operations.
- Technical knowledge and experience in cloud architectures, hybrid cloud, and cloud-native solutions, leveraging reliable cloud designs to improve operational efficiency.
- Experience working in incident management, leveraging postmortem analysis and developing reliable solutions as part of driving multiple incident management initiatives.
- Experience in observability tools and frameworks, the concepts of golden signals, and MELT data integration and analysis using market solutions to improve operational efficiency.
- Experience managing and growing teams to achieve short-term and long-term goals as part of the SRE roadmap and in alignment with SRE strategic goals.
- Experience handling partnerships with multiple peers and stakeholders, and the ability to interact with leadership and technical teams at different levels.
- Ability to adapt and support multiple application and infrastructure groups with their SRE needs in a fast-paced, dynamic, and growing organization.
Must Have:
- 10 years of overall IT experience focusing on Software Engineering, Architecture, and/or supporting Production technologies.
- 5-7 years of monitoring analysis experience using any observability solution, such as Splunk, Dynatrace, New Relic, Grafana, or Datadog.
- 5 years of development/coding experience developing engineering solutions for large-scale, mission-critical applications.
- 5 years of hands-on experience as an SRE lead or individual contributor delivering on SRE goals and objectives across IT groups.
- 5 years of experience working with Kubernetes platforms and public cloud (AWS, GCP, Azure), supporting implementation or operational needs.