Job Title: Site Reliability/Observability Engineer (SRE)
Work Location: Johnston, RI
Experience: 9 years (required)
Job Description:
Objectives of this role:
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve the reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
- Provide primary operational support and engineering for multiple large-scale distributed software applications.
Responsibilities:
- Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding.
- Partner with development teams, data scientists, and MLOps architects/engineers to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, production troubleshooting, and capacity planning.
- Create and manage sustainable systems and services through automation and uplifts.
- Balance feature development speed and reliability with well-defined service-level objectives (SLOs).
Required skills and qualifications:
- Ability to program (structured and object-oriented) in one or more high-level languages such as Python, Java, C/C++, Ruby, or JavaScript.
- Experience working with AWS services such as Amazon S3, Amazon SageMaker, and Amazon Bedrock.
- Excellent knowledge of cloud-native infrastructure such as AWS Lambda and OpenShift.
- Good understanding of API management and the ability to troubleshoot API-related issues.
- An automation mindset for managing cloud infrastructure with AWS CloudFormation or Terraform.
- Impeccable creative and communication skills.
- Ability to solve problems in a fast-paced, high-stakes environment.
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.