NICE

Senior Site Reliability Engineer

Posted Yesterday

Remote

Hiring Remotely in United Kingdom

Mid level

The Senior Site Reliability Engineer will manage production environments, optimize system performance, automate systems, collaborate with teams, and ensure software reliability.

The summary above was generated by AI

At NiCE, we don’t limit our challenges. We challenge our limits. Always. We’re ambitious. We’re game changers. And we play to win. We set the highest standards and execute beyond them. And if you’re like us, we can offer you the ultimate career opportunity that will light a fire within you.

So, what’s the role all about?

Run the production environment by monitoring availability and taking a holistic view of system health
Build software and systems to manage platform infrastructure and applications
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Provide primary operational support and engineering for multiple large distributed software applications

How will you make an impact?

Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service level objectives

Have you got what it takes?

3-6 years of work experience in a similar role, with a focus on systems engineering, automation, and reliability.
Proficiency in at least one programming language (e.g., Python, Go, Java, C#) and experience with scripting languages (e.g., Bash, PowerShell).
Deep understanding of cloud computing platforms (e.g., AWS), including the operational behavior and reliability constraints of prominent services (e.g., EC2, ECS, Lambda, DynamoDB).
Experience with infrastructure as code tools such as CloudFormation or Terraform.
Deep understanding of CI/CD concepts and experience with CI/CD tools such as Jenkins, GitLab CI/CD, or CircleCI.
Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, CloudWatch).
Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
Experience with incident management and blameless postmortems, including driving incident response during outages and other critical incidents, through resolution and communication in a cross-functional team setting.

You will have an advantage if you also have:

Hands-on experience working with large Kubernetes clusters. Certification is a plus.
Working experience with the Grafana observability suite (Loki, Mimir, Tempo).
Administration and/or development experience with standard monitoring and automation tools such as Splunk, Datadog, PagerDuty, and Rundeck.
Familiarity with configuration management tools like Ansible, Puppet, or Chef.
Certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or equivalent.

Personal attributes:

Strong communication skills and the ability to collaborate effectively with cross-functional teams.
Team player - ability to work well in a close team environment.
Fast learner with the ability to self-educate on relevant technologies
Ability to multitask and prioritize work
Ability to remain focused and calm under pressure

Enjoy NICE-FLEX!

At NICE, we work according to the NICE-FLEX hybrid model, which enables maximum flexibility: 2 days working from the office and 3 days of remote work, each week. Naturally, office days focus on face-to-face meetings, where teamwork and collaborative thinking generate innovation, new ideas, and a vibrant, interactive atmosphere.

Requisition ID:.

Reporting into: Director, Network Operations.

Role Type: Individual Contributor.

About NiCE

NICE Ltd. (NASDAQ: NICE) software products are used by 25,000+ global businesses, including 85 of the Fortune 100 corporations, to deliver extraordinary customer experiences, fight financial crime and ensure public safety. Every day, NiCE software manages more than 120 million customer interactions and monitors 3+ billion financial transactions.

Known as an innovation powerhouse that excels in AI, cloud and digital, NiCE is consistently recognized as the market leader in its domains, with over 8,500 employees across 30+ countries.

NiCE is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, age, sex, marital status, ancestry, neurotype, physical or mental disability, veteran status, gender identity, sexual orientation or any other category protected by law.

Top Skills

Ansible, AWS, Bash, Chef, CircleCI, CloudFormation, CloudWatch, Datadog, Docker, DynamoDB, EC2, ECS, ELK Stack, GitLab CI/CD, Grafana, Java, Jenkins, Kubernetes, Lambda, PagerDuty, PowerShell, Prometheus, Puppet, Python, Rundeck, Splunk, Terraform
