Okta

Staff Site Reliability Engineer, Kubernetes Subject Matter Expert (SME)

Posted Yesterday
Toronto, ON
Senior level
As a Staff Site Reliability Engineer, you will design, build, and scale Okta's Kubernetes platform while responding to incidents and enhancing security posture. You'll develop monitoring tools, maintain documentation, and support a 24x7 environment, all while promoting best practices among peers.
The summary above was generated by AI

Get to know Okta
Okta is The World’s Identity Company. We free everyone to safely use any technology—anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth. 
At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we’re looking for lifelong learners and people who can make us better with their unique experiences. 
Join our team! We’re building a world where Identity belongs to you.

Okta Workforce Identity Cloud (WIC) provides easy, secure access for your workforce so you can focus on other strategic priorities, like reducing costs and doing more for your customers.

If you like to be challenged and have a passion for solving large-scale automation, testing, and tuning problems, we would love to hear from you. The ideal candidate lives by the ethic “If you have to do something more than once, automate it” and can rapidly self-educate on new concepts and tools.

What you’ll be doing 

  • Designing, building, and scaling Okta's production Kubernetes platform
  • Evangelizing security best practices and leading initiatives and projects to strengthen the security posture of our critical infrastructure
  • Responding to production incidents and determining how we can prevent them in the future
  • Triaging and troubleshooting complex production issues to ensure reliability and performance
  • Continuously evolving our monitoring tools and platform
  • Developing and maintaining technical documentation, runbooks, and procedures
  • Supporting a 24x7 online environment as part of an on-call rotation

What you’ll bring to the role

  • A willingness to go the extra mile: see a problem, fix the problem.
  • A passion for encouraging the development of engineering peers and leading by example.
  • A proven track record of successful SRE engagements and collaborating closely with engineering teams.
  • Knowledge and experience with deploying microservices and utilizing CI/CD pipelines.
  • A security mindset that prioritizes protecting assets from risks and vulnerabilities. 

Required Skills:

  • 6+ years of experience with AWS and Terraform
  • 3+ years of experience provisioning and managing Kubernetes clusters, with a solid understanding of containers, Kubernetes infrastructure, and Helm charts
  • 3+ years of developer experience with Python or Golang
  • Strong Linux understanding and experience

Preferred Skills:

  • Experience with Istio service mesh and network policies
  • Familiarity with Spinnaker
  • Experience with monitoring and alerting in a Kubernetes ecosystem
  • Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification

Below is the annual salary range for candidates located in Canada. Your actual salary will depend on factors such as your skills, qualifications, and experience. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, RRSP with a match, healthcare spending, telemedicine, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program, please visit: https://rewards.okta.com/can.

The annual base salary range for this position for candidates located in Canada is between:

$135,000 - $203,000 CAD

What you can look forward to as a full-time Okta employee!

  • Amazing Benefits
  • Making Social Impact
  • Fostering Diversity, Equity, Inclusion and Belonging at Okta 

Okta cultivates a dynamic work environment, providing the best tools, technology and benefits to empower our employees to work productively in a setting that best and uniquely suits their needs. Each organization differs in the degree of flexibility and mobility it offers, so that all employees can be their most creative and successful selves, regardless of where they live. Find your place at Okta today! https://www.okta.com/company/careers/.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws. If reasonable accommodation is needed to participate in the job application, interview process, or onboarding, please use this Form to request an accommodation.

Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Privacy Policy at https://www.okta.com/privacy-policy/. 

Top Skills

Go
Python

Okta Toronto, Ontario, CAN Office

401 Bay St., Toronto, ON, Canada, M5H 2Y4
