
Zafin

Cloud Site Reliability Engineer I

Sorry, this job was removed at 02:11 a.m. (EST) on Friday, Feb 21, 2025
Toronto, ON

Who we are

Founded in 2002, Zafin offers a SaaS product and pricing platform that simplifies core modernization for top banks worldwide. Our platform enables business users to work collaboratively to design and manage pricing, products, and packages, while technologists streamline core banking systems. 

With Zafin, banks accelerate time to market for new products and offers while lowering the cost of change and achieving tangible business and risk outcomes. The Zafin platform increases business agility while enabling personalized pricing and dynamic responses to evolving customer and market needs. 

Zafin is headquartered in Vancouver, Canada, with offices around the globe and customers including ING, CIBC, HSBC, Wells Fargo, PNC, and ANZ. Zafin is proud to be recognized as a top employer and a certified Great Place to Work® in Canada, India, and the UK.

What is the opportunity? 

Zafin, a global leader in financial technology solutions, is seeking a Cloud Site Reliability Engineer I (CSRE I) to join our dynamic team. Reporting directly to the VP of Cloud Services, this role is pivotal in ensuring the seamless operation, support, and maintenance of Zafin's cloud infrastructure and applications. As a CSRE I, you will leverage your expertise to enhance system reliability, scalability, and performance, collaborating with cross-functional teams to ensure exceptional service delivery to clients and stakeholders.

The ideal candidate will have a strong foundation in cloud platforms, incident management, and proactive operational practices, with a continuous improvement mindset to adapt to advancing technologies.

Mode of work: Hybrid

What will you do?

  • Act as a level-3 technical support expert for Zafin products and Azure cloud issues.
  • Collaborate with Product, Platform Engineering, and DevOps teams to introduce operational enhancements and resiliency measures.
  • Conduct Root Cause Analysis (RCA) for Severity 1 and 2 incidents, ensuring timely communication with stakeholders.
  • Participate in external client escalation calls, providing technical insights and solutions.
  • Optimize cloud infrastructure for scalability, performance, and cost-effectiveness.
  • Manage container orchestration platforms such as Azure Kubernetes Service (AKS) or OpenShift to ensure optimal workload distribution.
  • Enhance monitoring and logging tools (e.g., Azure Monitor, ELK, Log Analytics) to proactively detect and resolve issues.
  • Collaborate with internal teams to implement best practices for Azure cloud deployment and configuration.
  • Develop automation scripts for routine operational tasks, incident responses, and cloud cost optimization.
  • Maintain detailed documentation of processes, incidents, and cloud architecture.
  • Participate in a rotating on-call schedule to ensure 24/7 availability for critical incidents.

What do you need to succeed?

Must Haves:

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 8+ years of experience in cloud support, operations, or a related role.
  • Hands-on experience with Microsoft Azure (preferred) or other cloud platforms.
  • Proficiency in container orchestration platforms like AKS or OpenShift.
  • Expertise in automated deployment pipelines, particularly Azure DevOps.
  • Familiarity with enterprise monitoring platforms such as Azure Application Insights, Grafana, or Site24x7.
  • Proficiency in scripting languages like PowerShell or Python.
  • Proven experience in incident management and maintaining SLAs for critical production environments.
  • Knowledge of Postgres databases.

Preferred Qualifications:

  • Certifications in cloud platforms (e.g., Microsoft Azure Administrator).
  • Familiarity with ITSM tools (e.g., Zendesk, ServiceNow).
  • Knowledge of compliance and security best practices in cloud environments.

What’s in it for you

Joining our team means being part of a culture that values diversity, teamwork, and high-quality work. We offer competitive salaries, annual bonus potential, generous paid time off, paid volunteering days, wellness benefits, and robust opportunities for professional growth and career advancement. Want to learn more about what you can look forward to during your career with us? Visit our careers site to see current openings: zafin.com/careers

Zafin welcomes and encourages applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process. 

Zafin is committed to protecting the privacy and security of the personal information collected from all applicants throughout the recruitment process. The methods by which Zafin collects, uses, stores, handles, retains, or discloses applicant information are described in Zafin’s privacy policy at https://zafin.com/privacy-notice/. By submitting a job application, you confirm that you agree to the processing of your personal data by Zafin as described in the candidate privacy notice.

Zafin Toronto, Ontario, CAN Office

123 Front St W, Toronto, Ontario, Canada, M5J 2M2

Similar Jobs

2 Days Ago
Toronto, ON, CAN
Expert/Leader
Fintech • Payments • Software
As a Cloud Site Reliability Engineer II, you will lead strategic initiatives to ensure the reliability and performance of cloud infrastructure, manage complex technical issues, and mentor junior engineers. The role involves architectural design, incident management, and collaboration with cross-functional teams to optimize operational strategies.
Top Skills: PowerShell, Python
2 Days Ago
Ottawa, ON, CAN
Expert/Leader
Fintech • Payments • Software
The Cloud Site Reliability Engineer II will lead initiatives to ensure cloud infrastructure reliability and performance, manage technical issues, mentor junior engineers, and drive strategic operational improvements in Zafin's Azure environment.
Top Skills: Azure, PowerShell, Python
17 Hours Ago
Toronto, ON, CAN
Junior
Artificial Intelligence • Healthtech • Insurance • Software
As a Site Reliability Engineer at Wisedocs AI, you will design, implement, and maintain cloud infrastructure while ensuring optimal performance and reliability. Your role involves automation, troubleshooting incidents, optimizing systems, and collaborating with development teams to integrate SRE practices.
Top Skills: Bash, Python

What you need to know about the Toronto Tech Scene

Although home to some of the biggest names in tech, including Google, Microsoft and Amazon, Toronto has established itself as one of the largest startup ecosystems in the world. And with over 2,000 startups — more than 30 percent of the country's total startups — Toronto continues to attract new businesses. Be it helping entrepreneurs manage their finances, simplifying business operations by automating payroll or assisting pharmaceutical companies in launching new drugs, the city's tech scene is just getting started.
