Cloud Site Reliability Engineer – SRE

Job Information
Author emilyetta
Date October 10, 2021
Deadline Open
Type Full Time
Company Smile CDR
Location Canada
Category Miscellaneous
Contact Information
[email protected]
About Smile CDR:
We are a healthcare data platform used by developers and companies around the world to build cutting-edge medical applications. We work with app developers to build patient apps, with vendors to add modern interoperability to their platforms, and with governments and hospitals to help them manage their data. We also spend a lot of time helping customers build complete solutions on our platform. These solutions are used to manage health data and improve healthcare, and we are very passionate about that.
About the role:
Cloud Hosting Services is an exciting team delivering Smile CDR products as fully managed cloud services. We work across the company, and with multiple cloud partners, to make using Smile CDR products simple for our customers. We’re a small but rapidly growing team making a huge impact.
As part of the Hosting Operations team, the SRE is responsible for building, operating and automating infrastructure services to deliver SaaS-based solutions on Azure/AWS. This is a full-time position.

What you’ll do

  • Partner with our Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for AWS, Azure and other cloud providers.
  • Develop, implement and manage a multi-tenant strategy around service offerings for DB, Container platform, Authentication, Certificates, Product Registries, etc.
  • Develop cost/utilization tracking and attribution processes for all Cloud Service Providers
  • Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details
  • Develop and maintain technical relationships with our core Cloud Service Providers
  • Design, implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications
  • Own internal and external SLAs and ensure they meet or exceed expectations, with system-centric KPIs continuously monitored and improved
  • Create tools for automating deployment, monitoring and operations of the overall platform
  • Participate in an on-call rotation to provide application support, incident management, and troubleshooting
  • Provide ongoing maintenance and support of internal tools, improve system health and reliability
  • Work proficiently with Terraform, Ansible or Chef
  • Assist customers with on-premises deployments when needed.

What we are looking for

  • Deep knowledge of cloud service providers and best practices around implementation and configuration, preferably managing Azure on behalf of multiple teams for a company that delivers SaaS products
  • Experience with Kubernetes, OpenShift, Kafka and the Elastic Stack
  • You are able to prioritize and track multiple projects in parallel
  • You are highly responsive and have a customer-first mindset
  • Experience with Security and Compliance (SOC 2, HIPAA, ISO 27001) best practices and how to implement controls that support high-velocity software delivery teams
  • Passionate about Infrastructure as Code, automation, and developing solutions that help developers move quickly and safely
  • Familiarity with infrastructure management and operations lifecycle concepts and ecosystem
  • Experience operating and maintaining production systems in a Linux and public cloud environment
  • You have prior experience working in high-performance or distributed systems, although we strive to hire at a variety of experience levels
  • Working knowledge of industry best practices with regard to information security
  • You have built or operated a large-scale cloud service
  • Familiarity with one or more general-purpose programming languages including but not limited to: Java, C/C++, C#, Python, JavaScript, PowerShell
  • Troubleshooting skills across network, application, caching, queuing, load-balancing, storage and distributed services layers
  • Practical experience running, testing, deploying and supporting large-scale services on Azure, AWS or similar environments
  • Ability to analyze network and performance monitor traces and application performance problems, and to debug Windows applications and crash dumps
  • Ability to conceptualize a distributed service, its dependencies and the transactional flow when troubleshooting
  • Experience coordinating resources across diverse teams to restore service and maintain SLAs; ITIL certification is preferred.
  • Strong communication skills, a key component of this role, with audiences that include customers, peers and, at times, executive leadership
  • Firm sense of accountability and ownership of the end-to-end project lifecycle, with solid project management and communication skills

Smile CDR is committed to recruitment practices that are inclusive, non-discriminatory, and welcoming of persons with disabilities. Accommodations are available on request for candidates taking part in all aspects of our selection process. If you are contacted for an interview and require accommodation during the selection process, please let us know.
