Senior Site Reliability Engineer (Remote)
- 1515 Wynkoop St, Denver, CO, USA
Why Red Canary
Red Canary was founded to make security for every business better by protecting organizations around the world from cyber threats. Our combination of market-defining technology, processes, and expertise delivered using an innovative SaaS model is preventing breaches every day.
The Red Canary engineering team builds and operates the platform that delivers unmatched threat detection and response. Every engineer has unique opportunities to learn and apply new technologies so that we can solve the hardest problems in cybersecurity.
Why You Matter
The Red Canary platform processes billions of raw events per day to identify threats to our customers. We are searching for a Senior Software Engineer (Site Reliability) to ensure that our platform is highly available and operating at peak performance. As one of the engineers on the Site Reliability team, you’ll be in charge of supporting our development teams and the wider organization through the design and implementation of infrastructure and automation projects.
Who You Are
You carry a DevOps mindset, with a strong interest in building creative solutions to operational problems. You’re passionate about participating in all aspects of the software development lifecycle, from architecture and implementation to monitoring/alerting and the automation of repetitive maintenance activities. You understand that writing code and ensuring its availability are not competing goals; they go hand in hand.
As a Site Reliability Engineer at Red Canary, you will help shape the vision and implementation that enable the highest level of availability for our systems and services. You will have the opportunity to influence the design of many SRE/DevOps-related systems and processes, including: configuration management; CI/CD pipelines; defining, monitoring, and alerting on Service Level Objectives and Indicators (SLOs/SLIs); cloud infrastructure design and implementation; and automating away toil.
As an SRE you will:
- Participate in the team’s pager rotation, so that we can respond to operational incidents affecting our platform.
- Provide support to internal teams so that we can communicate the impact of incidents to customers.
- Use your on-call shift to identify, automate response to, and ultimately prevent recurring incidents.
- Design, implement, and manage our infrastructure using SaltStack, Terraform, and Kubernetes.
- Implement monitoring and alerting around Service Level Objectives and Indicators (SLOs/SLIs).
- Identify manual activities that can be automated.
- Improve the deployment process until it is so seamless that Software Engineers don’t realize it exists.
- Debug production issues across the entire stack, from the Linux operating system all the way up to the application code itself.
You may be a fit for an SRE role if you:
- Think about systems from top to bottom: edge cases, failure modes, scalability and implementation specifics.
- Know your way around the Linux shell.
- Understand the purpose of Configuration and Infrastructure as Code and the role of automation.
  - We use Puppet, SaltStack, and Terraform
- Have strong programming skills in one or more of the following languages:
  - Ruby, Python, Go, C/C++, Rust
- Have a strong desire to collaborate both within your team and across teams.
- Strongly value efficiency, with the urge to document and automate so you never have to do the same thing twice.
- Have an enthusiastic, go-for-it attitude. If something is broken, you love to be the one to go and fix it.
- Have experience with AWS cloud infrastructure and services:
  - EC2, S3/Glacier, Route 53, RDS, DynamoDB, SQS, SNS
Projects you could work on:
- Maturing our configuration management and automation tooling as we move from Puppet to SaltStack.
- Designing, implementing, and improving our Prometheus Operator stack as we mature our SLO/SLI-based monitoring and alerting.
- Designing, implementing, and improving our AWS and Kubernetes infrastructure while planning for growth.
- Automating the provisioning of our single-tenant infrastructure as it scales, planning for growth, and ensuring maximum reliability.
- Maturing our CI/CD pipeline as we integrate CircleCI, improving automated testing/checkout capabilities, and minimizing the contact points between developing code and deploying it to production.
Working at Red Canary
You will work with an exceptionally talented team that is solving problems facing every business. Additional benefits of working at Red Canary include:
- Exceptional healthcare and dental coverage, including fully paid premiums
- Flexible time off and leave programs
- 401k and flex-spending accounts
- Fitness, phone and discretionary stipends
Individuals seeking employment at Red Canary are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.