Site Reliability Engineer

Title: Site Reliability Engineer

Location: Remote

  • At mParticle, we are passionate about building software that empowers our customers to make the most of their data.
  • We count on our operations team and site reliability engineers (SREs) to keep our platform at peak performance and high availability, processing over 1 trillion events a month in near real-time, with no interruptions.
  • We are growing and expanding our customer deployments, and we are seeking an experienced SRE to join our operations team: someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
  • As a Site Reliability Engineer, you will be part developer, part operations, and all continuous integration and delivery expert; you will be integral to the design, setup, automation, and maintenance of our entire integration and delivery pipeline.
  • The ideal candidate has a deep software development background combined with strong communication skills that promote collaboration with developers, support engineers, customers, and senior management.
  • You will work closely with development squads, our client-facing teams, and customers, as well as other engineers and developers, gathering requirements, architecting, and continuously delivering quality improvements to our platform.

As an mParticle SRE, you will…

  • Be part of the PagerDuty rotation, responding to platform incidents and supporting other engineers who are responding to customer issues
  • Use your daily interactions with the platform and your experience and skills to constantly improve our environment and ensure that issues do not reoccur
  • Maintain and augment our monitoring systems so that they alert on symptoms, instead of issues
  • Be proactive and take ownership in identifying, raising, and resolving issues or deficiencies you see anywhere in our environment
  • Produce and improve internal documentation and SOPs where they are missing, low in quality, or lacking detail
  • Live-debug applications and issues, and identify, resolve, or own the resolution of functionality and performance deficiencies
  • Identify performance issues with production applications and their configuration, and suggest or implement fixes
  • Automate yourself out of a job
  • Contribute to our scale goals

You will be perfect for this role if you…

  • Have a bachelor’s degree in computer science or another highly technical or scientific discipline
  • Are able to program (structured and OO) in one or more high-level languages, preferably Python and either C#, Java, or Go
  • Comfortably own the Linux shell
  • Have a proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Have coding experience beyond simple scripts
  • Are experienced in debugging and performance tuning applications
  • Have an eye for edge cases, behaviors, and creative solutions
  • Are experienced with configuration management
  • Have an unstoppable urge to fix what is broken
  • Efficiently balance speed/iteration and quality
  • Are experienced with Terraform and Ansible

As an SRE, we expect you to…

  • Fluently follow existing best practices for maintaining the health of supported applications and the platform, and for writing and testing code
  • Make impactful decisions about your technical contributions
  • Understand how our production systems work
  • Handle vague scope or identify improvements in small areas
  • Manage your work with little-to-no supervision
  • Actively collaborate with others through technical documentation
  • Troubleshoot and contribute to the resolution of moderate to complex production problems, and write post-mortems on them
  • Write SOPs for issues encountered and common tasks
  • Automate repetitive tasks using purpose-written code or commercially available tools
  • Detect inefficient common operational patterns and processes
  • Design and implement monitoring solutions for common or critical problems

As a technical resource and expert, you should be able to…

  • Troubleshoot and resolve medium-complexity issues
  • Be a core resource in troubleshooting and resolving complex issues; have a deep understanding of the mParticle pipeline and be able to assist in troubleshooting medium to complex platform issues
  • Write quality, clean, and maintainable code, following company best practices with minimal guidance
  • Develop sufficient domain understanding to sanity check and ensure the quality of your output, as well as review that of other team members
  • Write custom code of medium to high complexity in at least 2 languages
  • Be the responsible/SME engineer for 2 or more internally-maintained supporting infrastructure components
  • Proactively research and keep up to date on the patterns, advancements, and evolutions of tools and technologies used in the mParticle pipeline
  • Identify problematic patterns in the mParticle applications, processes and tools and suggest and implement resolution options
  • Make small design decisions independently, making appropriate tradeoffs between simplicity and performance
  • Follow existing patterns to create new instances of projects, features, or architecture
  • Create novel architectures of small components within your area of expertise. This includes diagramming the architecture, assessing the trade-offs made and patterns applied, and estimating the effort and approximate timeline for the change
  • Understand the flow control of nearly any system, including those outside of your area of expertise, even if you cannot necessarily suggest improvements to those systems
  • Properly sense when to engage Security for a review of a potential change
  • Understand techniques used to troubleshoot and fix production bugs and issues
  • Develop solutions/code that reduces future operational burden (e.g. by adding appropriate self-healing, high levels of alerting/monitoring/logging, reducing alert noise, etc.)
  • Ensure that infrastructure resources are not wasted by consistently following provided best practices and rightsizing instances
  • Contribute to the build and release tooling and infrastructure

You should also be able to…

  • Be successful when working on a large feature or improvement of vague scope
  • Identify and push forward new features or enhancements that improve the functioning of a system or feature
  • Identify problems and contribute well-scoped solutions to the team’s roadmap
  • Focus your work on what is most valuable for the team
  • Make and communicate accurate time estimates for your own work, potentially spanning multiple sprints
  • Manage projects that span multiple groups of stakeholders
  • Act as an effective facilitator for team meetings
  • Consistently communicate technical decisions through high-quality design docs, tech talks, and wiki contributions
  • Create documentation and train others, including producing team onboarding materials

Lastly, as part of mParticle and our Engineering organization, you should…

  • Participate in, own, and improve mParticle technical recruiting, onboarding, and branding
  • Act as a brand ambassador for mParticle Engineering
  • Drive the cultural direction of mParticle operations
  • Encourage people to be the best they can
