Overview

GitHub is looking for observability-minded SREs to join our Site Reliability organization. At GitHub, “observability” means more than collecting timeseries and graphing dashboards. We are looking for software engineers to help us build insight into the behavior and performance of the world’s largest code hosting platform.

You will join an existing team of SREs and apply open source and third-party solutions to observability problems such as debuggability, golden metrics, event correlation, and, of course, timeseries collection and graphing. Your work will bring coherent theory and practice to how we build and scale GitHub’s products.

You will work with open source communities and represent our scalability and reliability needs. Success in this role requires you to actively contribute to these upstream projects and to our industry at large as you improve the functionality and performance of our systems.

The SRE team is highly distributed and you will thrive in an environment of remote work and asynchronous communication. You’re expected to have strong written communication skills and be able to develop working relationships with coworkers in locations around the globe.

This role provides an opportunity to blend your system design and software engineering skills on an ever-changing set of novel scalability and reliability challenges. It is also a chance to think deeply about observability and make a real impact on how the world builds software.

Responsibilities

  • Build reliable, available, and sustainable observability into our products, infrastructure, and organization.
  • Balance your observability and community mandates with the existing service portfolio and responsibilities of the SRE team.
  • Contribute positively to industry and/or open source communities on observability topics.
  • Identify and integrate with third-party solutions where it makes the most sense.
  • Use data to understand the availability, reliability, and sustainability of our software.
  • Bring experience, pragmatism, empathy, and composure to interactions with teams outside of the SRE organization.
  • Focus on impact, not activity.

Minimum Qualifications

  • A passion for observability followed closely by a passion for site reliability engineering.
  • Proficiency with performant languages such as Go and C/C++.
  • Success as an open-source contributor or strong desire for open-source work.
  • Experience working on observability projects and themes.
  • Familiarity with open-source and third-party observability tools.
  • Deep comfort with the GNU/Linux operating system.
  • A mix of system design and software engineering skills and the ability to blend those perspectives pragmatically based on project needs.
  • Proficiency in high-level languages such as Ruby, Python, and Bash.
  • Incident response and management experience.

Preferred Qualifications

  • Experience diagnosing and troubleshooting complex distributed systems with availability and scalability constraints.
  • Practical experience with Prometheus.
  • Familiarity with Istio.
  • Experience with Kubernetes and Docker.
  • Experience building infrastructure automation.
  • Experience negotiating SLIs, SLOs, and SLAs with product owners.
  • Success in a remote work environment.
  • Familiarity with configuration management software such as Puppet, Chef, Ansible, or Sa

Who We Are:

GitHub is the best place to share code with friends, co-workers, classmates, and complete strangers. Over fifteen million people use GitHub to build amazing things together. With the collaborative features of GitHub.com and GitHub Business, it has never been easier for individuals and teams to write faster, better code.

What We Value:

  • Collaboration: We believe the best work is done together.
  • Empathy: We believe in putting people first.
  • Quality: We believe in setting the standard for excellence.
  • Positive Impact: We believe in making the world a better place through our work.
  • Shipping: We believe in creating things for the people using them.

Why You Should Join:

At GitHub, we constantly strive to create an environment that allows our employees (Hubbers) to do the best work of their lives. We’ve designed one of the coolest workspaces in San Francisco (HQ), where over half of our Hubbers work, snack, and create daily. The other half of our Hubbers work remotely in 18 countries across the globe.

We are also committed to keeping Hubbers healthy, motivated, focused, and creative. We’ve designed our top-notch benefits program with these goals in mind. In a nutshell, we’ve built a place where we truly love working, and we think you will too.

GitHub is made up of people from a wide variety of backgrounds and lifestyles. We embrace diversity and invite applications from people of all walks of life. We don’t discriminate against employees or applicants based on gender identity or expression, sexual orientation, race, religion, age, national origin, citizenship, disability, pregnancy status, veteran status, or any other differences. Also, if you have a disability, please let us know if there’s any way we can make the interview process better for you; we’re happy to accommodate!