Overview

Production Engineering at Shopify encompasses the disciplines of site reliability engineering, infrastructure engineering, and developer productivity. Our team ensures that Shopify infrastructure is able to scale massively, while also delivering resilient systems, amazing performance, and impactful tools for our entire engineering team.

The objective of the Service Pattern group is to share with the rest of the organization the tooling, lessons, and patterns we’ve used to reliably scale Shopify to over 80,000 requests per second on Rails. Today, new applications are spinning up around the company, and from day one they run into many of the same scaling challenges. In this group, we extract and evolve the tools that have allowed Shopify to scale and provide them to every developer in the company. We want Shopify developers to focus on making commerce better, not on worrying about infrastructure. Providing scalability is our job.

You’ll be responsible for designing tools that help build scalable, maintainable, and resilient applications. These tools will be consumed by hundreds of developers and applications across the organization, and will allow them to abstract away the pain points of scale. You will be able to ship changes to production multiple times a day, affecting developers and merchants across the entire platform. Developer productivity is one of our key success criteria, so it’s important that you write high-quality documentation as well as good code.

Some of the challenges our group works on:

  • Evolving our sharding abstraction to allow other applications around the organization to take advantage of the architecture that’s allowed Shopify Core to scale
  • Building resiliency tooling to automatically generate resiliency matrices, and improving Toxiproxy to make creating resilient applications a breeze
  • Moving shop data between shards with minimum disruption for the customer to improve data locality, resiliency, and performance for our merchants
  • Designing the RPC layer that makes talking between our 100s of internal applications a joy, setting up a service mesh to provide circuit breakers to everyone, and enabling Chaos Engineering
  • Failing over shards between datacenters without losing requests
  • Building the tools to make refactoring data at scale easier: traversing billions of records across 100s of Pods without a hitch

You’ll need to have experience with:

  • Building backend web services using several languages and frameworks (some tools we use include Ruby, UNIX commands, Go, Kafka, Python, …)
  • Working with relational databases and SQL
  • Working with web frameworks, or the desire to learn them quickly
  • Linux and systems; you should be comfortable navigating production infrastructure
  • Digging deep into problems on your own; you are always hungry to answer another “why”

It’d be amazing if you have experience with:

  • Building resilient, scalable services (with tools like Toxiproxy), where concepts like SLAs, fault tolerance, and circuit breakers ring a bell
  • Development on a leading cloud provider (GCE, AWS, Azure, …)
  • Understanding and working with the lower levels of relational databases (Binlog, Topology, Performance Optimization, Replication)
  • Reasoning about and working with distributed systems (consensus algorithms like Raft/Paxos, 2PC, ACID, …)
  • Operating infrastructure, debugging production systems, and being on-call
  • Concurrent programming (optimistic/pessimistic locking, semaphores, deadlocks)

Our team has spoken at conferences around the world about the work that we’re doing.

About Shopify

Shopify is a leading cloud-based, multichannel commerce platform designed for small and medium-sized businesses. Merchants can use the software to design, set up and manage their stores across multiple sales channels, including web, mobile, social media such as Pinterest and Facebook, brick-and-mortar locations, and pop-up shops. The platform also provides a merchant with a powerful back-office and a single view of their business.

The Shopify platform was engineered for reliability and scale, using enterprise-level technology made available to businesses of all sizes. Shopify currently powers over 200,000 businesses in approximately 150 countries, including Tesla Motors, Budweiser, Wikipedia, LA Lakers, the New York Stock Exchange, GoldieBlox, and many more.

Your personal growth is important to us, and we’ll give you everything you need to make it happen: learning budgets, mentorship opportunities, one-on-one coaching, skill development workshops, you name it. We encourage you to experiment, take risks, and pursue the things you care about. And if you make a mistake? That’s ok – learn from it, and share your experience with the team.

We hope you’ll love it here, but we also know that it’s not all about work. We’ll help you maintain a healthy balance with a gym allowance, parental leave, childcare benefits, flexible work hours, and catered meals to give you more time for the things you care about most.

We’re growing quickly, so there are plenty of opportunities to learn and grow. You’ll have the creative freedom to make a real difference in the world of commerce, and the chance to work with some of the best in the business.