Our product infrastructure team is looking for a data engineer to reform the Wikimedia Foundation’s approach to data instrumentation, so that we can continuously learn how to make Wikipedia and its sister projects more useful. This is a great position for you if you love instrumenting the full stack and cherish the opportunity to help software engineers, data scientists, and product managers through better data engineering tools and techniques. You will play an important role in improving our data insight capabilities and nurturing an even more data-informed culture in our products and programs.

You will have the opportunity to work with software that serves over half a billion pages per day through Wikipedia (and other Wikimedia sites), build components that operate in a privacy-first manner, and modernize code and practices that support experimentation, measurement, monitoring & alerting, and reporting in our product department.

Are you ready to write open source software that operates on data at the scale and speed of Wikipedia traffic, innovate in applied data science that respects user privacy, and help create a world in which everyone can freely share in the sum of all knowledge?

We are a family-friendly employer and have a great benefits package. We look forward to your application.

You will do these things:

  • Develop an overall approach to product instrumentation that fits the Wikimedia Foundation’s unique goals and responsibilities to our communities
  • Clearly and concisely communicate complex ideas about software, instrumentation, and statistics
  • Author polished code that makes it possible to generate and validate data in contexts such as A/B testing and the data pipeline, using a number of languages: Python, JavaScript, PHP, Java, Scala, and more
  • Work with product analysts to integrate incoming data into ad hoc analytics tasks and automated analytics workflows
  • Coach web and native iOS & Android software engineers on effective use of instrumentation and other data tooling
  • Build up a collection of repeatable patterns and practices and ensure that data engineering components and architecture will scale sustainably
  • Partner with our Analytics Engineering team and Product Analytics team on enhancements to big data architecture

We would like you to have these skills:

  • 3+ years of experience working with data in a professional environment
  • 2+ years of experience building web applications and apps involving instrumentation components, frameworks, and data models
  • Experience with Hadoop (HDFS, Hive, YARN, MapReduce) and later technology like Spark
  • Experience with NoSQL databases like Cassandra or Druid and “conventional” SQL ones like MySQL
  • Experience with data-friendly languages like Python, R, Scala, or Hadoop-flavored Java
  • Experience with application server technologies such as PHP or Node.js
  • Focused software engineering: you enjoy writing unit tests, reviewing code and responding to code reviews, and discussing architectural approach
  • A bachelor’s, master’s, or doctorate in data science, computer science, or another scientific field; or equivalent experience
  • A love of free knowledge and open access

These are also pluses:

  • Exposure to applied machine learning (ML), deep learning, or natural language processing (NLP)
  • Native apps programming experience (e.g., Objective-C, Swift, Android Java / Android SDK, Kotlin)
  • Experience with Jupyter, Mathematica, or similar notebook technology
  • Professional use of distributed and in-memory databases
  • Application of information security to big data
  • Experience with an internet software environment operating at scale, for example, messaging platforms that process hundreds of thousands of events per second
  • Track record of working remotely with teams distributed across many time zones
  • Examples of connecting software to CI tools like Jenkins or Travis
  • Contributions to Wikipedia (or editing wikitext on other wikis) and other open access / open source ecosystems


If you have any existing open source software that you’ve developed (these could be your own software or patches to other packages), please share the URLs for the source. Links to your projects on GitHub, GitLab, BitBucket, etc. are exceptionally useful.

The Wikimedia Foundation is… 

…the nonprofit organization that supports Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

The Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply.

Benefits & Perks *

  • Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)
  • The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, baby sitting, continuing education and much more
  • The 401(k) retirement plan offers matched contributions at 4% of annual salary
  • Flexible and generous time off – vacation, sick and volunteer days, plus 19 paid holidays, including the last week of the year
  • Family-friendly! 100% paid new parent leave for seven weeks plus an additional five weeks for pregnancy, flexible options to phase back in after leave, and a fully equipped lactation room
  • For those emergency moments – long and short term disability, life insurance (2x salary) and an employee assistance program
  • Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses
  • Telecommuting and flexible work schedules available
  • Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax
  • Great colleagues – diverse staff and contractors speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people