Overview
Paperpile runs on data at scale, with a literature database of 250M+ academic papers and a growing body of user data accumulated over more than a decade. You’ll work across the systems that ingest, process, store, and serve this data reliably: building pipelines, optimizing search, handling PDFs at scale, and exposing clean APIs.
Requirements
– Strong backend engineering background with experience building and operating data-heavy systems in production.
– Experience deploying and operating services on AWS.
– Experience designing and maintaining data ingestion pipelines handling messy, heterogeneous sources. Comfortable with web scraping and working with third-party data sources and APIs.
– Familiarity with Node.js and TypeScript. It's fine if you come from a different background, such as Java or Python, but you should be comfortable working in this environment.
– High standards for data quality. You think carefully about correctness, deduplication, and consistency.
– Solid understanding of full-text search systems including indexing strategy, relevance tuning, and query optimization.
– Proficient in building reliable REST APIs.
Additional useful experience
– Familiarity with academic publishing formats and data sources (PubMed, Crossref, arXiv…)
– Experience with PDF processing pipelines (extraction, transformation, storage and delivery at scale).
– Experience with LLM-based document processing or ML pipelines for extracting structured data from unstructured text.
– Large-scale web crawling and scraping.
Benefits
– Base compensation €60,000–€90,000, depending on your level of experience.
– Bonus/equity program.
– 4 weeks paid vacation + local holidays.
– We sponsor a co-working space in your city.
– Learn and grow. Try out new things. We sponsor relevant courses, seminars, and conferences.