Overview

Sonatype is the software supply chain management company. We’re on a mission to change how the world innovates by making software development easier. From running the world’s largest repository of Java open-source components (Maven Central) to inventing componentized software development and then software supply chain management to creating the only solution that stops open-source malware in its tracks, we’re constantly leading the industry while helping thousands of customers manage open source every day.

Our technology is already used by 15 million developers, and we have lofty goals to put it in the hands of every engineering team. We need you to do that. Join us!

Learn more at www.sonatype.com.

Sonatype’s mission is to enable organizations to better manage their software supply chain. We offer a series of products and services including the Sonatype Nexus Repository and Sonatype Lifecycle.

*** This position is 100% remote. ***

You’ll be working with one of our sophisticated research teams to help turn large amounts of data into valuable insights for our customers. We’re building our data science program, so you’ll be helping to build out our standard processes as we grow. We have a large team of dedicated data engineers and data scientists, so you can focus on doing what you do best: building models.

What You’ll Be Doing

  • Interacting with product management and data engineers to think through the potential ways to leverage data.
  • As an authority in machine learning, you will largely drive the direction of your work, since you know best what is possible.
  • Assuring the quality of the models you produce and monitoring them over time.
  • Lead the research, development, and deployment of machine learning models for malicious behavior analysis and detection, applying innovative techniques such as GANs, VAEs, and other generative AI methods.
  • Collaborate closely with cross-functional teams, including data engineers, software developers, and domain experts, to identify business requirements and translate them into practical data science solutions.
  • Explore and evaluate different generative AI approaches and algorithms to detect and predict malicious activities, anomalies, and behavioral patterns in diverse datasets.
  • Design and implement scalable and efficient data processing pipelines to collect, cleanse, and preprocess large-scale datasets for training and validation purposes.
  • Develop and implement feature engineering strategies, dimensionality reduction techniques, and data augmentation methods to improve the performance and generalization capabilities of the models.
  • Conduct in-depth exploratory data analysis and develop statistical models to identify patterns, correlations, and trends in data related to fraud and behavioral patterns.
  • Collaborate with the data governance team to ensure compliance with data privacy regulations and ethical considerations while working with critical customer data.
  • Stay updated on the latest research and advancements in generative AI, fraud detection, and behavioral analysis domains, and evaluate their applicability to enhance our existing models and methodologies.
  • Mentor and provide guidance to junior data scientists, assisting them in developing their technical skills and understanding of AI capabilities.
  • Present findings, insights, and model performance to both technical and non-technical partners, communicating complex concepts clearly and concisely.
  • Excellent problem-solving abilities and the capacity to develop innovative solutions to complex data science challenges.
  • Strong grasp of robust model validation techniques, including cross-validation and evaluation metrics suitable for assessing generalization performance.
  • Proven ability to implement data science best practices.
  • Proficiency using Jupyter or Databricks notebooks.

Requirements

  • Strong academic credentials in computer science, statistics, data science, machine learning or a related field.
  • 8+ years of hands-on experience as a data scientist.
  • Strong expertise in generative modeling and deep learning architectures.
  • Thorough quantitative background.
  • Demonstrated understanding of fraud detection techniques, anomaly detection, and behavioral analysis.
  • Proficiency in programming languages such as Python or R, and experience with relevant libraries and frameworks (e.g., TensorFlow, Keras, PyTorch, scikit-learn).
  • Demonstrated experience working with large-scale datasets, data preprocessing, and feature engineering.

Preferences

  • Familiarity with Databricks and AWS services (S3, EMR, SageMaker) would be beneficial.
  • Experience with Git, preferably on GitHub.
  • Experience with PySpark, MLflow, LangChain, and Hugging Face APIs.
  • Our data engineers primarily use Java and Scala. We don’t expect you to be writing Java/Scala code, but familiarity may make it easier to work with the data engineers.

Things that we are proud of

  • 2023 Fast Company Best Places for Innovators
  • 2023 Leader in the Forrester Wave for Software Composition Analysis
  • 2023 Gartner’s Magic Quadrant
  • 2023 Software Report’s Top 100 Software Companies
  • 2023 BuiltIn Best Places to Work
  • 2022 Frost & Sullivan Technology Innovation Leader Award
  • 2022 PeerSpot Silver Peer Award in Software Composition Analysis
  • 2022 Tech Ascension Best DevOps Security Solution Award
  • 2022 NVTC Cyber Company of the Year
  • Company Wellness Week – We shut down company operations for a week so that all employees can spend time pursuing personal growth and enjoying much-needed, well-deserved rest.
  • Diversity & Inclusion Working Groups
  • Parental Leave Policy
  • Paid Volunteer Time Off (VTO)

We are Sonatype, and we have assembled a world-class team of employees, investors, and partners. We are proud to be recognized as a Deloitte Technology Fast 500 company for 2016. With more than 120,000 installations and counting, Nexus products are helping modern development organizations thoughtfully source, manage, assemble, and maintain open source and third-party components, so they can improve the quality, security, and speed of their software supply chains.

We are curious and constantly innovating without fear of failure. We are digging into a huge and emerging market and seeking remarkably versatile individuals to join us on our journey.

Sonatype is proud to be an equal opportunity workplace and an affirmative action employer that is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please do not hesitate to let us know.

At Sonatype, we value diversity and inclusivity. We offer perks such as parental leave, diversity and inclusion working groups, and flexible working practices to allow our employees to show up as their whole selves. We are an equal-opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.