Pinterest: Scalable Data & MLOps Platform

A detailed look at the challenges and successes of Pinterest.

Pinterest: Scalable Data & MLOps Platform

Key Metrics at a Glance

1000+/week
Model Deployments

Rapid iteration and deployment of new ML models.

The Problem in Detail: How Did It Come to This?

Pinterest's data and ML infrastructure was fragmented and couldn't keep up with the rapid growth in data volume and number of models. There were no standardized ways to process data and reliably bring models into production.

The Solution: A Strategic Approach

Pinterest built a centralized data platform based on technologies like Apache Kafka, Spark, and Airflow. For MLOps, they developed an internal platform standardizing the entire model lifecycle, from feature creation to training to deployment and monitoring. A central Feature Store ensures data consistency for training and inference.

Key Learnings

  • A centralized platform is crucial for scaling Data Science and ML at large.
  • Treating data and ML pipelines as code (DataOps/MLOps) is key to reproducibility and reliability.
  • A Feature Store is a critical component to avoid 'Training-Serving Skew'.

Essential Questions & Answers

Technologies & Concepts Used:

DataOps
MLOps
AWS EMR
Kubernetes
Airflow
Feature Store
CI/CD for ML