Manage data the way code is managed
DVC provides Git-based data version control that bridges the gap between software engineering practices and data science workflows. Now part of the lakeFS family, the platform offers solutions ranging from lightweight open-source tools for individual practitioners to enterprise-grade infrastructure for large-scale AI operations.

DVC (Data Version Control) is an open-source tool that brings software engineering best practices to data science and machine learning workflows. Using a Git-like model, DVC lets teams manage data, models, and experiments with the same rigor and reproducibility that developers apply to code, which supports better collaboration, experiment tracking, and reproducible machine learning pipelines. It serves both individual data scientists working on smaller projects and enterprise AI engineering teams handling complex operations.

Originally developed by Iterative, DVC has since joined the lakeFS family, which broadens the range of data version control solutions available around it. Deployment options include a free open-source Git extension, a VS Code extension for working inside the IDE, and enterprise-grade solutions through lakeFS for organizations that need highly scalable infrastructure capable of handling petabyte-scale multimodal object stores and data lakes.

With over 15,000 GitHub stars and an active community, DVC has become a widely adopted choice for data versioning and reproducibility.
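
To make the "Git-like model" concrete, the following is a minimal conceptual sketch of one way such a scheme can work: large files are copied into a content-addressed cache, and only a small pointer file is left for Git to commit. This is an illustration under stated assumptions, not DVC's actual implementation or file format. The function names (`track`, `checkout`), the `.pointer.json` suffix, the cache layout, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    The pointer is a few lines of text, so it is cheap to commit to Git
    while the data itself stays outside the repository history.
    (Hypothetical helper, not a DVC API.)
    """
    checksum = file_digest(data_file)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": data_file.name, "hash": checksum}, indent=2))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the data file described by a pointer from the cache.

    (Hypothetical helper, not a DVC API.)
    """
    meta = json.loads(pointer.read_text())
    checksum = meta["hash"]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / checksum[:2] / checksum[2:], target)
    return target


def demo() -> bool:
    """Track two versions of a file, then restore the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"

        data.write_text("x,y\n1,2\n")
        pointer = track(data, cache)
        pointer_v1 = pointer.read_text()  # stands in for a Git commit of the pointer

        data.write_text("x,y\n1,2\n3,4\n")
        track(data, cache)  # the pointer now describes version 2

        pointer.write_text(pointer_v1)  # stands in for checking out the old commit
        restored = checkout(pointer, cache)
        return restored.read_text() == "x,y\n1,2\n"


if __name__ == "__main__":
    print(demo())
```

In this sketch, switching between dataset versions reduces to switching between versions of a small text file, which is the kind of operation Git already handles well.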