lakeFS

Original authors	Einat Orr; Oz Katz
Developer	Treeverse
Initial release	August 3, 2020
Stable release	1.72.0
Repository	https://github.com/treeverse/lakeFS
Written in	Go
Type	Data version control
License	Apache 2.0
Website	lakefs.io

lakeFS is a data version control system designed as enterprise data infrastructure for data engineering and AI teams.^[1] It brings Git-like operations (branching, committing, merging, and reverting) to large-scale data held in object storage systems such as Amazon S3, Azure Blob Storage, Google Cloud Storage, and other S3-compatible object stores.^[2] lakeFS is used for multimodal data management, including data quality enforcement, reproducibility, and governance across data lakes and machine learning workflows.^[3] It is available as an open-source project, an enterprise platform, and a managed service (lakeFS Cloud).^[1]^[3]

History

lakeFS was created in 2020 by Einat Orr and Oz Katz at Treeverse.^[4] Its first public release, v0.8.1, appeared in August 2020 and introduced Git-style operations and Amazon S3 compatibility.^[5] In 2021, Treeverse raised $23 million in a Series A funding round led by Dell Technologies Capital, Norwest Venture Partners, and Zeev Ventures.^[6]

In 2021, lakeFS was included in InfoWorld’s Best of Open Source Software (Bossie) awards.^[7]

In June 2022, lakeFS introduced lakeFS Cloud, a managed service extending version control to cloud-based data lakes.^[3] Version 1.0 was released in October 2023, marking the transition to production-grade status and adding integrations with Databricks, Apache Iceberg, and orchestration tools such as Apache Airflow.^[1]^[8] Independent reports mention enterprise users such as Microsoft, Volvo, and NASA.^[1]

In July 2025, lakeFS secured an additional $20 million in growth capital to expand its enterprise data infrastructure for AI workloads.^[9]^[10]

In November 2025, lakeFS announced the acquisition of the open-source project DVC.^[11]

Software

Overview

lakeFS applies Git-like operations (branching, committing, merging, and reverting) to datasets stored in object storage. These operations let teams manage the data lifecycle with the same rigor that software development applies to code: testing changes in isolation, auditing modifications before they reach production, reproducing earlier data states, and recovering quickly from errors or incidents.^[1]^[2]

Architecture

lakeFS acts as a layer on top of object stores such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. It maintains repository metadata to record commits, branches, and tags, enabling zero-copy data versioning and isolation. lakeFS exposes multiple interfaces, including a web UI, CLI, REST API, and SDKs, allowing integration with existing data engineering and machine learning workflows. It integrates with the modern data stack, supporting query engines, orchestration tools, and data processing frameworks without requiring changes to existing storage layouts.^[2]^[3]
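The idea of zero-copy versioning through repository metadata can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the lakeFS implementation or its API: the names `Commit` and `ToyRepo`, the object addresses, and the "source wins" merge rule are all hypothetical. It only shows that a branch can be a pointer to an existing commit, so creating one copies no data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    """Snapshot mapping logical paths to physical object addresses."""
    objects: Dict[str, str]
    message: str
    parent: Optional["Commit"] = None


class ToyRepo:
    """Hypothetical in-memory model of a versioned repository (not the lakeFS API)."""

    def __init__(self) -> None:
        self.branches: Dict[str, Commit] = {"main": Commit({}, "initial")}
        self.staged: Dict[str, Dict[str, str]] = {"main": {}}

    def create_branch(self, name: str, source: str) -> None:
        # Zero-copy: the new branch points at the same commit object.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def put(self, branch: str, path: str, address: str) -> None:
        # Stage a path -> address mapping; the data itself is never moved.
        self.staged[branch][path] = address

    def commit(self, branch: str, message: str) -> Commit:
        head = self.branches[branch]
        new = Commit({**head.objects, **self.staged[branch]}, message, head)
        self.branches[branch] = new
        self.staged[branch] = {}
        return new

    def merge(self, source: str, dest: str) -> Commit:
        # Simplified assumption: source entries win, no conflict detection.
        src, dst = self.branches[source], self.branches[dest]
        merged = Commit({**dst.objects, **src.objects},
                        f"merge {source} into {dest}", dst)
        self.branches[dest] = merged
        return merged


repo = ToyRepo()
repo.put("main", "data/a.parquet", "s3://bucket/obj-1")
repo.commit("main", "add a")

repo.create_branch("dev", "main")
shares_commit = repo.branches["dev"] is repo.branches["main"]

repo.put("dev", "data/a.parquet", "s3://bucket/obj-2")
repo.commit("dev", "rewrite a on dev")
main_before_merge = repo.branches["main"].objects["data/a.parquet"]

repo.merge("dev", "main")
main_after_merge = repo.branches["main"].objects["data/a.parquet"]
```

In this model, changes committed on `dev` stay invisible on `main` until the merge, which is the isolation property the paragraph above describes.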

Functions

lakeFS provides version control and data management capabilities for object storage–based data lakes. Its core capabilities include:

Version control and atomic commits: enables reproducible data versions and complete lineage tracking across repositories.^[1]
Zero-copy branching and merging: allows creating isolated branches for development and testing without duplicating data.^[2]
Automated hooks: supports configurable hooks that validate data quality or trigger external workflows before or after merges and commits.^[1]
Rollback and recovery: allows reverting a repository to any previous commit to recover from data errors.^[2]
Data lineage and metadata management: records commit history and metadata changes for auditability.^[3]
Multi-storage support: manages data across multiple storage systems from a single instance and is compatible with major object stores such as Amazon S3, Azure Blob Storage, Google Cloud Storage, and MinIO.^[3]
Reproducibility: enables reproducing experiments and model training based on fixed data versions.^[1]
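Two of the capabilities listed above, pre-merge hooks and rollback, can be sketched with a toy commit history. The example is illustrative only and does not use the lakeFS hook configuration or API: `merge_with_hooks`, `revert`, `HookRejected`, and the row-count metadata are hypothetical names chosen for the sketch, and revert is modeled, as in Git, by appending a new commit that restores an earlier state.

```python
from typing import Callable, Dict, List

Snapshot = Dict[str, int]          # path -> row count (hypothetical metadata)
Hook = Callable[[Snapshot], bool]  # returns True when the snapshot is acceptable


class HookRejected(Exception):
    """Raised when a pre-merge hook fails; the merge is not applied."""


def merge_with_hooks(history: List[Snapshot], candidate: Snapshot,
                     hooks: List[Hook]) -> None:
    # Run every hook before the candidate becomes the new head.
    for hook in hooks:
        if not hook(candidate):
            raise HookRejected(hook.__name__)
    history.append(dict(candidate))


def revert(history: List[Snapshot], index: int) -> None:
    # Restore an earlier state as a new commit, keeping the full history.
    history.append(dict(history[index]))


def no_empty_files(snapshot: Snapshot) -> bool:
    return all(rows > 0 for rows in snapshot.values())


history: List[Snapshot] = [{"events.parquet": 100}]

rejected = False
try:
    merge_with_hooks(history, {"events.parquet": 0}, [no_empty_files])
except HookRejected:
    rejected = True  # the bad snapshot never reaches the main history

merge_with_hooks(history, {"events.parquet": 250}, [no_empty_files])
revert(history, 0)  # roll back to the first commit's state
```

Because the failed merge raises before anything is appended, consumers of the history only ever see snapshots that passed validation.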

Integrations

Independent coverage has noted integrations with Databricks, Apache Iceberg, Red Hat OpenShift, and Trino, as well as compatibility with orchestration tools such as Apache Airflow.^[1]^[2] Other independent sources describe pre-merge validation patterns that use Trino in versioned data workflows.^[12]

References

[1] VentureBeat, 2023.
[2] Blocks, 2023.
[3] TechTarget, 2022.
[4] Calcalist, 2021.
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
