Data Lake

Definition

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
Source: Amazon

Products

The Amorphic Data Unified Analytic Platform is a fully managed self-service data lake platform to simplify AWS analytics for IT and all users. The Amorphic Data Unified Analytic Platform is offered as both a Software-as-a-Service and a Managed Subscr...
Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table.
Apache ORC is a columnar storage format for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the va...
Azure Blob Storage is massively scalable and secure object storage for cloud-native workloads, archives, data lakes, high-performance computing, and machine learning. Azure Blob Storage helps you create data lakes for your analytics needs, and provid...
By 1plusX
Our custom data lake allows your data teams to deep-dive into and fully understand your data.
The Lakehouse Platform combines the best elements of data lakes and data warehouses — delivering data management and performance typically found in data warehouses with the low-cost, flexible object stores offered by data lakes. This unified platform...
Databricks SQL allows customers to operate a multi-cloud lakehouse architecture that provides data warehousing performance at data lake economics.
By Dremio
Dremio is a next-generation data lake engine that liberates your data with live, interactive queries directly on cloud data lake storage. Dremio delivers secure, self-service data access and lightning-fast queries directly on your AWS, Azure or priva...
lakeFS enables you to manage your data lake the way you manage your code. Run parallel pipelines for experimentation and CI/CD for your data.
By Dremio
Nessie is a data ops solution for data lakes. Nessie builds on top of and integrates with Apache Iceberg, Delta Lake and Hive. It was designed from day one to run at massive scale in the cloud, supporting millions of tables referencing exabytes of da...