Product Overview
Disdat is a Python (3.6.8+) package, available on PyPI, for data versioning and pipeline authoring that lets data scientists create, share, and track data products. Disdat organizes data into bundles: collections of literal values and files produced by data science tasks such as data cleaning, model training, or prediction. Bundles are the unit at which data is versioned and shared. Disdat manages the data produced by data science pipelines so you don't have to. Instead of maintaining a custom naming taxonomy for each project, such as "models/2-1-18/made-with-1-1-17-data.parquet", you let Disdat manage the outputs on your local file system or in S3.
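To make the bundle idea concrete, here is a minimal sketch in plain Python. It does not use or reproduce Disdat's API: the names ToyBundle, ToyStore, version_id, add, latest, and history are all hypothetical, chosen only to illustrate how versioning by name can replace hand-built file-naming schemes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List
import uuid


@dataclass(frozen=True)
class ToyBundle:
    """Hypothetical stand-in for a bundle: one named, immutable output."""
    name: str   # human-readable name, e.g. "model"
    data: Any   # literal values and/or file paths produced by a task
    version_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ToyStore:
    """Hypothetical in-memory store that keeps every version of each name."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ToyBundle]] = {}

    def add(self, name: str, data: Any) -> ToyBundle:
        # Each call records a new version instead of overwriting the old one.
        bundle = ToyBundle(name=name, data=data)
        self._versions.setdefault(name, []).append(bundle)
        return bundle

    def latest(self, name: str) -> ToyBundle:
        # The most recently added version for this name.
        return self._versions[name][-1]

    def history(self, name: str) -> List[ToyBundle]:
        # All versions for this name, oldest first.
        return list(self._versions.get(name, []))


store = ToyStore()
store.add("model", {"auc": 0.91, "files": ["model.parquet"]})
store.add("model", {"auc": 0.93, "files": ["model.parquet"]})
print(store.latest("model").data["auc"])
print(len(store.history("model")))
```

In this sketch the caller only ever supplies a name; the version identifier and timestamp are attached automatically, which is the bookkeeping that a path like "models/2-1-18/made-with-1-1-17-data.parquet" tries to encode by hand.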