xvc

codecov build crates.io docs.rs unsafe forbidden

A Fast and Robust MLOps Swiss-Army Knife in Rust

โŒ› When to use xvc?

โœณ๏ธ What is xvc for?

๐Ÿ”ฝ Installation

You can get the binary files for Linux, macOS and Windows from releases page. Extract and copy the file to your $PATH.

Alternatively, if you have Rust [installed], you can build xvc:

shell $ cargo install xvc

๐Ÿƒ๐Ÿพ Quick Start

Xvc tracks your files and directories on top of Git. To start run the following command in the repository.

shell $ xvc init

It initializes the metafiles in .xvc/ directory and adds .xvcignore file in case you want to hide certain elements from Xvc.

Add your data files and directories for tracking.

shell $ xvc file track my-data/ $ git add .xvc $ git commit -m "Began to track my-data/ with Xvc" $ git push

The command calculates data content hashes (with BLAKE-3, by default) and records them. It also copies files to content addressed directories under .xvc/b3

Define a file storage to share the files you added.

shell $ xvc storage new s3 --name my-remote --region us-east-1 --bucket-name my-xvc-remote

You can push the files you added to this remote.

shell $ xvc file push --to my-remote

You can now delete the files.

shell $ rm -r my-data/

When you want to access this data later, you can clone the repository and get back the files from file storage.

shell $ xvc file pull my-data/

If you have commands that depend on data or code elements, Xvc allows to define steps to its default pipeline.

shell $ xvc pipeline step new --name my-data-update --command 'python3 preprocess.py' $ xvc pipeline step dependency --step my-data-update --files my-data/ \ --files preprocess.py \ --regex 'names.txt:/^Name:' \ --lines a-long-file.csv::-1000 $ xvc pipeline step output --step-name my-data-update --output-file preprocessed-data.npz

The above commands define a new step in the default pipeline that depends on files in my-data/ directory, and preprocess.py; lines that start with Name: in names.txt; and the first 1000 lines in a-long-file.csv. When any of these change, or the output is missing, the step command (python3 preprocess.py) will run.

shell $ xvc pipeline run

If none of the dependencies change, and the output is available the above command will do nothing.

You can define fairly complex dependencies with globs, files, directories, regular expression searches in files, lines in files, other steps and pipelines with xvc pipeline step dependency commands. More dependency types like database queries, content from URLs, S3 (and compatible) buckets, REST and GraphQL results are in my mental backlog.

Please check xvc.netlify.app for documentation.

๐ŸคŸ Big Thanks

xvc stands on the following (giant) crates:

And, biggest thanks to Rust designers, developers and contributors. Although I can't see myself expert to appreciate it all, it's a fabulous language and environment to work with.

๐Ÿš Support

๐Ÿ‘ Contributing

โš ๏ธ Disclaimer

This software is fresh and ambitious. Although I use it and test it close to real world conditions, it didn't go under test of time. Xvc can eat your files and spit them to eternal void!