A Fast and Robust MLOps Swiss-Army Knife in Rust
A make alternative that considers content changes rather than timestamps.
You can get the binary files for Linux, macOS and Windows from the releases page. Extract the archive and copy the binary to a directory in your $PATH.
Alternatively, if you have Rust installed, you can build xvc:
shell
$ cargo install xvc
Xvc tracks your files and directories on top of Git. To start, run the following command in the repository.
shell
$ xvc init
It initializes the metafiles in the .xvc/ directory and adds a .xvcignore file in case you want to hide certain elements from Xvc.
Add your data files and directories for tracking.
shell
$ xvc file track my-data/
$ git add .xvc
$ git commit -m "Began to track my-data/ with Xvc"
$ git push
The xvc file track command calculates content hashes of the data (with BLAKE3, by default) and records them.
It also copies the files to content-addressed directories under .xvc/b3.
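To make the idea of content-addressed storage concrete, here is a minimal sketch. It is not Xvc's code: the store_file function, the two-character prefix layout and the temporary directory are hypothetical, and it uses sha256sum from GNU coreutils in place of BLAKE3 because that tool is more commonly installed.

```shell
#!/bin/sh
# Hypothetical sketch of content-addressed storage; not Xvc's actual code or layout.
# Assumes sha256sum (GNU coreutils) is available; Xvc itself defaults to BLAKE3.
set -eu

# store_file FILE STORE
# Copies FILE into STORE under a path derived from its digest and prints the digest.
store_file() {
    file=$1
    store=$2
    digest=$(sha256sum "$file" | cut -d' ' -f1)
    # Split the digest into a short prefix directory and the remainder.
    prefix=$(printf '%s' "$digest" | cut -c1-2)
    rest=$(printf '%s' "$digest" | cut -c3-)
    mkdir -p "$store/$prefix"
    cp "$file" "$store/$prefix/$rest"
    printf '%s\n' "$digest"
}

workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/a.txt"
printf 'hello\n' > "$workdir/b.txt"   # same content, different name

digest_a=$(store_file "$workdir/a.txt" "$workdir/store")
digest_b=$(store_file "$workdir/b.txt" "$workdir/store")

# Identical content maps to the same address, so only one copy is kept.
stored_count=$(find "$workdir/store" -type f | wc -l | tr -d ' ')
echo "digest a: $digest_a"
echo "digest b: $digest_b"
echo "stored copies: $stored_count"
```

Because the address depends only on the bytes, renaming or re-adding an unchanged file does not create a second copy in the store.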
Define a file storage to share the files you added.
shell
$ xvc storage new s3 --name my-remote --region us-east-1 --bucket-name my-xvc-remote
You can push the files you added to this remote.
shell
$ xvc file push --to my-remote
You can now delete the files.
shell
$ rm -r my-data/
When you want to access this data later, you can clone the repository and get back the files from file storage.
shell
$ xvc file pull my-data/
If you have commands that depend on data or code elements, Xvc lets you define steps in its default pipeline.
shell
$ xvc pipeline step new --name my-data-update --command 'python3 preprocess.py'
$ xvc pipeline step dependency --step my-data-update --files my-data/ \
--files preprocess.py \
--regex 'names.txt:/^Name:' \
--lines a-long-file.csv::-1000
$ xvc pipeline step output --step-name my-data-update --output-file preprocessed-data.npz
The above commands define a new step in the default pipeline that depends on the files in the my-data/ directory and preprocess.py; the lines that start with Name: in names.txt; and the first 1000 lines in a-long-file.csv. When any of these change, or the output is missing, the step command (python3 preprocess.py) will run.
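The regex and line dependencies can be pictured as digests computed over only part of a file. The sketch below is an illustration under assumptions, not Xvc's implementation: regex_digest and head_digest are hypothetical helpers, the sample files are made up, and sha256sum stands in for whatever hash Xvc uses.

```shell
#!/bin/sh
# Hypothetical sketch, not Xvc's implementation: digest only the parts of a
# file that a step depends on. Assumes sha256sum (GNU coreutils) is available.
set -eu

# regex_digest PATTERN FILE: digest of the lines in FILE that match PATTERN.
regex_digest() {
    # grep exits with 1 when nothing matches; treat that as empty input.
    { grep -e "$1" "$2" || true; } | sha256sum | cut -d' ' -f1
}

# head_digest COUNT FILE: digest of the first COUNT lines of FILE.
head_digest() {
    head -n "$1" "$2" | sha256sum | cut -d' ' -f1
}

workdir=$(mktemp -d)

printf 'Name: Ada\nAge: 36\nName: Grace\n' > "$workdir/names.txt"
before=$(regex_digest '^Name:' "$workdir/names.txt")

# Editing a line that does not match leaves the dependency digest unchanged.
printf 'Name: Ada\nAge: 37\nName: Grace\n' > "$workdir/names.txt"
after_unrelated=$(regex_digest '^Name:' "$workdir/names.txt")

# Editing a matching line changes it.
printf 'Name: Ada\nAge: 37\nName: Edsger\n' > "$workdir/names.txt"
after_related=$(regex_digest '^Name:' "$workdir/names.txt")

# Appending beyond the tracked line range does not affect the digest either.
printf 'a\nb\nc\n' > "$workdir/long.csv"
head_before=$(head_digest 2 "$workdir/long.csv")
printf 'd\n' >> "$workdir/long.csv"
head_after=$(head_digest 2 "$workdir/long.csv")
```

Narrowing a dependency this way means a step is not invalidated by edits to parts of a file it never reads.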
shell
$ xvc pipeline run
If none of the dependencies have changed and the output is available, the above command will do nothing.
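The difference from a timestamp-based tool can be sketched with a small shell function. This is an illustration only: run_if_changed, the state file and the sample file names are hypothetical, a single file stands in for the whole dependency set, and sha256sum is assumed to be available.

```shell
#!/bin/sh
# Hypothetical sketch of content-based rerun logic; not Xvc's implementation.
# Assumes sha256sum (GNU coreutils) is available.
set -eu

# run_if_changed DEP OUTPUT STATE COMMAND...
# Runs COMMAND when the digest of DEP differs from the one saved in STATE,
# or when OUTPUT is missing. Prints "ran" or "up to date".
run_if_changed() {
    dep=$1
    output=$2
    state=$3
    shift 3
    current=$(sha256sum "$dep" | cut -d' ' -f1)
    previous=$(cat "$state" 2>/dev/null || true)
    if [ "$current" = "$previous" ] && [ -e "$output" ]; then
        echo "up to date"
        return 0
    fi
    "$@"
    printf '%s\n' "$current" > "$state"
    echo "ran"
}

workdir=$(mktemp -d)
in_file="$workdir/input.txt"
out_file="$workdir/output.txt"
state_file="$workdir/input.digest"

printf 'v1\n' > "$in_file"
first=$(run_if_changed "$in_file" "$out_file" "$state_file" cp "$in_file" "$out_file")

touch "$in_file"                 # newer timestamp, identical content
second=$(run_if_changed "$in_file" "$out_file" "$state_file" cp "$in_file" "$out_file")

printf 'v2\n' > "$in_file"       # content actually changes
third=$(run_if_changed "$in_file" "$out_file" "$state_file" cp "$in_file" "$out_file")

rm "$out_file"                   # output goes missing
fourth=$(run_if_changed "$in_file" "$out_file" "$state_file" cp "$in_file" "$out_file")

echo "$first / $second / $third / $fourth"
```

The second call is the interesting one: touch updates the modification time, which would trigger a rebuild in make, but the digest is unchanged, so nothing runs.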
Using xvc pipeline step dependency commands, you can define fairly complex dependencies on globs, files, directories, regular expression searches in files, lines in files, other steps and pipelines. More dependency types like database queries, content from URLs, S3 (and compatible) buckets, REST and GraphQL results are in my mental backlog.
Please check xvc.netlify.app for documentation.
xvc stands on the following (giant) crates:
xvc-ecs for serializing components in an ECS with a single line of code.
xvc storage new s3, and all these work with minimum bugs thanks to clap.
xvc pipeline run. A DAG composed of State Machines made it possible to run pipeline steps in parallel with a clean separation of process states.
And, biggest thanks to Rust designers, developers and contributors. Although I can't see myself as expert enough to appreciate it all, it's a fabulous language and environment to work with.
workflow_tests crate.
This software is fresh and ambitious. Although I use it and test it in close to real-world conditions, it has not yet stood the test of time. Xvc can eat your files and spit them into the eternal void!