Ballista is a proof-of-concept distributed compute platform based on Kubernetes and the Rust implementation of Apache Arrow.
This is not my first attempt at building something like this. I originally wanted DataFusion to be a distributed compute platform, but that was overly ambitious at the time, and it ended up becoming an in-memory query execution engine for the Rust implementation of Apache Arrow. However, DataFusion now provides a good foundation for another attempt at building a modern distributed compute platform in Rust.
My goal is to use this repo to move fast and try out ideas that can eventually be contributed back to Apache Arrow, and to help drive requirements for Apache Arrow and DataFusion.
This demo shows a Ballista cluster being created in Minikube, followed by the nyctaxi example being executed. The example runs a distributed query in the cluster, with each executor pod performing a projection on one partition of the data.
Here are the commands being run, with some explanation:
```bash
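# Create a Ballista cluster named "nyctaxi" with 12 executors from the executor template
cargo run --bin ballista -- create-cluster --name nyctaxi --num-executors 12 --template examples/nyctaxi/templates/executor.yaml
# List the pods to check on the executors
kubectl get pods
# Run the nyctaxi example application against the cluster
cargo run --bin ballista -- run --name nyctaxi --template examples/nyctaxi/templates/application.yaml
# List the pods again to find the name of the application pod
kubectl get pods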
# Follow the application logs (substitute the pod name shown by the previous command)
kubectl logs -f ballista-nyctaxi-app-n5kxl
```
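
The application pod name in the last command has to be copied by hand from the `kubectl get pods` output. As a rough sketch, that lookup could be scripted. Note that `find_app_pod` is a hypothetical helper and not part of Ballista, and that the `ballista-nyctaxi-app-` prefix is an assumption based only on the pod name shown above:

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of Ballista): print the first pod name that
# starts with the given prefix. Expects `kubectl get pods -o name` output on
# stdin, one entry per line in the form "pod/<name>".
find_app_pod() {
  local prefix="$1"
  sed 's|^pod/||' | grep -- "^${prefix}" | head -n 1
}

# Example usage, assuming the application pod name starts with
# "ballista-nyctaxi-app-" as in the demo above:
#   pod="$(kubectl get pods -o name | find_app_pod ballista-nyctaxi-app-)"
#   kubectl logs -f "$pod"
```

If more than one application pod matches the prefix, this sketch simply takes the first one listed, so it is only suitable for a fresh demo cluster.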