DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
The DataFusion CLI allows SQL queries to be executed by an in-process DataFusion context, or by a distributed Ballista context.
```ignore
USAGE:
    datafusion-cli [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -q, --quiet      Reduce printing other than the results and work quietly
    -V, --version    Prints version information

OPTIONS:
    -c, --batch-size
```

Create a CSV file to query.

```bash,ignore
$ echo "1,2" > data.csv
```

```sql,ignore
$ datafusion-cli

DataFusion CLI v7.0.0

CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.001 seconds.

SELECT * FROM foo;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row in set. Query took 0.017 seconds.
```
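
The same steps can also be scripted instead of typed interactively. The following is a minimal sketch, not an official workflow: it assumes the `-f`/`--file` option (not visible in the truncated options listing above) is available in your build, and the file name `query.sql` and the scratch directory are illustrative choices.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Two-column CSV without a header row, matching the schema (a INT, b INT).
printf '1,2\n3,4\n5,6\n' > data.csv

# SQL script: register the CSV as an external table, then aggregate it.
# "query.sql" is a hypothetical file name chosen for this example.
cat > query.sql <<'EOF'
CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
SELECT SUM(a) AS total_a, SUM(b) AS total_b FROM foo;
EOF

# Run the script only if the binary is on PATH.
# Assumption: -f/--file executes the commands in the file and then exits.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli -f query.sql
else
  echo "datafusion-cli not found on PATH; build it first" >&2
fi
```

Keeping the statements in a file makes the example repeatable; the relative `LOCATION 'data.csv'` resolves against the directory the CLI is started from, which is why the sketch changes into the scratch directory first.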
Build `datafusion-cli` without the Ballista feature.

```bash
cd arrow-datafusion/datafusion-cli
cargo build
```
To execute SQL on Ballista through `datafusion-cli`, first build `datafusion-cli` with the `ballista` feature enabled.

```bash
cd arrow-datafusion/datafusion-cli
cargo build --features ballista
```
The DataFusion CLI can connect to a Ballista scheduler for query execution.

```bash
datafusion-cli --host localhost --port 50050
```
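
When the scheduler address differs between environments, the host and port can be taken from variables rather than hard-coded. This is a small sketch under assumptions: `SCHEDULER_HOST` and `SCHEDULER_PORT` are hypothetical variable names invented for this example, and the defaults simply mirror the command above.

```shell
# Hypothetical environment variables; defaults mirror the example above.
SCHEDULER_HOST="${SCHEDULER_HOST:-localhost}"
SCHEDULER_PORT="${SCHEDULER_PORT:-50050}"

# Show the command that will be run, which helps when debugging connection issues.
echo "datafusion-cli --host $SCHEDULER_HOST --port $SCHEDULER_PORT"

# Connect only if the binary is on PATH. It must have been built with the
# "ballista" feature for --host/--port to be usable.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli --host "$SCHEDULER_HOST" --port "$SCHEDULER_PORT"
else
  echo "datafusion-cli not found on PATH; build it with --features ballista first" >&2
fi
```

For example, setting `SCHEDULER_HOST` to another machine's name before running the snippet points the CLI at that scheduler without editing the script.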