DataFusion Command-line Interface

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

The DataFusion CLI allows SQL queries to be executed by an in-process DataFusion context.

```ignore
USAGE:
    datafusion-cli [OPTIONS]

OPTIONS:
    -c, --batch-size       The batch size of each query, or use DataFusion default
    -f, --file ...         Execute commands from file(s), then exit
        --format           [default: table] [possible values: csv, tsv, table, json, nd-json]
    -h, --help             Print help information
    -p, --data-path        Path to your data, default to current directory
    -q, --quiet            Reduce printing other than the results and work quietly
    -r, --rc ...           Run the provided files on startup instead of ~/.datafusionrc
    -V, --version          Print version information
```

Example

Create a CSV file to query.

```bash,ignore
$ echo "1,2" > data.csv
```

```sql,ignore
$ datafusion-cli
DataFusion CLI v12.0.0
CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.001 seconds.

SELECT * FROM foo;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row in set. Query took 0.017 seconds.
```
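The interactive session above can also be run non-interactively. The sketch below assumes a datafusion-cli binary is on your PATH; it writes the same two statements to a file (the name query.sql is only illustrative) and runs them with the -f, --format and -q options from the usage text. The CLI call is guarded so the script still completes on a machine without the binary.

```bash
#!/usr/bin/env bash
# Sketch: script the example session instead of typing it into the REPL.
# Assumption: datafusion-cli is on PATH; data.csv and query.sql are example names.

# Same sample data as the interactive example: one row, two integer columns.
echo "1,2" > data.csv

# Each statement ends with ';', just as in the interactive session.
cat > query.sql <<'EOF'
CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
SELECT * FROM foo;
EOF

# -f runs the file and exits, --format csv selects CSV output, -q reduces extra printing.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli -q --format csv -f query.sql
else
  echo "datafusion-cli not found on PATH; skipping query" >&2
fi
```

Replacing csv with another listed format value (tsv, table, json, nd-json) should change only how the results are printed.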

Querying S3 Data Sources

The CLI can query data in S3 if the following environment variables are defined: AWS_REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY.

Note that the region must be set to the region where the bucket exists until the following issue is resolved:

Example:

```bash
$ aws s3 cp test.csv s3://my-bucket/
upload: ./test.csv to s3://my-bucket/test.csv

$ export AWS_REGION=us-east-1
$ export AWS_SECRET_ACCESS_KEY=********
$ export AWS_ACCESS_KEY_ID=***

$ ./target/release/datafusion-cli
DataFusion CLI v12.0.0
❯ create external table test stored as csv location 's3://my-bucket/test.csv';
0 rows in set. Query took 0.374 seconds.
❯ select * from test;
+----------+----------+
| column1  | column2  |
+----------+----------+
| 1        | 2        |
+----------+----------+
1 row in set. Query took 0.171 seconds.
```
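Because the CLI picks up S3 settings from the environment, a quick pre-flight check can turn a confusing failure at query time into a clear message. This is a minimal bash sketch; check_s3_env is a hypothetical helper, not part of datafusion-cli. It only verifies that AWS_REGION, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are non-empty; it does not validate the credentials or confirm that the region matches the bucket's region.

```bash
#!/usr/bin/env bash
# Sketch: hypothetical helper (not part of datafusion-cli) that reports any
# unset or empty AWS variable used in the S3 example, and fails if one is missing.
check_s3_env() {
  local missing=0
  local var
  for var in AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # ${!var:-} is bash indirect expansion: the value of the variable whose
    # name is stored in $var, or an empty string if it is unset.
    if [ -z "${!var:-}" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_s3_env && ./target/release/datafusion-cli starts the CLI only when all three variables are set.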

DataFusion-Cli

To build datafusion-cli, change into its sub-directory and run cargo build:

```bash
cd datafusion-cli
cargo build
```