This project houses the experimental client for Spark Connect for Apache Spark, written in Rust.
Currently, the Spark Connect client for Rust is highly experimental and should not be used in any production setting. It is a proof of concept for identifying ways to interact with a Spark cluster from Rust.
The `spark-connect-rs` crate aims to provide an entrypoint to Spark Connect and to offer DataFrame API interactions similar to those in the other Spark clients.
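Spark Connect clients address a server with a connection string of the form `sc://host:port/;key=value;...`, with 15002 as the default port. As a rough illustration of what an entrypoint has to handle, the sketch below parses such a string using only the standard library. `ConnectTarget` and `parse_connection_string` are hypothetical names made up for this example and are not part of `spark-connect-rs`; IPv6 hosts and percent-encoding are deliberately ignored.

```rust
use std::collections::HashMap;

/// Parsed form of a connection string (illustrative type, not from the crate).
#[derive(Debug, PartialEq)]
struct ConnectTarget {
    host: String,
    port: u16,
    params: HashMap<String, String>,
}

/// Parse `sc://host[:port][/;key=value;...]`, defaulting the port to 15002.
fn parse_connection_string(input: &str) -> Result<ConnectTarget, String> {
    let rest = input
        .strip_prefix("sc://")
        .ok_or_else(|| format!("expected an sc:// scheme in {:?}", input))?;

    // Split the authority ("host:port") from the optional "/;k=v;..." tail.
    let (authority, tail) = match rest.split_once('/') {
        Some((authority, tail)) => (authority, tail),
        None => (rest, ""),
    };

    // The port is optional; fall back to the Spark Connect default.
    let (host, port) = match authority.rsplit_once(':') {
        Some((host, port)) => {
            let port = port
                .parse::<u16>()
                .map_err(|e| format!("invalid port {:?}: {}", port, e))?;
            (host, port)
        }
        None => (authority, 15002),
    };
    if host.is_empty() {
        return Err("missing host".to_string());
    }

    // Parameters are separated by ';' and written as key=value.
    let mut params = HashMap::new();
    for pair in tail.split(';').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, found {:?}", pair))?;
        params.insert(key.to_string(), value.to_string());
    }

    Ok(ConnectTarget { host: host.to_string(), port, params })
}

fn main() {
    // The address and user_id here are example values only.
    let target = parse_connection_string("sc://127.0.0.1:15002/;user_id=example_rs").unwrap();
    println!("{:?}", target);
}
```

Returning a `Result` rather than panicking keeps malformed input a recoverable error, which is usually what a session builder wants.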
```bash
docker compose up --build -d
```
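```rust
use spark_connect_rs;

use spark_connect_rs::{SparkSession, SparkSessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to the Spark Connect server. This address is an assumption
    // (a local server on the default port 15002); adjust it to your setup.
    let spark: SparkSession =
        SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs".to_string())
            .build()
            .await?;

    // Build a DataFrame from a SQL query over the bundled example data.
    let mut df = spark.sql("SELECT * FROM json.`/opt/spark/examples/src/main/resources/employees.json`");

    // Keep the higher salaries and print up to five rows.
    df.filter("salary > 3000").show(Some(5), None, None).await?;

    Ok(())
}
```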
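In the example above, `sql` and `filter` are called without `.await`, while `show` is awaited. This matches the usual Spark model in which transformations only describe a plan and an action triggers execution on the server. The following is a toy sketch of that idea, not the crate's actual internals: `Plan`, `ToyFrame`, and `explain` are hypothetical names invented for illustration, and the "action" only renders the plan as text instead of sending it anywhere.

```rust
/// Toy logical plan: each DataFrame-style call wraps the previous plan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Sql(String),
    Filter { input: Box<Plan>, condition: String },
    Limit { input: Box<Plan>, rows: usize },
}

/// Hypothetical stand-in for a DataFrame; it only records the plan.
#[derive(Debug, Clone)]
struct ToyFrame {
    plan: Plan,
}

impl ToyFrame {
    fn sql(query: &str) -> Self {
        ToyFrame { plan: Plan::Sql(query.to_string()) }
    }

    /// Transformations return a new frame; no work happens yet.
    fn filter(self, condition: &str) -> Self {
        ToyFrame {
            plan: Plan::Filter {
                input: Box::new(self.plan),
                condition: condition.to_string(),
            },
        }
    }

    fn limit(self, rows: usize) -> Self {
        ToyFrame {
            plan: Plan::Limit { input: Box::new(self.plan), rows },
        }
    }

    /// A real action would serialize the plan and send it to the server;
    /// this one only renders it, outermost step first.
    fn explain(&self) -> String {
        render(&self.plan)
    }
}

fn render(plan: &Plan) -> String {
    match plan {
        Plan::Sql(query) => format!("Sql({})", query),
        Plan::Filter { input, condition } => {
            format!("Filter({}) <- {}", condition, render(input))
        }
        Plan::Limit { input, rows } => format!("Limit({}) <- {}", rows, render(input)),
    }
}

fn main() {
    let df = ToyFrame::sql("SELECT * FROM employees")
        .filter("salary > 3000")
        .limit(5);
    println!("{}", df.explain());
}
```

Because each step only nests the previous plan, chaining calls is cheap, and the whole query reaches the server in one request when an action runs.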
```bash
git clone https://github.com/sjrusso8/spark-connect-rs.git
cd spark-connect-rs
git submodule update --init --recursive

docker compose up --build -d

cargo build && cargo test
```
The following tables outline some of the functions that are implemented and working with the Spark Connect session:

| SparkSession     | API     | Comment                                                                      |
|------------------|---------|------------------------------------------------------------------------------|
| range            | ![done] |                                                                              |
| sql              | ![done] | Does not include the new Spark Connect 3.5 feature with "position arguments" |
| read             | ![done] |                                                                              |
| createDataFrame  | ![open] |                                                                              |
| getActiveSession | ![open] |                                                                              |
| many more!!      |         |                                                                              |

| DataFrame       | API     | Comment                                                                      |
|-----------------|---------|------------------------------------------------------------------------------|
| select          | ![done] |                                                                              |
| selectExpr      | ![done] | Does not include the new Spark Connect 3.5 feature with "position arguments" |
| filter          | ![done] |                                                                              |
| createTempView  | ![done] | There is an error right now, and the functions are private till it's fixed   |
| show            | ![done] |                                                                              |
| tail            | ![open] |                                                                              |
| withColumns     | ![open] |                                                                              |
| drop            | ![open] |                                                                              |
| sort            | ![open] |                                                                              |
| groupBy         | ![open] |                                                                              |
| many more!      | ![open] |                                                                              |