A Rust library to query Apache Pinot.
To install Pinot locally, follow the Pinot Quickstart guide to install and start the Pinot batch quickstart:
bin/quick-start-batch.sh
Alternatively, the Docker-contained Pinot database orchestrated by this repository's docker-compose.yaml
file may be used.
bash
make prepare-pinot
Check out the client library GitHub repo:
bash
git clone git@github.com:yougov/pinot-client-rust.git
cd pinot-client-rust
Start up the Docker-contained Pinot database:
bash
make prepare-pinot
Build and run an example application to query from Pinot
bash
cargo run --example pql-query
cargo run --example sql-query-deserialize-to-data-row
cargo run --example sql-query-deserialize-to-struct
A Pinot client can be initialized either from a ZooKeeper path or from a list of broker addresses:
rust
let client = pinot_client_rust::connection::client_from_zookeeper(
    &pinot_client_rust::zookeeper::ZookeeperConfig::new(
        vec!["localhost:2181".to_string()],
        "/PinotCluster".to_string(),
    ),
    None
);
rust
let client = pinot_client_rust::connection::client_from_broker_list(
    vec!["localhost:8099".to_string()], None);
An asynchronous connection can be established with `pinot_client_rust::async_connection::AsyncConnection`,
which offers equivalents of the synchronous instantiation methods described above.
Please see this example for your reference.
Code snippet:
rust
fn main() {
    // Connect to the broker at localhost:8099.
    let client = pinot_client_rust::connection::client_from_broker_list(
        vec!["localhost:8099".to_string()], None).unwrap();
    // Run an SQL query against the baseballStats table, deserializing each row into a DataRow.
    let broker_response = client.execute_sql::<pinot_client_rust::response::data::DataRow>(
        "baseballStats",
        "select count(*) as cnt, sum(homeRuns) as sum_homeRuns from baseballStats group by teamID limit 10"
    ).unwrap();
    // Stats are optional, so log them only when present.
    if let Some(stats) = broker_response.stats {
        log::info!(
            "Query Stats: response time - {} ms, scanned docs - {}, total docs - {}",
            stats.time_used_ms,
            stats.num_docs_scanned,
            stats.total_docs,
        );
    }
}
Query responses are defined by one of two broker response structures.
SQL queries return `SqlResponse`, whose generic parameter accepts any struct implementing the
`FromRow` trait, whereas PQL queries return `PqlResponse`.
`SqlResponse` contains a `Table`, the holder for SQL query data, whereas `PqlResponse` contains
`AggregationResults` and `SelectionResults`, the holders for PQL query data.
Exceptions for a given request for both `SqlResponse` and `PqlResponse` are stored in the `Exception` array.
Stats for a given request for both `SqlResponse` and `PqlResponse` are stored in `ResponseStats`.
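The idea of a response that is generic over its row type can be illustrated with a minimal, std-only sketch. Everything in it (`RowLike`, `Response`, `parse_response`, `TeamStats`, and the use of plain string cells) is a hypothetical stand-in chosen for illustration, not the crate's API; the real `FromRow` trait works on JSON values and a `Schema`.

```rust
// Std-only sketch: every name below is a hypothetical stand-in, not the crate's API.

/// Stand-in for a row-conversion trait in the spirit of `FromRow`.
trait RowLike: Sized {
    fn from_cells(columns: &[&str], cells: &[&str]) -> Result<Self, String>;
}

/// Stand-in for a response that is generic over its row type.
#[derive(Debug, PartialEq)]
struct Response<T> {
    rows: Vec<T>,
}

/// Converts every raw row into `T`, stopping at the first row that fails.
fn parse_response<T: RowLike>(
    columns: &[&str],
    raw_rows: &[Vec<&str>],
) -> Result<Response<T>, String> {
    let rows = raw_rows
        .iter()
        .map(|cells| T::from_cells(columns, cells))
        .collect::<Result<Vec<T>, String>>()?;
    Ok(Response { rows })
}

/// Looks a cell up by column name rather than by position.
fn cell<'a>(columns: &[&str], cells: &[&'a str], name: &str) -> Result<&'a str, String> {
    let index = columns
        .iter()
        .position(|c| *c == name)
        .ok_or_else(|| format!("missing column {name}"))?;
    cells
        .get(index)
        .copied()
        .ok_or_else(|| format!("row too short for column {name}"))
}

/// Example row type shaped after the aggregation query in the snippet above.
#[derive(Debug, PartialEq)]
struct TeamStats {
    cnt: i64,
    sum_home_runs: f64,
}

impl RowLike for TeamStats {
    fn from_cells(columns: &[&str], cells: &[&str]) -> Result<Self, String> {
        Ok(TeamStats {
            cnt: cell(columns, cells, "cnt")?
                .parse::<i64>()
                .map_err(|e| format!("cnt: {e}"))?,
            sum_home_runs: cell(columns, cells, "sum_homeRuns")?
                .parse::<f64>()
                .map_err(|e| format!("sum_homeRuns: {e}"))?,
        })
    }
}
```

Because the row type is a type parameter, the same `parse_response` call works for any struct that implements the conversion trait, which mirrors how `execute_sql::<T>` selects the row representation.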
`Exception` is defined as:

```rust
/// Pinot exception.
pub struct PinotException {
    #[serde(rename(deserialize = "errorCode"))]
    pub error_code: i32,
    pub message: String,
}
```
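One way to surface broker exceptions to a caller is to fold them into a single error value. The sketch below assumes a local stand-in struct that mirrors only the two fields shown above (without the serde attributes); `check_exceptions` and the `[code] message` output format are hypothetical and not part of the crate.

```rust
use std::fmt;

/// Local stand-in mirroring the two fields of `PinotException` shown above.
#[derive(Clone, Debug, PartialEq)]
struct PinotException {
    error_code: i32,
    message: String,
}

impl fmt::Display for PinotException {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.error_code, self.message)
    }
}

/// Hypothetical helper: returns `Ok(())` when no exceptions were reported,
/// otherwise all exceptions rendered and joined with "; ".
fn check_exceptions(exceptions: &[PinotException]) -> Result<(), String> {
    if exceptions.is_empty() {
        return Ok(());
    }
    let summary = exceptions
        .iter()
        .map(|e| e.to_string())
        .collect::<Vec<_>>()
        .join("; ");
    Err(summary)
}
```

Calling such a helper before reading any result data keeps the "did the query fail?" check in one place.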
`ResponseStats` is defined as:

```rust
/// ResponseStats carries all stats returned by a query.
pub struct ResponseStats {
    pub trace_info: HashMap<...>,
    // ...
}
```

`PqlResponse` is defined as:

```rust
/// PqlResponse is the data structure for broker response to a PQL query.
pub struct PqlResponse {
    pub aggregation_results: Vec<...>,
    // ...
}
```

`SqlResponse` is defined as:

```rust
/// SqlResponse is the data structure for a broker response to an SQL query.
pub struct SqlResponse<T: FromRow> {
    pub table: Option<Table<T>>,
    pub stats: Option<ResponseStats>,
}
```

`Table` is defined as:

```rust
/// Table is the holder for SQL queries.
#[derive(Clone, Debug, PartialEq)]
pub struct Table<T: FromRow> {
    // ...
}
```

`Schema` is defined as:

```rust
/// Schema is response schema with a bimap to allow easy name <-> index retrieval
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Schema {
    column_data_types: Vec<DataType>,
    // ...
}
```

There are multiple functions defined for `Schema`, like:

```rust
fn get_column_count(&self) -> usize;
fn get_column_name(&self, column_index: usize) -> Result<&str>;
fn get_column_index(&self, column_name: &str) -> Result<usize>;
fn get_column_data_type(&self, column_index: usize) -> Result<DataType>;
fn get_column_data_type_by_name(&self, column_name: &str) -> Result<DataType>;
```

`DataType` is defined as:

```rust
/// Pinot native types
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum DataType {
    Int,
    Long,
    Float,
    Double,
    Boolean,
    Timestamp,
    String,
    Json,
    Bytes,
    IntArray,
    LongArray,
    FloatArray,
    DoubleArray,
    BooleanArray,
    TimestampArray,
    StringArray,
    BytesArray,
}
```

`FromRow` is defined as:

```rust
/// FromRow represents any structure which can deserialize
/// the Table.rows json field provided a `Schema`
pub trait FromRow: Sized {
    fn from_row(
        data_schema: &Schema,
        row: Vec<Value>,
    ) -> std::result::Result<Self, serde_json::Error>;
}
```

In addition to being implemented by `DataRow`, `FromRow` is also implemented by all implementors
of `serde::de::Deserialize`. This is achieved by first deserializing the response to json and then,
before each row is deserialized into its final form, substituting a json map of column name to value.

Additionally, there are a number of serde deserializer functions provided to deserialize complex Pinot types:

```rust
/// Converts Pinot timestamps into `Vec<DateTime<Utc>>` using `deserialize_timestamps_from_json()`.
fn deserialize_timestamps<'de, D>(deserializer: D) -> std::result::Result<Vec<DateTime<Utc>>, D::Error>
    where D: Deserializer<'de>

/// Converts Pinot timestamps into `DateTime<Utc>` using `deserialize_timestamp_from_json()`.
pub fn deserialize_timestamp<'de, D>(deserializer: D) -> std::result::Result<DateTime<Utc>, D::Error>
    where D: Deserializer<'de>

/// Converts Pinot hex strings into `Vec<Vec<u8>>` using `deserialize_bytes_array_from_json()`.
pub fn deserialize_bytes_array<'de, D>(deserializer: D) -> std::result::Result<Vec<Vec<u8>>, D::Error>
    where D: Deserializer<'de>

/// Converts Pinot hex string into `Vec<u8>` using `deserialize_bytes_from_json()`.
pub fn deserialize_bytes<'de, D>(deserializer: D) -> std::result::Result<Vec<u8>, D::Error>
    where D: Deserializer<'de>

/// Deserializes json potentially packaged into a string by calling `deserialize_json_from_json()`.
pub fn deserialize_json<'de, D>(deserializer: D) -> std::result::Result<Value, D::Error>
    where D: Deserializer<'de>
```

For example usage, please refer to this example.

`DataRow` is defined as:

```rust
/// A row of `Data`
#[derive(Clone, Debug, PartialEq)]
pub struct DataRow {
    row: Vec<Data>,
}
```

`Data` is defined as:

```rust
/// Typed Pinot data
#[derive(Clone, Debug, PartialEq)]
pub enum Data {
    Int(i32),
    Long(i64),
    Float(f32),
    Double(f64),
    Boolean(bool),
    Timestamp(DateTime<Utc>),
    // ...
}
```

There are multiple functions defined for `Data`, like:

```rust
fn data_type(&self) -> DataType;
fn get_int(&self) -> Result<i32>;
fn get_long(&self) -> Result<i64>;
fn get_float(&self) -> Result<f32>;
fn get_double(&self) -> Result<f64>;
fn get_boolean(&self) -> Result<bool>;
fn get_timestamp(&self) -> Result<DateTime<Utc>>;
fn get_string(&self) -> Result<&str>;
fn get_json(&self) -> Result<&Value>;
fn get_bytes(&self) -> Result<&Vec<u8>>;
fn is_null(&self) -> bool;
```

In addition to row count, `DataRow` also contains convenience counterparts to those above given a column index.
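
The relationship between the typed getters on `Data` and the by-index counterparts on `DataRow` can be sketched with reduced, std-only stand-ins. The three-variant enum, the `String` error type, the strict variant matching, and the `get_data` helper are assumptions made for this illustration; the crate uses its own `Result` alias and may behave differently.

```rust
/// Reduced stand-in for `Data`, limited to three variants for brevity.
#[derive(Clone, Debug, PartialEq)]
enum Data {
    Long(i64),
    String(String),
    Null,
}

/// Reduced stand-in for `DataRow`.
#[derive(Clone, Debug, PartialEq)]
struct DataRow {
    row: Vec<Data>,
}

impl Data {
    /// Returns the value only when the variant is `Long` (assumed strict matching).
    fn get_long(&self) -> Result<i64, String> {
        match self {
            Data::Long(v) => Ok(*v),
            other => Err(format!("expected Long, found {other:?}")),
        }
    }

    /// Returns the value only when the variant is `String`.
    fn get_string(&self) -> Result<&str, String> {
        match self {
            Data::String(v) => Ok(v.as_str()),
            other => Err(format!("expected String, found {other:?}")),
        }
    }

    fn is_null(&self) -> bool {
        matches!(self, Data::Null)
    }
}

impl DataRow {
    /// Hypothetical helper: bounds-checked access to one cell.
    fn get_data(&self, column_index: usize) -> Result<&Data, String> {
        self.row
            .get(column_index)
            .ok_or_else(|| format!("column index {column_index} out of bounds"))
    }

    /// By-index counterparts: look the cell up, then delegate to `Data`.
    fn get_long(&self, column_index: usize) -> Result<i64, String> {
        self.get_data(column_index)?.get_long()
    }

    fn get_string(&self, column_index: usize) -> Result<&str, String> {
        self.get_data(column_index)?.get_string()
    }

    fn is_null(&self, column_index: usize) -> Result<bool, String> {
        Ok(self.get_data(column_index)?.is_null())
    }
}
```

In this sketch a by-index accessor can fail in two ways, an out-of-range index or a mismatched variant, and both surface through the same `Result`.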