DataFusion

DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.

DataFusion offers SQL and DataFrame APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.

Features

Use Cases

DataFusion can be used without modification as an embedded SQL engine or can be customized and used as a foundation for building new systems. Here are some examples of systems built using DataFusion:

By using DataFusion, these projects can focus on their specific features and avoid reimplementing general (but still necessary) features such as an expression representation, standard optimizations, execution plans, and file format support.
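To make "an expression representation" and "standard optimizations" concrete, here is a toy sketch in plain Rust of an expression tree with a constant-folding pass. Everything in it (`Expr`, `fold_constants`, `fold_binary`) is hypothetical and written for illustration only; it is not DataFusion's API, and DataFusion's own expression and optimizer types cover far more cases.

```rust
// Toy illustration only: these types are hypothetical and are NOT DataFusion's API.

/// A minimal expression tree: column references, integer literals,
/// and two binary operators.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Folds both operands first, then evaluates the operator if both sides
/// turned out to be literals. If the arithmetic would overflow, the
/// expression is rebuilt unevaluated instead of panicking.
fn fold_binary(
    left: Expr,
    right: Expr,
    op: fn(i64, i64) -> Option<i64>,
    rebuild: fn(Box<Expr>, Box<Expr>) -> Expr,
) -> Expr {
    match (fold_constants(left), fold_constants(right)) {
        (Expr::Literal(a), Expr::Literal(b)) => match op(a, b) {
            Some(value) => Expr::Literal(value),
            None => rebuild(Box::new(Expr::Literal(a)), Box::new(Expr::Literal(b))),
        },
        (l, r) => rebuild(Box::new(l), Box::new(r)),
    }
}

/// Constant folding: a bottom-up rewrite that replaces operators over
/// literal operands with their computed value, so the work is done once
/// at planning time rather than once per row.
pub fn fold_constants(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => fold_binary(*l, *r, i64::checked_add, Expr::Add),
        Expr::Mul(l, r) => fold_binary(*l, *r, i64::checked_mul, Expr::Mul),
        leaf => leaf,
    }
}
```

With this sketch, folding the tree for `a + (2 * 3)` yields the tree for `a + 6`. A system built from scratch would need many such rewrites, plus planning and execution around them, before it could run a single query.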

Why DataFusion?

Comparisons with other projects

When compared to similar systems, DataFusion is typically:

  1. Targeted at developers, rather than end users / data scientists.
  2. Designed to be embedded, rather than a complete file-based SQL system.
  3. Governed by the Apache Software Foundation process, rather than a single company or individual.
  4. Implemented in Rust, rather than C/C++.

Here is a comparison with similar projects that may help you understand when DataFusion might be suitable or unsuitable for your needs:

DataFusion Community Extensions

There are a number of community projects that extend DataFusion or provide integrations with other systems.

Language Bindings

Integrations

Known Uses

Here are some of the projects known to use DataFusion:

Examples

Please see the example usage in the user guide and the datafusion-examples crate for more information on how to use DataFusion.

Roadmap

Please see Roadmap for information on where the project is headed.

Architecture Overview

There is no formal document describing DataFusion's architecture yet, but the following presentations offer a good overview of its different components and how they interact.

User Guide

Please see User Guide for more information about DataFusion.

Contributor Guide

Please see Contributor Guide for information about contributing to DataFusion.