Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as the memory model.
To learn more, read the User Guide.
```python
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )

>>> (
...     df
...     .sort("fruits")
...     .select(
...         [
...             "fruits",
...             "cars",
...             pl.lit("fruits").alias("literal_string_fruits"),
...             pl.col("B").filter(pl.col("cars") == "beetle").sum(),
...             pl.col("A").filter(pl.col("B") > 2).sum().over("cars").alias("sum_A_by_cars"),  # groups by "cars"
...             pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),  # groups by "fruits"
...             pl.col("A").reverse().over("fruits").alias("rev_A_by_fruits"),  # groups by "fruits"
...             pl.col("A").sort_by("B").over("fruits").alias("sort_A_by_B_by_fruits"),  # groups by "fruits"
...         ]
...     )
... )
shape: (5, 8)
┌──────────┬──────────┬──────────────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ fruits   ┆ cars     ┆ literal_stri ┆ B   ┆ sum_A_by_ca ┆ sum_A_by_fr ┆ rev_A_by_fr ┆ sort_A_by_B │
│ ---      ┆ ---      ┆ ng_fruits    ┆ --- ┆ rs          ┆ uits        ┆ uits        ┆ _by_fruits  │
│ str      ┆ str      ┆ ---          ┆ i64 ┆ ---         ┆ ---         ┆ ---         ┆ ---         │
│          ┆          ┆ str          ┆     ┆ i64         ┆ i64         ┆ i64         ┆ i64         │
╞══════════╪══════════╪══════════════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡
│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 4           ┆ 4           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 3           ┆ 3           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 5           ┆ 5           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "audi"   ┆ "fruits"     ┆ 11  ┆ 2           ┆ 8           ┆ 2           ┆ 2           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 1           ┆ 1           │
└──────────┴──────────┴──────────────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘
```
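To make the window expressions above more concrete, here is a plain-Python sketch of what a sum over a group computes: a per-group total that is broadcast back to every row of that group, so the column keeps its original length. The helper `window_sum` and its `mask` parameter are hypothetical names used only for illustration; the sketch mirrors the semantics on the unsorted input and says nothing about how Polars implements them.

```python
from collections import defaultdict


def window_sum(values, keys, mask=None):
    """Sum `values` per group in `keys` and broadcast the total to each row.

    `mask` optionally marks which rows contribute to the sum, mimicking a
    filter applied before the windowed sum. (Hypothetical helper.)
    """
    if mask is None:
        mask = [True] * len(values)
    totals = defaultdict(int)
    for value, key, keep in zip(values, keys, mask):
        if keep:
            totals[key] += value
    # Every row receives the total of its own group, so the length is unchanged.
    return [totals[key] for key in keys]


A = [1, 2, 3, 4, 5]
B = [5, 4, 3, 2, 1]
fruits = ["banana", "banana", "apple", "apple", "banana"]
cars = ["beetle", "audi", "beetle", "beetle", "beetle"]

sum_a_by_fruits = window_sum(A, fruits)
sum_a_by_cars = window_sum(A, cars, mask=[b > 2 for b in B])
print(sum_a_by_fruits)  # [8, 8, 7, 7, 8]
print(sum_a_by_cars)    # [4, 2, 4, 4, 4]
```

The per-group totals (7 for "apple", 8 for "banana"; 4 for "beetle", 2 for "audi") agree with the corresponding columns in the output table above.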
Polars is very fast. In fact, it is one of the best performing solutions available. See the results in h2oai's db-benchmark.
Install the latest polars version with:
$ pip3 install -U 'polars[pyarrow]'
Releases are frequent at the moment (weekly or every few days), so it is worth updating polars regularly to get the latest bug fixes and features.
You can take the latest release from crates.io, or, if you want to use the latest features and performance improvements, point to the master branch of this repo.
```toml
polars = { git = "https://github.com/pola-rs/polars", rev = "<optional git tag>" }
```
Required Rust version >=1.58
Want to know about all the features Polars supports? Read the docs!
Python: $ pip3 install polars
Node.js: $ yarn add nodejs-polars
Want to contribute? Read our contribution guideline.
If you want a bleeding edge release or maximal performance you should compile polars from source.
This can be done by going through the following steps in sequence:
$ pip3 install maturin
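Then build with one of the following commands. The second sets codegen-units=16 and lto=thin, which typically shortens compile time at some cost in runtime speed.
```bash
$ cd py-polars && maturin develop --release -- -C target-cpu=native
```
```bash
$ cd py-polars && maturin develop --release -- -C codegen-units=16 -C lto=thin -C target-cpu=native
```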
Note that the Rust crate implementing the Python bindings is called `py-polars` to distinguish it from the wrapped Rust crate `polars` itself. However, both the Python package and the Python module are named `polars`, so you can `pip install polars` and `import polars`.
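As a rough illustration of the package/module naming, the following standard-library sketch checks both names separately: the installed distribution (what `pip` sees) and the importable module (what `import` sees). The helper `describe_install` is a hypothetical name, not part of Polars, and the snippet works whether or not polars is installed.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name="polars", module_name="polars"):
    """Report the installed distribution version and whether the module can be imported.

    Hypothetical helper for illustration only.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # the distribution is not installed in this environment
    importable = importlib.util.find_spec(module_name) is not None
    return {"distribution": dist_name, "version": version, "importable": importable}


print(describe_install())
```

Comparing the reported version before and after `pip3 install -U` is one simple way to confirm that an upgrade took effect.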
Polars has transitioned to arrow2. Arrow2 is a faster and safer implementation of the Apache Arrow Columnar Format. Arrow2 also has a more granular code base, helping to reduce the compiler bloat.
See this example.
Do you expect more than 2^32 (~4.2 billion) rows? Compile polars with the `bigidx` feature flag.
Or, for Python users, install `$ pip install -U polars-u64-idx`.
Don't use this unless you hit the row boundary, as the default polars is faster and consumes less memory.
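For a sense of scale, here is a small arithmetic sketch of the row boundary. It assumes the default row index is 32-bit, as the 2^32 figure and the `polars-u64-idx` name suggest; the constant `U32_ROW_LIMIT` and the helper `needs_big_index` are hypothetical names, and the exact boundary inside Polars may differ by one.

```python
# Assumption: the default build indexes rows with a 32-bit unsigned integer.
U32_ROW_LIMIT = 2**32  # 4_294_967_296, the "~4.2 billion" boundary


def needs_big_index(n_rows):
    """Return True when `n_rows` is more than 2^32 (hypothetical helper)."""
    return n_rows > U32_ROW_LIMIT


print(U32_ROW_LIMIT)                    # 4294967296
print(needs_big_index(1_000_000_000))   # False
print(needs_big_index(5_000_000_000))   # True
```

A check like this on an expected row count can help decide up front whether the larger index is needed at all.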
Development of Polars is proudly powered by