safetensors

Safetensors

This repository implements a new simple format for storing tensors safely (as opposed to pickle) and that is still fast (zero-copy).

Installation

Pip

You can install safetensors via the pip manager:

bash pip install safetensors

From source

For the sources, you need Rust

```bash

Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Make sure it's up to date and using stable channel

rustup update git clone https://github.com/huggingface/safetensors cd safetensors/bindings/python pip install setuptools_rust pip install -e . ```

Getting started

```python from safetensors import safeopen from safetensors.torch import savefile

tensors = { "weight1": torch.zeros((1024, 1024)), "weight2": torch.zeros((1024, 1024)) } save_file(tensors, "model.safetensors")

tensors = {} with safeopen("model.safetensors", framework="pt", device="cpu") as f: for key in f.keys(): tensors[key] = f.gettensor(key) ```

Python documentation

Format

Yet another format ?

The main rationale for this crate is to remove the need to use pickle on PyTorch which is used by default. There are other formats out there used by machine learning and more general formats.

Let's take a look at alternatives and why this format is deemed interesting. This is my very personal and probably biased view:

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16 | ----------------------- | --- | --- | --- | --- | --- | --- | --- | | pickle (PyTorch) | ✗ | ✗ | ✗ | 🗸 | ✗ | 🗸 | 🗸 | | H5 (Tensorflow) | 🗸 | ✗ | 🗸 | 🗸 | ~ | ~ | ✗ | | SavedModel (Tensorflow) | 🗸 | ✗ | ✗ | 🗸 | 🗸 | ✗ | 🗸 | | MsgPack (flax) | 🗸 | 🗸 | ✗ | 🗸 | ✗ | ✗ | 🗸 | | Protobuf (ONNX) | 🗸 | ✗ | ✗ | ✗ | ✗ | ✗ | 🗸 | | Cap'n'Proto | 🗸 | 🗸 | ~ | 🗸 | 🗸 | ~ | ✗ | | Arrow | ? | ? | ? | ? | ? | ? | ✗ | | Numpy (npy,npz) | 🗸 | ? | ? | ✗ | 🗸 | ✗ | ✗ | | SafeTensors | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | ✗ | 🗸 |

Main oppositions

Notes

Benefits

Since we can invent a new format we can propose additional benefits: