Python Pypi Documentation Codecov Downloads

Rust Crates.io Documentation Codecov Dependency status

safetensors

Safetensors

This repository implements a new simple format for storing tensors safely (as opposed to pickle) and that is still fast (zero-copy).

Installation

Pip

You can install safetensors via the pip manager:

bash pip install safetensors

From source

For the sources, you need Rust

```bash

Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Make sure it's up to date and using stable channel

rustup update git clone https://github.com/huggingface/safetensors cd safetensors/bindings/python pip install setuptools_rust pip install -e . ```

Getting started

```python import torch from safetensors import safeopen from safetensors.torch import savefile

tensors = { "weight1": torch.zeros((1024, 1024)), "weight2": torch.zeros((1024, 1024)) } save_file(tensors, "model.safetensors")

tensors = {} with safeopen("model.safetensors", framework="pt", device="cpu") as f: for key in f.keys(): tensors[key] = f.gettensor(key) ```

Python documentation

Format

Notes: - Duplicate keys are disallowed. Not all parsers may respect this. - In general the subset of JSON is implicitly decided by serde_json for this library. Anything obscure might be modified at a later time, that odd ways to represent integer, newlines and escapes in utf-8 strings. This would only be done for safety concerns - Tensor values are not checked against, in particular NaN and +/-Inf could be in the file - Empty tensors (tensors with 1 dimension being 0) are allowed. They are not storing any data in the databuffer, yet retaining size in the header. They don't really bring a lot of values but are accepted since they are valid tensors from traditional tensor libraries perspective (torch, tensorflow, numpy, ..). - 0-rank Tensors (tensors with shape []) are allowed, they are merely a scalar.

Yet another format ?

The main rationale for this crate is to remove the need to use pickle on PyTorch which is used by default. There are other formats out there used by machine learning and more general formats.

Let's take a look at alternatives and why this format is deemed interesting. This is my very personal and probably biased view:

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16 | ----------------------- | --- | --- | --- | --- | --- | --- | --- | | pickle (PyTorch) | ✗ | ✗ | ✗ | 🗸 | ✗ | 🗸 | 🗸 | | H5 (Tensorflow) | 🗸 | ✗ | 🗸 | 🗸 | ~ | ~ | ✗ | | SavedModel (Tensorflow) | 🗸 | ✗ | ✗ | 🗸 | 🗸 | ✗ | 🗸 | | MsgPack (flax) | 🗸 | 🗸 | ✗ | 🗸 | ✗ | ✗ | 🗸 | | Protobuf (ONNX) | 🗸 | ✗ | ✗ | ✗ | ✗ | ✗ | 🗸 | | Cap'n'Proto | 🗸 | 🗸 | ~ | 🗸 | 🗸 | ~ | ✗ | | Arrow | ? | ? | ? | ? | ? | ? | ✗ | | Numpy (npy,npz) | 🗸 | ? | ? | ✗ | 🗸 | ✗ | ✗ | | pdparams (Paddle) | ✗ | ✗ | ✗ | 🗸 | ✗ | 🗸 | 🗸 | | SafeTensors | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | ✗ | 🗸 |

Main oppositions

Notes

Benefits

Since we can invent a new format we can propose additional benefits:

License: Apache-2.0