encoding_rs

Apache 2 / MIT dual-licensed

encoding_rs is an implementation of the (non-JavaScript parts of the) Encoding Standard written in Rust and used in Gecko (starting with Firefox 56).

Functionality

Due to the Gecko use case, encoding_rs supports decoding to and encoding from UTF-16 in addition to supporting the usual Rust use case of decoding to and encoding from UTF-8. Additionally, the API has been designed to be FFI-friendly to accommodate the C++ side of Gecko.
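As a rough illustration of what decoding to both UTF-8 and UTF-16 into caller-provided buffers involves, here is a minimal std-only sketch. It does not use the encoding_rs API: `decode_to_utf16` and `decode_to_utf8` are hypothetical names, and the byte mapping (each byte to the code point with the same value, U+0000 to U+00FF) is a simplification chosen for illustration, not one of the Encoding Standard's encodings. Writing into buffers that the caller allocates, and reporting how much was read and written, is one common way to keep such an API usable across an FFI boundary.

```rust
/// Hypothetical decoder: maps each byte to the code point with the same
/// value (U+0000..=U+00FF). Illustration only, not an Encoding Standard
/// encoding. Writes as much as fits into `dst`.
/// Returns (bytes read, UTF-16 code units written).
fn decode_to_utf16(src: &[u8], dst: &mut [u16]) -> (usize, usize) {
    let n = src.len().min(dst.len());
    for (d, &b) in dst.iter_mut().zip(src) {
        // Every code point in this range fits in a single UTF-16 unit.
        *d = u16::from(b);
    }
    (n, n)
}

/// Same hypothetical mapping, but producing UTF-8.
/// Returns (bytes read, UTF-8 bytes written); never splits a character.
fn decode_to_utf8(src: &[u8], dst: &mut [u8]) -> (usize, usize) {
    let mut read = 0;
    let mut written = 0;
    for &b in src {
        let c = char::from(b); // U+0000..=U+00FF
        let len = c.len_utf8(); // 1 byte below 0x80, otherwise 2
        if written + len > dst.len() {
            break; // not enough room for the whole character
        }
        c.encode_utf8(&mut dst[written..written + len]);
        read += 1;
        written += len;
    }
    (read, written)
}

fn main() {
    let src = b"caf\xE9"; // with this mapping, 0xE9 decodes to U+00E9

    let mut utf16 = [0u16; 8];
    let (read, written) = decode_to_utf16(src, &mut utf16);
    assert_eq!((read, written), (4, 4));
    assert_eq!(String::from_utf16(&utf16[..written]).unwrap(), "café");

    let mut utf8 = [0u8; 8];
    let (read, written) = decode_to_utf8(src, &mut utf8);
    assert_eq!((read, written), (4, 5)); // 'é' takes two bytes in UTF-8
    println!("{}", std::str::from_utf8(&utf8[..written]).unwrap());
}
```

Because the caller owns the output buffers, a decoder in this style never allocates, and the (read, written) pair tells the caller where to resume after growing or draining a buffer.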

Specifically, encoding_rs does the following:

Licensing

Please see the file named COPYRIGHT.

API Documentation

Generated API documentation is available online.

C and C++ bindings

An FFI layer for encoding_rs is available as a separate crate. The crate comes with a demo C++ wrapper using the C++ standard library and GSL types.

For the Gecko context, there's a C++ wrapper using the MFBT/XPCOM types.

Sample programs

Optional features

There are currently three optional cargo features:

simd-accel

Enables SSE2 acceleration on x86 and x86_64 and NEON acceleration on Aarch64. Requires nightly Rust. Enabling this cargo feature is recommended when building for x86, x86_64 or Aarch64 on nightly Rust. The intention is for the functionality enabled by this feature to become the normal on-by-default behavior once explicit SIMD becomes available on all Rust release channels.

Enabling this feature breaks the build unless the target is x86 with SSE2 (Rust's default 32-bit x86 target, i686, has SSE2, but Linux distros may use an x86 target without SSE2, i.e. i586 in rustup terms), x86_64 or Aarch64.
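The target rule above can be expressed as a small predicate. `simd_accel_supported` is a hypothetical helper, not part of encoding_rs or Cargo; it only encodes the architectures listed here and does not check for nightly Rust, which the feature also requires. Since `std::env::consts::ARCH` reports "x86" for both i686 and i586, the SSE2 distinction has to come from `cfg!(target_feature = "sse2")`.

```rust
/// Hypothetical helper (not part of encoding_rs): mirrors the README's rule
/// for when enabling `simd-accel` is expected to build. Does NOT check
/// whether the compiler is nightly Rust.
fn simd_accel_supported(arch: &str, has_sse2: bool) -> bool {
    match arch {
        // 32-bit x86 only qualifies with SSE2 (i686 has it, i586 does not).
        "x86" => has_sse2,
        // x86_64 always has SSE2; aarch64 gets NEON.
        "x86_64" | "aarch64" => true,
        _ => false,
    }
}

fn main() {
    let arch = std::env::consts::ARCH;
    let has_sse2 = cfg!(target_feature = "sse2");
    let verdict = if simd_accel_supported(arch, has_sse2) {
        "should build"
    } else {
        "would break the build"
    };
    println!("target {}: simd-accel {}", arch, verdict);
}
```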

serde

Enables support for serializing and deserializing &'static Encoding-typed struct fields using Serde.

no-static-ideograph-encoder-tables

Makes the binary size smaller at the expense of ideograph encode speed for Chinese and Japanese legacy encodings. (Does not affect decode speed.)

The speed resulting from enabling this feature is believed to be acceptable for Web browser-exposed encoder use cases. However, the result is likely unacceptable for other applications that need to produce output in Chinese or Japanese legacy encodings. (But applications really should always be using UTF-8 for output.)

Performance goals

For decoding to UTF-16, the goal is to perform at least as well as Gecko's old uconv. For decoding to UTF-8, the goal is to perform at least as well as rust-encoding.

Encoding to UTF-8 should be fast. (UTF-8 to UTF-8 encode should be equivalent to memcpy and UTF-16 to UTF-8 should be fast.)
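A minimal std-only sketch of why these two paths can be cheap: for input that is already UTF-8, "encoding" to UTF-8 is a copy, and UTF-16 to UTF-8 only has to pair surrogates and emit each scalar value's bytes. The function names are hypothetical and this is not how encoding_rs implements either conversion. In this sketch the copy backs up to a character boundary so a partially filled buffer never ends mid-character, and unpaired surrogates are replaced with U+FFFD.

```rust
/// UTF-8 to UTF-8 "encode": the input is already valid UTF-8, so the work
/// is a memcpy of as many bytes as fit, trimmed back to a char boundary.
/// Returns the number of bytes written.
fn encode_utf8_to_utf8(src: &str, dst: &mut [u8]) -> usize {
    let mut n = src.len().min(dst.len());
    while !src.is_char_boundary(n) {
        n -= 1; // don't split a multi-byte character
    }
    dst[..n].copy_from_slice(&src.as_bytes()[..n]);
    n
}

/// UTF-16 to UTF-8: combine surrogate pairs into scalar values, replace
/// unpaired surrogates with U+FFFD, and collect the UTF-8 bytes.
fn utf16_to_utf8(src: &[u16]) -> String {
    char::decode_utf16(src.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

fn main() {
    let mut small = [0u8; 2];
    let n = encode_utf8_to_utf8("née", &mut small);
    // 'é' is 2 bytes and would not fit whole, so only "n" is copied.
    assert_eq!(&small[..n], b"n");

    // U+1F600 is the surrogate pair D83D DE00 in UTF-16.
    let s = utf16_to_utf8(&[0x006E, 0x00E9, 0xD83D, 0xDE00]);
    assert_eq!(s, "né😀");
    println!("{}", s);
}
```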

Speed is a non-goal when encoding to legacy encodings. Encoding to legacy encodings should not be optimized for speed at the expense of code size as long as form submission and URL parsing in Gecko don't become noticeably slow in real-world use.

Currently, by default, encoding_rs builds with limited encoder-specific acceleration tables for GB2312 Level 1 Hanzi, Big5 Level 1 Hanzi and JIS X 0208 Level 1 Kanji. These tables use binary search and strike a balance between not having encoder-specific tables at all (doing linear search over the decode-optimized tables) and having larger directly-indexable encoder-side tables. It is not clear that anyone wants this in-between approach, and it may be changed in the future.

In the interest of binary size, Firefox builds with the no-static-ideograph-encoder-tables cargo feature, which omits the encoder-specific tables and performs linear search over the decode-optimized tables. With realistic workloads, this seemed fast enough not to be user-visibly slow on Raspberry Pi 3 (which stood in for a phone for testing) in the Web-exposed encoder use cases.
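The trade-off described in the two paragraphs above can be sketched with a toy table. `DECODE_TABLE`, `encode_linear`, `build_encoder_table` and `encode_binary` are hypothetical names, and the five ideographs are placeholder entries whose indices are not real Encoding Standard pointers. The sketch builds the sorted encoder table at run time only to keep the example short.

```rust
/// Hypothetical decode-optimized table: index = pointer, value = code point.
/// Placeholder entries; real tables are far larger.
const DECODE_TABLE: [u16; 5] = [0x4E9C, 0x5516, 0x5A03, 0x963F, 0x54C0];

/// Encoding without encoder-specific tables: linear search over the
/// decode table. No extra data, O(n) per character.
fn encode_linear(c: u16) -> Option<usize> {
    DECODE_TABLE.iter().position(|&cp| cp == c)
}

/// Encoder-specific table: (code point, pointer) pairs sorted by code point.
fn build_encoder_table() -> Vec<(u16, usize)> {
    let mut table: Vec<(u16, usize)> = DECODE_TABLE
        .iter()
        .copied()
        .enumerate()
        .map(|(pointer, cp)| (cp, pointer))
        .collect();
    table.sort_unstable();
    table
}

/// Encoding with the encoder-specific table: binary search, O(log n),
/// at the cost of storing the extra table.
fn encode_binary(table: &[(u16, usize)], c: u16) -> Option<usize> {
    table
        .binary_search_by_key(&c, |&(cp, _)| cp)
        .ok()
        .map(|i| table[i].1)
}

fn main() {
    let table = build_encoder_table();
    // Both strategies must agree on every entry.
    for &cp in DECODE_TABLE.iter() {
        assert_eq!(encode_linear(cp), encode_binary(&table, cp));
    }
    println!("{:?}", encode_binary(&table, 0x963F)); // Some(3)
}
```

Linear search needs no data beyond the decode table but costs O(n) per character; the sorted encoder table adds binary size and makes each lookup O(log n). A directly-indexable encoder table would make lookups O(1) at a still larger size cost.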

A framework for measuring performance is available separately.

Rust Version Compatibility

It is a goal to support the latest stable Rust, the latest nightly Rust and the version of Rust that's used for Firefox Nightly (currently 1.19.0). These are tested on Travis.

Additionally, beta and the oldest Rust version known to work (currently 1.15.0) are tested on Travis. That oldest known-working version is tested as a canary so that, when it stops working, the change can be documented here. At this time, there is no firm commitment to support a version older than what's required by Firefox, but there isn't an active plan to make changes that would make 1.15.0 no longer work, either.

Compatibility with rust-encoding

A compatibility layer that implements the rust-encoding API on top of encoding_rs is provided as a separate crate (cannot be uploaded to crates.io). The compatibility layer was originally written with the assumption that Firefox would need it, but it is not currently used in Firefox.

Roadmap

Release Notes

0.7.1

0.7.0

0.6.11

0.6.10

0.6.9

0.6.8

0.6.7

0.6.6

0.6.5

0.6.4

0.6.3

0.6.2

0.6.1

0.6.0

0.5.1

0.5.0

0.4.0

0.3.2

0.3.1

0.3

0.2.4

0.2.3

0.2.2

0.2.1

0.2.0

The initial release.