Encoding 0.2.27

Character encoding support for Rust (also known as rust-encoding). It is based on the WHATWG Encoding Standard and also provides an advanced interface for error detection and recovery.

Complete Documentation

Simple Usage

To encode a string:

~~~~ {.rust}
use encoding::{Encoding, EncoderTrap};
use encoding::all::ISO_8859_1;

assert_eq!(ISO_8859_1.encode("caf\u{e9}", EncoderTrap::Strict),
           Ok(vec![99,97,102,233]));
~~~~

To encode a string with unrepresentable characters:

~~~~ {.rust}
use encoding::{Encoding, EncoderTrap};
use encoding::all::ISO_8859_2;

// U+00A9 (copyright sign) has no mapping in ISO-8859-2
assert!(ISO_8859_2.encode("Acme\u{a9}", EncoderTrap::Strict).is_err());
// Replace substitutes `?` (63)
assert_eq!(ISO_8859_2.encode("Acme\u{a9}", EncoderTrap::Replace),
           Ok(vec![65,99,109,101,63]));
// Ignore drops the character
assert_eq!(ISO_8859_2.encode("Acme\u{a9}", EncoderTrap::Ignore),
           Ok(vec![65,99,109,101]));
// NcrEscape writes the decimal reference `&#169;`
assert_eq!(ISO_8859_2.encode("Acme\u{a9}", EncoderTrap::NcrEscape),
           Ok(vec![65,99,109,101,38,35,49,54,57,59]));
~~~~

To decode a byte sequence:

~~~~ {.rust}
use encoding::{Encoding, DecoderTrap};
use encoding::all::ISO_8859_1;

assert_eq!(ISO_8859_1.decode(&[99,97,102,233], DecoderTrap::Strict),
           Ok("caf\u{e9}".to_string()));
~~~~

To decode a byte sequence with invalid sequences:

~~~~ {.rust}
use encoding::{Encoding, DecoderTrap};
use encoding::all::ISO_8859_6;

assert!(ISO_8859_6.decode(&[65,99,109,101,169], DecoderTrap::Strict).is_err());
assert_eq!(ISO_8859_6.decode(&[65,99,109,101,169], DecoderTrap::Replace),
           Ok("Acme\u{fffd}".to_string()));
assert_eq!(ISO_8859_6.decode(&[65,99,109,101,169], DecoderTrap::Ignore),
           Ok("Acme".to_string()));
~~~~

A practical example of custom encoder traps:

~~~~ {.rust}
use encoding::{Encoding, ByteWriter, EncoderTrap, DecoderTrap};
use encoding::types::RawEncoder;
use encoding::all::ASCII;

// hexadecimal numeric character reference replacement
fn hex_ncr_escape(_encoder: &mut RawEncoder, input: &str, output: &mut ByteWriter) -> bool {
    let escapes: Vec<String> =
        input.chars().map(|ch| format!("&#x{:x};", ch as isize)).collect();
    let escapes = escapes.concat();
    output.write_bytes(escapes.as_bytes());
    true
}
static HEX_NCR_ESCAPE: EncoderTrap = EncoderTrap::Call(hex_ncr_escape);

let orig = "Hello, 世界!".to_string();
let encoded = ASCII.encode(&orig, HEX_NCR_ESCAPE).unwrap();
// the encoded bytes are pure ASCII, so decoding yields the escapes themselves
assert_eq!(ASCII.decode(&encoded, DecoderTrap::Strict),
           Ok("Hello, &#x4e16;&#x754c;!".to_string()));
~~~~

Getting the encoding from a string label, as specified in the WHATWG Encoding Standard:

~~~~ {.rust}
use encoding::{Encoding, DecoderTrap};
use encoding::label::encoding_from_whatwg_label;
use encoding::all::WINDOWS_949;

let euckr = encoding_from_whatwg_label("euc-kr").unwrap();
assert_eq!(euckr.name(), "windows-949");
assert_eq!(euckr.whatwg_name(), Some("euc-kr")); // for the sake of compatibility
let broken = &[0xbf, 0xec, 0xbf, 0xcd, 0xff, 0xbe, 0xd3];
assert_eq!(euckr.decode(broken, DecoderTrap::Replace),
           Ok("\u{c6b0}\u{c640}\u{fffd}\u{c559}".to_string()));

// corresponding Encoding native API:
assert_eq!(WINDOWS_949.decode(broken, DecoderTrap::Replace),
           Ok("\u{c6b0}\u{c640}\u{fffd}\u{c559}".to_string()));
~~~~
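The WHATWG Encoding Standard matches labels after stripping leading and trailing ASCII whitespace and ignoring ASCII case. The following is a minimal, standard-library-only sketch of that matching rule. It does not use this crate, `name_from_label` is a hypothetical helper, and the table covers only a small subset of the standard's labels. The names it returns are the standard's own, which can differ from the names this crate reports (for example `windows-949` above).

```rust
/// Hypothetical helper (not part of the crate): maps a handful of WHATWG
/// labels to the encoding names used by the WHATWG Encoding Standard.
fn name_from_label(label: &str) -> Option<&'static str> {
    // Strip ASCII whitespace (TAB, LF, FF, CR, SPACE) and fold ASCII case.
    let normalized = label
        .trim_matches(|c: char| c.is_ascii_whitespace())
        .to_ascii_lowercase();
    // Only a small, illustrative subset of the standard's label table.
    match normalized.as_str() {
        "utf-8" | "utf8" | "unicode-1-1-utf-8" => Some("UTF-8"),
        "euc-kr" | "korean" | "ks_c_5601-1987" | "windows-949" => Some("EUC-KR"),
        "latin1" | "iso-8859-1" | "ascii" | "us-ascii" | "windows-1252" => {
            Some("windows-1252")
        }
        _ => None,
    }
}

fn main() {
    assert_eq!(name_from_label("  EUC-KR\n"), Some("EUC-KR"));
    assert_eq!(name_from_label("Latin1"), Some("windows-1252"));
    assert_eq!(name_from_label("no-such-label"), None);
}
```

Note that several legacy labels, such as `latin1` and `ascii`, resolve to `windows-1252` in the standard rather than to the encoding their name suggests.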

Detailed Usage

There are three main entry points to Encoding.

Encoding is a single character encoding. It contains encode and decode methods for converting String to Vec<u8> and vice versa. For error handling, they receive traps (EncoderTrap and DecoderTrap respectively), which replace any error with some string (e.g. U+FFFD) or byte sequence (e.g. ?). You can also use the EncoderTrap::Strict and DecoderTrap::Strict traps to stop on an error.
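To make the trap idea concrete, here is a self-contained sketch using only the standard library. It is not the crate's implementation: `Trap` and `encode_latin1` are hypothetical names, and the toy encoder handles only ISO-8859-1, where every code point up to U+00FF maps to the byte of the same value. The four variants mirror the strategies named above.

```rust
/// Hypothetical trap type mirroring the strategies described above.
#[derive(Clone, Copy)]
enum Trap {
    Strict,    // stop at the first unrepresentable character
    Replace,   // substitute `?`
    Ignore,    // drop the character
    NcrEscape, // write a decimal numeric character reference
}

/// Toy ISO-8859-1 encoder (an illustration, not the crate's encoder).
fn encode_latin1(input: &str, trap: Trap) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len());
    for ch in input.chars() {
        let code = ch as u32;
        if code <= 0xff {
            // Representable: the byte value equals the code point.
            out.push(code as u8);
            continue;
        }
        // Unrepresentable: let the trap decide what happens.
        match trap {
            Trap::Strict => {
                return Err(format!("unrepresentable character U+{:04X}", code));
            }
            Trap::Replace => out.push(b'?'),
            Trap::Ignore => {}
            Trap::NcrEscape => out.extend_from_slice(format!("&#{};", code).as_bytes()),
        }
    }
    Ok(out)
}

fn main() {
    // U+20AC (euro sign) is not part of ISO-8859-1.
    let text = "caf\u{e9} \u{20ac}";
    assert!(encode_latin1(text, Trap::Strict).is_err());
    assert_eq!(encode_latin1(text, Trap::Replace), Ok(b"caf\xe9 ?".to_vec()));
    assert_eq!(encode_latin1(text, Trap::Ignore), Ok(b"caf\xe9 ".to_vec()));
    assert_eq!(encode_latin1(text, Trap::NcrEscape), Ok(b"caf\xe9 &#8364;".to_vec()));
}
```

The point of the design is that the encoding itself only detects the problem, while the caller-supplied trap chooses the recovery policy.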

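There are two ways to get an Encoding: the static items in encoding::all (such as ISO_8859_1), which suit cases where the encoding is known in advance, and the lookup functions in encoding::label (such as encoding_from_whatwg_label), which select an encoding at run time from a string label.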

RawEncoder is an experimental incremental encoder. At each step of raw_feed, it receives a string slice and emits any encoded bytes to a generic ByteWriter (normally Vec<u8>). It stops at the first error, if any, and returns a CodecError struct in that case. The caller is responsible for calling raw_finish at the end of the encoding process.

RawDecoder is an experimental incremental decoder. At each step of raw_feed, it receives a byte slice and emits any decoded characters to a generic StringWriter (normally String). Otherwise it behaves identically to RawEncoder.
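The feed-then-finish protocol matters because a multi-byte sequence can be split across two input chunks. The sketch below illustrates the idea for UTF-8 using only the standard library. It is not the crate's RawDecoder: `ChunkDecoder`, `feed`, and `finish` are hypothetical names, and errors are reported as plain strings rather than a CodecError.

```rust
use std::str;

/// Hypothetical incremental UTF-8 decoder (an illustration only).
struct ChunkDecoder {
    /// Bytes of a multi-byte sequence that has not been completed yet.
    pending: Vec<u8>,
}

impl ChunkDecoder {
    fn new() -> Self {
        ChunkDecoder { pending: Vec::new() }
    }

    /// Decodes as much as possible into `output`, keeping an unfinished
    /// trailing sequence for the next call. Stops at the first invalid byte.
    fn feed(&mut self, input: &[u8], output: &mut String) -> Result<(), String> {
        self.pending.extend_from_slice(input);
        let (valid, invalid) = match str::from_utf8(&self.pending) {
            Ok(s) => (s.len(), false),
            // error_len() is None when the input merely ends mid-sequence.
            Err(e) => (e.valid_up_to(), e.error_len().is_some()),
        };
        // The prefix was validated just above, so this cannot fail.
        output.push_str(str::from_utf8(&self.pending[..valid]).expect("validated prefix"));
        self.pending.drain(..valid);
        if invalid {
            Err("invalid byte sequence".to_string())
        } else {
            Ok(())
        }
    }

    /// Must be called once at the end: leftover bytes mean truncated input.
    fn finish(&mut self) -> Result<(), String> {
        if self.pending.is_empty() {
            Ok(())
        } else {
            self.pending.clear();
            Err("incomplete sequence at end of input".to_string())
        }
    }
}

fn main() {
    // U+4E16 is encoded as E4 B8 96; split it across two chunks.
    let mut decoder = ChunkDecoder::new();
    let mut out = String::new();
    decoder.feed(&[b'H', b'i', 0xE4], &mut out).unwrap();
    assert_eq!(out, "Hi");
    decoder.feed(&[0xB8, 0x96], &mut out).unwrap();
    assert_eq!(out, "Hi\u{4e16}");
    assert!(decoder.finish().is_ok());
}
```

Without the final `finish` call, a stream that ends in the middle of a sequence would go unnoticed, which is why the caller is made responsible for it.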

Prefer Encoding::{encode,decode} as the primary interface. RawEncoder and RawDecoder are experimental and can change substantially. See the additional documentation in the encoding::types module for more information on them.

Supported Encodings

Encoding covers all encodings specified by WHATWG Encoding Standard and some more:

Parenthesized names refer to the encoding's primary name assigned by the WHATWG Encoding Standard.

Many legacy character encodings lack a proper specification, and even those that have one are highly dependent on the actual implementation. Consequently, one should be careful when picking a desired character encoding. The only standards reliable in this regard are the WHATWG Encoding Standard and the vendor-provided mappings from the Unicode Consortium. Whenever in doubt, look at the source code and specifications for detailed explanations.