xhtmlchardet

Basic character set detection for XML and HTML in Rust.


Minimum Supported Rust Version: 1.24.0

Example

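```rust
extern crate xhtmlchardet;
use std::io::Cursor;

// The input has to carry an encoding declaration for the assertion below to hold.
let text = b"<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><channel><title>Example</title></channel>";
let mut text_cursor = Cursor::new(text.to_vec());
let detected_charsets: Vec<String> = xhtmlchardet::detect(&mut text_cursor, None).unwrap();
assert_eq!(detected_charsets, vec!["iso-8859-1".to_string()]);
```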

Rationale

I wrote a feed crawler that needed to determine the character set of fetched content so that it could be normalised to UTF-8. Initially I used the [uchardet] crate, but it misdetected the charset in some situations. I collected these edge cases into a test suite, then implemented this crate, which passes all of those tests. It uses a fairly naïve approach derived from Appendix F ("Autodetection of Character Encodings") of the XML specification.
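
To give a feel for the kind of detection the XML specification describes, here is a minimal sketch. It is not this crate's implementation. The function names `sniff_signature`, `declared_encoding` and `detect_sketch` are hypothetical and exist only for illustration, and the code assumes a recent Rust compiler rather than the minimum version listed above. It first looks for a byte order mark or a recognisable encoding of `<?`, then falls back to reading the `encoding` pseudo-attribute of an ASCII-compatible XML declaration.

```rust
/// Byte signatures from the XML specification's autodetection appendix.
/// Longer signatures come first because the UTF-32LE byte order mark
/// starts with the same two bytes as the UTF-16LE one.
const SIGNATURES: &[(&[u8], &str)] = &[
    (&[0x00, 0x00, 0xFE, 0xFF], "utf-32be"), // byte order mark
    (&[0xFF, 0xFE, 0x00, 0x00], "utf-32le"), // byte order mark
    (&[0xEF, 0xBB, 0xBF], "utf-8"),          // byte order mark
    (&[0xFE, 0xFF], "utf-16be"),             // byte order mark
    (&[0xFF, 0xFE], "utf-16le"),             // byte order mark
    (&[0x00, 0x00, 0x00, 0x3C], "utf-32be"), // "<" with no byte order mark
    (&[0x3C, 0x00, 0x00, 0x00], "utf-32le"), // "<" with no byte order mark
    (&[0x00, 0x3C, 0x00, 0x3F], "utf-16be"), // "<?" with no byte order mark
    (&[0x3C, 0x00, 0x3F, 0x00], "utf-16le"), // "<?" with no byte order mark
];

/// Hypothetical helper: guess the encoding family from the leading bytes.
fn sniff_signature(bytes: &[u8]) -> Option<&'static str> {
    for &(signature, name) in SIGNATURES {
        if bytes.starts_with(signature) {
            return Some(name);
        }
    }
    None
}

/// Hypothetical helper: read `encoding="..."` from an ASCII-compatible
/// XML declaration, returning the value in lowercase.
fn declared_encoding(bytes: &[u8]) -> Option<String> {
    // The declaration must be at the very start, so a short prefix is enough.
    let head = String::from_utf8_lossy(&bytes[..bytes.len().min(1024)]);
    if !head.starts_with("<?xml") {
        return None;
    }
    // Only look inside the declaration, i.e. before the first "?>".
    let decl = &head[..head.find("?>")?];
    let rest = &decl[decl.find("encoding")? + "encoding".len()..];
    let rest = rest.trim_start().strip_prefix('=')?.trim_start();
    // The value may be wrapped in single or double quotes.
    let quote = rest.chars().next().filter(|c| *c == '"' || *c == '\'')?;
    let value = &rest[1..];
    let end = value.find(quote)?;
    Some(value[..end].to_ascii_lowercase())
}

/// Hypothetical helper: try the byte signatures first, then the declaration.
fn detect_sketch(bytes: &[u8]) -> Option<String> {
    sniff_signature(bytes)
        .map(|name| name.to_string())
        .or_else(|| declared_encoding(bytes))
}
```

Calling `detect_sketch` on the bytes of a document that begins with `<?xml version="1.0" encoding="ISO-8859-1"?>` would yield `Some("iso-8859-1")`, while input with neither a signature nor a declaration yields `None`. A real detector also has to handle HTML `meta` elements and out-of-band hints such as an HTTP `Content-Type` header, which this sketch leaves out.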