Attempts to detect the character encoding of raw text using the uchardet
library.
To add it to your project, add the following lines to your `Cargo.toml`
file:

```toml
[dependencies.uchardet]
git = "git://github.com/emk/rust-uchardet"
```
To use it:

```rust
// At the top of the file.
extern crate uchardet;
use uchardet::EncodingDetector;

// Inside a function.
assert_eq!(Some("UTF-8".to_string()),
           EncodingDetector::detect("français".as_bytes()).unwrap());
```
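The example above calls `unwrap()`, which panics if detection fails. Judging from that example, `detect` returns a `Result` wrapping an `Option<String>`, so real code may prefer to handle each case explicitly. The sketch below shows that pattern without depending on the crate. `detect_stub` and `DetectError` are hypothetical stand-ins: the stub only recognises non-ASCII UTF-8, treats empty input as an error, and does not reproduce uchardet's algorithm.

```rust
use std::fmt;

/// Hypothetical error type standing in for the crate's real error type.
#[derive(Debug)]
struct DetectError(String);

impl fmt::Display for DetectError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical stand-in for `EncodingDetector::detect`, mimicking the
/// `Result<Option<String>, _>` shape implied by the README example.
/// It is NOT uchardet: it only reports "UTF-8" for valid, non-ASCII UTF-8.
fn detect_stub(data: &[u8]) -> Result<Option<String>, DetectError> {
    if data.is_empty() {
        // Arbitrary stub behaviour, chosen so the error path can be shown.
        return Err(DetectError("no input".to_string()));
    }
    match std::str::from_utf8(data) {
        Ok(s) if !s.is_ascii() => Ok(Some("UTF-8".to_string())),
        _ => Ok(None),
    }
}

/// Handles both layers of the result instead of calling `unwrap()`.
fn describe(data: &[u8]) -> String {
    match detect_stub(data) {
        Ok(Some(name)) => format!("detected {}", name),
        Ok(None) => "no confident guess".to_string(),
        Err(e) => format!("detection failed: {}", e),
    }
}

fn main() {
    println!("{}", describe("français".as_bytes()));
    println!("{}", describe(b"\xff\xfe"));
    println!("{}", describe(b""));
}
```

With the real crate, you would replace `detect_stub` with `EncodingDetector::detect` and keep the same three-way `match`.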
API documentation is available.
Are you looking for a Rust wrapper for cld2 for detecting languages? I'm currently working on one and hope to publish it shortly.
If you wish, you may install `uchardet` using your system package manager.
For example, under Ubuntu, you can run:

```sh
sudo apt-get install libuchardet-dev
```
If you skip this step, Cargo will attempt to compile `uchardet` from the
bundled source code instead. This will probably only work on Linux
machines with CMake installed, but pull requests to improve this are
eagerly welcomed.
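The fallback described above amounts to a simple decision: link a system copy of `uchardet` if one is present, otherwise build the bundled sources with CMake. The sketch below is hypothetical and is not the actual `uchardet-sys/src/build.rs`; the names `UchardetSource`, `command_succeeds` and `choose_source` are illustrative, and it assumes the system package installs a pkg-config file named `uchardet`.

```rust
use std::process::Command;

/// Which copy of uchardet a build would use (hypothetical names).
#[derive(Debug, PartialEq)]
enum UchardetSource {
    System,
    Bundled,
}

/// Returns true if `program args...` runs and exits successfully.
/// A missing program is treated as failure rather than a panic.
fn command_succeeds(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Pure decision logic, kept separate so it can be tested without a toolchain.
fn choose_source(system_lib_found: bool, cmake_found: bool) -> Result<UchardetSource, String> {
    if system_lib_found {
        Ok(UchardetSource::System)
    } else if cmake_found {
        Ok(UchardetSource::Bundled)
    } else {
        Err("libuchardet not found and CMake is unavailable".to_string())
    }
}

fn main() {
    // Assumption: the system package ships a pkg-config file named "uchardet".
    let system = command_succeeds("pkg-config", &["--exists", "uchardet"]);
    let cmake = command_succeeds("cmake", &["--version"]);
    match choose_source(system, cmake) {
        Ok(src) => println!("would use {:?} uchardet", src),
        Err(e) => println!("build would fail: {}", e),
    }
}
```

Keeping the probing (`command_succeeds`) apart from the decision (`choose_source`) makes the fallback order easy to read and to check in isolation.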
New code in the rust-uchardet
library is released into the public domain,
as described in the UNLICENSE
file. However, several pre-existing pieces
have their own licenses:
- The `uchardet` C++ library in `uchardet-sys/uchardet` is distributed
  under the Mozilla Public License 1.1.
- `uchardet-sys/src/build.rs` contains several short snippets of code
  based on Alex Crichton's `git2-rs` library, which is described as being
  licensed "under the terms of both the MIT license and the Apache
  License (Version 2.0), with portions covered by various BSD-like
  licenses." However, this file is only run at build time and is not
  linked into the resulting executable.