Productive and safe Rust bindings/wrappers for Tesseract and Leptonica.
Make sure you have clang, Leptonica and Tesseract installed.
Tesseract should be version 4.0.0 or above.
On Debian-based systems such as Ubuntu:

```bash
sudo apt-get install libleptonica-dev libtesseract-dev clang
```
You will also need to install Tesseract language data for the languages you want to recognize, for example English:

```bash
sudo apt-get install tesseract-ocr-eng
```
On macOS, with Homebrew:

```bash
brew install tesseract leptonica
```
On Windows, this library uses Microsoft's vcpkg to provide Tesseract.
Install vcpkg and set up user-wide integration (`vcpkg integrate install`); otherwise the vcpkg crate won't be able to find the library.

To install Tesseract:
```cmd
REM from the vcpkg directory

REM 32 bit
.\vcpkg install tesseract:x86-windows

REM 64 bit
.\vcpkg install tesseract:x64-windows
```
To run the tests, configure the vcpkg crate to find the Tesseract library:

```cmd
SET VCPKGRS_DYNAMIC=true
cargo test
```
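A minimal example:

```rust
// `None` uses the default tessdata location; "eng" selects the English language data.
let mut lt = leptess::LepTess::new(None, "eng").unwrap();
lt.set_image("path/to/page.bmp");
println!("{}", lt.get_utf8_text().unwrap());
```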
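The language argument refers to Tesseract language data, i.e. `<lang>.traineddata` files in a tessdata directory, and Tesseract accepts several languages joined with `+` (e.g. `eng+deu`). The sketch below checks that the data for each requested language is present and builds such a string, which, assuming LepTess forwards it to Tesseract unchanged, could be passed as the second argument of `LepTess::new`. The helper `language_spec` is hypothetical, and the tessdata path in `main` is an assumption: it differs between distributions, package managers and Tesseract versions.

```rust
use std::path::Path;

/// Hypothetical helper (not part of leptess): returns the `+`-joined language
/// string Tesseract accepts (e.g. "eng+deu"), or the languages whose
/// `<lang>.traineddata` file is missing from `tessdata_dir`.
fn language_spec(tessdata_dir: &Path, langs: &[&str]) -> Result<String, Vec<String>> {
    let missing: Vec<String> = langs
        .iter()
        .filter(|lang| !tessdata_dir.join(format!("{lang}.traineddata")).is_file())
        .map(|lang| lang.to_string())
        .collect();
    if missing.is_empty() {
        Ok(langs.join("+"))
    } else {
        Err(missing)
    }
}

fn main() {
    // Assumed location (Ubuntu with Tesseract 4.x); adjust for your system.
    let dir = Path::new("/usr/share/tesseract-ocr/4.00/tessdata");
    match language_spec(dir, &["eng", "deu"]) {
        Ok(spec) => println!("language string: {spec}"),
        Err(missing) => eprintln!("missing traineddata for: {}", missing.join(", ")),
    }
}
```

If a language is reported missing, install its data package (for example `tesseract-ocr-eng` for English on Debian/Ubuntu) or point the first argument of `LepTess::new` at a directory that contains it.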
For more examples, see the docs and the `examples` directory.

To run the demos in the `examples` directory, try:

```bash
cargo run --example low_level_ocr_full_page
```
To run the tests, you will need Tesseract 4.x to match the language data in
`tests/tessdata/eng.traineddata`. See the CircleCI config for how to replicate
the setup.
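Since the library needs Tesseract 4.0.0 or above and the tests expect 4.x, it can help to check the installed major version first. This is a minimal std-only sketch, assuming the `tesseract` command-line tool is installed and on `PATH` (on Debian/Ubuntu it comes from the separate `tesseract-ocr` package) and that its version matches the library leptess links against. The functions `parse_major_version` and `installed_major_version` are hypothetical helpers, not part of leptess.

```rust
use std::process::Command;

/// Extracts the major version from `tesseract --version` output,
/// whose first line looks like "tesseract 4.1.1".
fn parse_major_version(output: &str) -> Option<u32> {
    let first = output.lines().next()?.trim();
    let version = first.strip_prefix("tesseract")?.trim();
    // Tolerate an optional leading 'v' (e.g. "tesseract v5.0.0").
    let version = version.trim_start_matches('v');
    version.split('.').next()?.parse().ok()
}

/// Runs `tesseract --version` and returns the major version, if it can be parsed.
fn installed_major_version() -> Option<u32> {
    let out = Command::new("tesseract").arg("--version").output().ok()?;
    // Some releases print the version to stderr rather than stdout, so try both.
    let stdout = String::from_utf8_lossy(&out.stdout);
    let stderr = String::from_utf8_lossy(&out.stderr);
    parse_major_version(&stdout).or_else(|| parse_major_version(&stderr))
}

fn main() {
    match installed_major_version() {
        Some(4) => println!("Tesseract 4.x found: OK for the library and the test data"),
        Some(v) if v > 4 => println!("Tesseract {v}.x found: OK for the library, but the tests expect 4.x"),
        Some(v) => eprintln!("Tesseract {v}.x is older than the required 4.0.0"),
        None => eprintln!("could not run or parse `tesseract --version`"),
    }
}
```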