A simple library for fast inspection of binary buffers to guess/determine the type of content.
This is mainly intended to quickly determine whether a given buffer contains "binary" or "text" data. The analysis is based on a very simple heuristic: detection of special byte order marks (BOMs) and a search for NULL bytes. Note that this analysis can fail. For example, even if unlikely, UTF-8-encoded text can legally contain NULL bytes. Also, for performance reasons, only the first 1024 bytes are checked for a NULL byte (if no BOM is detected).
```rust
use content_inspector::{ContentType, inspect};

assert_eq!(ContentType::UTF_8, inspect(b"Hello"));
assert_eq!(ContentType::BINARY, inspect(b"\xFF\xE0\x00\x10\x4A\x46\x49\x46\x00"));

assert!(inspect(b"Hello").is_text());
```
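The heuristic described above (check for a byte order mark first, then look for a NULL byte in the first 1024 bytes) can be illustrated with a minimal, standalone sketch. This is not the crate's actual implementation: the `Guess` enum, the `guess_content` function, and the `MAX_SCAN_SIZE` constant are hypothetical names chosen for illustration, and the order of the BOM checks is an assumption. Only the BOM byte sequences themselves are the standard Unicode ones.

```rust
/// Hypothetical result type for this sketch (not the crate's `ContentType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Guess {
    Utf8Bom,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
    /// No BOM and no NULL byte in the scanned prefix.
    Text,
    /// No BOM, but a NULL byte was found in the scanned prefix.
    Binary,
}

/// Number of leading bytes scanned for a NULL byte (value taken from the text above).
const MAX_SCAN_SIZE: usize = 1024;

/// Sketch of the heuristic: BOM detection first, NULL-byte search second.
fn guess_content(buffer: &[u8]) -> Guess {
    // The 4-byte UTF-32LE mark starts with the same two bytes as the
    // 2-byte UTF-16LE mark, so the longer marks are tested first.
    if buffer.starts_with(&[0xEF, 0xBB, 0xBF]) {
        return Guess::Utf8Bom;
    }
    if buffer.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        return Guess::Utf32Be;
    }
    if buffer.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        return Guess::Utf32Le;
    }
    if buffer.starts_with(&[0xFE, 0xFF]) {
        return Guess::Utf16Be;
    }
    if buffer.starts_with(&[0xFF, 0xFE]) {
        return Guess::Utf16Le;
    }

    // No BOM: only the first MAX_SCAN_SIZE bytes are searched for a NULL byte.
    let prefix = &buffer[..buffer.len().min(MAX_SCAN_SIZE)];
    if prefix.contains(&0) {
        Guess::Binary
    } else {
        Guess::Text
    }
}
```

Note the two failure modes this makes visible: a file whose first NULL byte appears after the scanned prefix is reported as text, and BOM-less UTF-16 or UTF-32 text (which typically contains NULL bytes) is reported as binary.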
This crate also comes with a small example command-line program (see `examples/inspect.rs`) that demonstrates the usage:
```bash
> inspect
USAGE: inspect FILE [FILE...]

> inspect testdata/*
testdata/create_text_files.py: UTF-8
testdata/file_sources.md: UTF-8
testdata/test.jpg: binary
testdata/test.pdf: binary
testdata/test.png: binary
testdata/text_UTF-16BE-BOM.txt: UTF-16BE
testdata/text_UTF-16LE-BOM.txt: UTF-16LE
testdata/text_UTF-32BE-BOM.txt: UTF-32BE
testdata/text_UTF-32LE-BOM.txt: UTF-32LE
testdata/text_UTF-8-BOM.txt: UTF-8-BOM
testdata/text_UTF-8.txt: UTF-8
```
If you only want to detect whether something is a binary or text file, this is about a factor of 250 faster than `file --mime ...`.