content_inspector


A simple library for fast inspection of binary buffers to guess/determine the type of content.

This is mainly intended to quickly determine whether a given buffer contains "binary" or "text" data. The analysis is based on a very simple heuristic: detecting byte order marks (BOMs) and searching for NULL bytes. Note that this analysis can fail. For example, even if unlikely, UTF-8-encoded text can legally contain NULL bytes. Also, for performance reasons, only the first 1024 bytes are checked for NULL bytes (and only if no BOM is detected).
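To make the heuristic concrete, here is a minimal standard-library sketch of the idea described above. It is not the crate's actual source: the names `Guess`, `guess_content`, and `MAX_SCAN` are hypothetical, and the order of the BOM checks is an assumption chosen so that the UTF-32LE BOM (`FF FE 00 00`) is not mistaken for the UTF-16LE BOM (`FF FE`).

```rust
// Illustrative sketch only; NOT the crate's implementation.
// `Guess`, `guess_content`, and `MAX_SCAN` are hypothetical names.

#[derive(Debug, PartialEq, Clone, Copy)]
enum Guess {
    Utf8,
    Utf8Bom,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
    Binary,
}

/// Number of leading bytes scanned for NULL bytes when no BOM is found.
const MAX_SCAN: usize = 1024;

fn guess_content(buf: &[u8]) -> Guess {
    // Step 1: look for a byte order mark at the start of the buffer.
    // The 4-byte UTF-32 BOMs are tested before the 2-byte UTF-16 BOMs,
    // because FF FE 00 00 also starts with FF FE.
    if buf.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Guess::Utf8Bom
    } else if buf.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        Guess::Utf32Be
    } else if buf.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        Guess::Utf32Le
    } else if buf.starts_with(&[0xFE, 0xFF]) {
        Guess::Utf16Be
    } else if buf.starts_with(&[0xFF, 0xFE]) {
        Guess::Utf16Le
    } else {
        // Step 2: no BOM, so scan at most the first MAX_SCAN bytes for NULL.
        let scan_len = buf.len().min(MAX_SCAN);
        if buf[..scan_len].contains(&0) {
            Guess::Binary
        } else {
            Guess::Utf8
        }
    }
}
```

With this sketch, `guess_content(b"Hello")` yields `Guess::Utf8`, while a buffer with a NULL byte in its first 1024 bytes yields `Guess::Binary`. A NULL byte that first appears after byte 1024 goes unnoticed, which is one of the ways the heuristic can fail.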

Usage

```rust
use content_inspector::{ContentType, inspect};

assert_eq!(ContentType::UTF_8, inspect(b"Hello"));
assert_eq!(ContentType::BINARY, inspect(b"\xFF\xE0\x00\x10\x4A\x46\x49\x46\x00"));

assert!(inspect(b"Hello").is_text());
```

CLI example

This crate also comes with a small example command-line program (see `examples/inspect.rs`) that demonstrates the usage:

```bash
$ inspect
USAGE: inspect FILE [FILE...]

$ inspect testdata/*
testdata/create_text_files.py: UTF-8
testdata/file_sources.md: UTF-8
testdata/test.jpg: binary
testdata/test.pdf: binary
testdata/test.png: binary
testdata/text_UTF-16BE-BOM.txt: UTF-16BE
testdata/text_UTF-16LE-BOM.txt: UTF-16LE
testdata/text_UTF-32BE-BOM.txt: UTF-32BE
testdata/text_UTF-32LE-BOM.txt: UTF-32LE
testdata/text_UTF-8-BOM.txt: UTF-8-BOM
testdata/text_UTF-8.txt: UTF-8
```

If you only want to detect whether something is a binary or text file, this is about a factor of 250 faster than `file --mime ...`.