hyperpolyglot

A fast programming language detector.

Hyperpolyglot is a fast programming language detector written in Rust, based on GitHub's Linguist Ruby library. It can detect the programming language of a single file or the programming language makeup of a directory. For more details on how the language detection is done, see the Linguist README.

CLI

Installing

`cargo install hyperpolyglot`

Usage

`hyply [PATH]`

Library

Adding as a dependency (TOML)

    [dependencies]
    hyperpolyglot = "0.1.0"

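Detect

```Rust
use std::path::Path;
use hyperpolyglot::Detection;

// `detect` returns a Result wrapping an Option, so unwrap the Result before comparing.
let detection = hyperpolyglot::detect(Path::new("src/bin/main.rs")).unwrap();
assert_eq!(Some(Detection::Heuristics("Rust")), detection);
```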

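Breakdown

```Rust
use std::{collections::HashMap, path::PathBuf};
use hyperpolyglot::{get_language_breakdown, Detection};

// Each language name maps to the files detected as that language.
let breakdown: HashMap<&'static str, Vec<(Detection, PathBuf)>> = get_language_breakdown("src/");
println!("{:?}", breakdown.get("Rust"));
```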
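The breakdown map lends itself to a quick per-language summary. The following is a minimal sketch, not part of the library: `summarize` is a hypothetical helper, and the local `Detection` enum is a stand-in that models only the `Heuristics` variant shown in this README, so the block compiles with the standard library alone.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Hypothetical stand-in for the library's `Detection` type; only the
// variant that appears in this README is modeled.
#[derive(Debug, Clone, PartialEq)]
enum Detection {
    Heuristics(&'static str),
}

/// Returns one row per language: (name, file count, share of all files in percent),
/// sorted by descending file count and then by name.
fn summarize(
    breakdown: &HashMap<&'static str, Vec<(Detection, PathBuf)>>,
) -> Vec<(&'static str, usize, f64)> {
    let total: usize = breakdown.values().map(|files| files.len()).sum();
    let mut rows: Vec<(&'static str, usize, f64)> = breakdown
        .iter()
        .map(|(language, files)| {
            // Guard against an empty breakdown to avoid dividing by zero.
            let share = if total == 0 {
                0.0
            } else {
                100.0 * files.len() as f64 / total as f64
            };
            (*language, files.len(), share)
        })
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    rows
}
```

With the real crate, the map returned by the breakdown call could be passed to a helper like this directly, assuming the types line up as in the snippet above.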

Divergences from Linguist

Benchmarks

samples dir

|Tool                           |mean (ms)|median (ms)|min (ms)|max (ms)|
|-------------------------------|---------|-----------|--------|--------|
|hyperpolyglot (multi-threaded) |1,188    |1,186      |1,166   |1,226   |
|hyperpolyglot (single-threaded)|2,424    |2,424      |2,414   |2,442   |
|enry                           |21,619   |21,566     |21,514  |21,855  |
|Linguist                       |42,407   |42,386     |42,070  |42,856  |

Rust Repo

|Tool                           |mean (ms)|median (ms)|min (ms)|max (ms)|
|-------------------------------|---------|-----------|--------|--------|
|hyperpolyglot (multi-threaded) |3,808    |3,751      |3,708   |4,253   |
|hyperpolyglot (single-threaded)|8,341    |8,334      |8,276   |8,437   |
|enry                           |82,300   |82,215     |82,021  |82,817  |
|Linguist                       |196,780  |197,300    |194,033 |202,930 |

Linux Kernel

* Hyperpolyglot is so much faster here because of the heuristic it adds for .h files, which significantly speeds up detection for .h files that can't be classified with the Objective-C or C++ heuristics.

|Tool                           |mean (s)|median (s)|min (s) |max (s) |
|-------------------------------|--------|----------|--------|--------|
|hyperpolyglot (multi-threaded) |3.7574  |3.7357    |3.7227  |3.9021  |
|hyperpolyglot (single-threaded)|7.5833  |7.5683    |7.5445  |7.6489  |
|enry                           |137.6046|137.4229  |137.1955|138.8694|
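The .h note above can be illustrated with a rough sketch. It is not hyperpolyglot's actual rule set: the marker strings are deliberately simplistic, `classify_header` is a hypothetical name, and the fallback to C is an assumption about what the added heuristic decides. The point is only that a header matching neither the Objective-C nor the C++ checks gets an immediate answer instead of going on to a more expensive step.

```rust
/// Classifies the contents of a `.h` file with toy substring checks.
/// The markers are illustrative and far simpler than real heuristic rules.
fn classify_header(content: &str) -> &'static str {
    const OBJC_MARKERS: [&str; 4] = ["@interface", "@protocol", "@end", "#import"];
    const CPP_MARKERS: [&str; 4] = ["class ", "namespace ", "template <", "std::"];

    if OBJC_MARKERS.iter().any(|m| content.contains(m)) {
        "Objective-C"
    } else if CPP_MARKERS.iter().any(|m| content.contains(m)) {
        "C++"
    } else {
        // Assumed fallback: settle on C right away rather than deferring
        // the decision to a slower detection step.
        "C"
    }
}
```

In a codebase like the Linux kernel, where most headers are plain C, a cheap fallback of this kind would apply to the large majority of .h files.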

Accuracy

All of the programming language detectors are far from perfect, and hyperpolyglot is no exception. Its language detections mirror those of Linguist and enry for most files, with the biggest divergences coming from files that need to fall back on the classifier. Files that can be detected through a common known filename, an extension, or by following the set of heuristics should approach 100% accuracy.
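The order of strategies mentioned above (known filename, extension, heuristics, classifier fallback) can be sketched as a simple cascade. This is an illustration under stated assumptions, not the library's implementation: `Detected`, `detect_sketch`, and `toy_classifier` are hypothetical names, the lookup tables hold only a few made-up entries, and the "classifier" is a keyword check standing in for a real statistical one.

```rust
use std::path::Path;

// Hypothetical result type recording which strategy produced the answer.
#[derive(Debug, PartialEq)]
enum Detected {
    Filename(&'static str),
    Extension(&'static str),
    Heuristics(&'static str),
    Classifier(&'static str),
}

fn detect_sketch(path: &Path, content: &str) -> Option<Detected> {
    let name = path.file_name()?.to_str()?;

    // 1. Well-known filenames are decided immediately.
    match name {
        "Makefile" => return Some(Detected::Filename("Makefile")),
        "Dockerfile" => return Some(Detected::Filename("Dockerfile")),
        _ => {}
    }

    match path.extension().and_then(|e| e.to_str()) {
        // 2. Unambiguous extensions map straight to a language.
        Some("rs") => return Some(Detected::Extension("Rust")),
        Some("py") => return Some(Detected::Extension("Python")),
        // 3. Ambiguous extensions are resolved by inspecting the content.
        Some("pl") => {
            if content.contains(":-") {
                return Some(Detected::Heuristics("Prolog"));
            }
            if content.contains("use strict") {
                return Some(Detected::Heuristics("Perl"));
            }
        }
        _ => {}
    }

    // 4. Everything else falls back to the (least reliable) classifier.
    toy_classifier(content).map(Detected::Classifier)
}

// Keyword-based stand-in for a real classifier.
fn toy_classifier(content: &str) -> Option<&'static str> {
    if content.contains("fn ") && content.contains("let ") {
        Some("Rust")
    } else if content.contains("def ") {
        Some("Python")
    } else {
        None
    }
}
```

The first three steps are deterministic lookups or rule checks, which is why they can approach 100% accuracy. Only the last step guesses from content alone, and that is where detectors are most likely to disagree.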

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.