This is a language detection library, aiming for both precision and performance.
It uses a multiclass logistic regression model over:
- 2, 3, 4-grams of letters on ASCII
- codepoint / 128
- a slightly smarter projection of codepoints over a given class.
We use the hashing trick and project these features over a space of size 4_096.
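To make the hashing trick concrete, here is a minimal sketch that counts 2-, 3- and 4-grams of ASCII letters in a vector of 4_096 buckets. It is an illustration under assumptions, not the crate's actual code: the names DIMENSION, bucket and hashed_ngram_counts are hypothetical, FNV-1a is an arbitrary choice of hash, and splitting on non-letter characters is a guess at the tokenization. The codepoint-based features are left out.

```rust
/// Size of the hashed feature space, as stated in the text above.
const DIMENSION: usize = 4_096;

/// Hypothetical hash function: 32-bit FNV-1a, picked only for illustration.
/// Maps an n-gram to a bucket index in 0..DIMENSION.
fn bucket(ngram: &[u8]) -> usize {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for &byte in ngram {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    (hash as usize) % DIMENSION
}

/// Counts hashed 2-, 3- and 4-grams of lowercased ASCII letters.
/// Assumption: any non-letter character ends the current word.
fn hashed_ngram_counts(text: &str) -> Vec<u32> {
    let mut counts = vec![0u32; DIMENSION];
    for word in text.split(|c: char| !c.is_ascii_alphabetic()) {
        let letters = word.to_ascii_lowercase().into_bytes();
        for n in 2..=4 {
            // `windows` yields nothing when the word is shorter than n.
            for ngram in letters.windows(n) {
                counts[bucket(ngram)] += 1;
            }
        }
    }
    counts
}
```

Distinct n-grams may collide in the same bucket. That is the trade-off the hashing trick accepts in exchange for a fixed-size feature vector that needs no vocabulary.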
The logistic regression is trained in the attached Python notebook, and the trained model is used to generate weight.rs.
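As a rough picture of how trained weights could be applied at inference time, the sketch below scores each class with a linear function of the feature vector and returns the best one. Everything here is assumed for illustration: the function name predict, the row-per-class weight layout and the separate bias slice are hypothetical and do not describe the real contents of weight.rs.

```rust
/// Returns the index of the class with the highest linear score,
/// or None when there are no classes.
/// Assumed layout: weights[class][feature], one bias per class.
fn predict(weights: &[Vec<f32>], biases: &[f32], features: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (class, (row, bias)) in weights.iter().zip(biases).enumerate() {
        // Linear score: bias plus the dot product of weights and features.
        let score = bias + row.iter().zip(features).map(|(w, x)| w * x).sum::<f32>();
        if best.map_or(true, |(_, top)| score > top) {
            best = Some((class, score));
        }
    }
    best.map(|(class, _)| class)
}
```

Softmax is monotonic in the scores, so taking the argmax of the raw scores picks the same class as taking the argmax of the probabilities. The exponentials are only needed when a calibrated confidence is wanted.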