# rustrict

`rustrict` is a sophisticated profanity filter for Rust.
## Features

- Accepts `&str` or `Iterator<Item = char>` input
- Supports custom words and characters via the `customize` feature
- No `regex` (uses a custom radix trie)
- Fast in `release` mode, noticeably slower in `debug` mode

## Usage

### Strings (`&str`)

```rust
use rustrict::CensorStr;

let censored: String = "hello crap".censor(); let inappropriate: bool = "f u c k".is_inappropriate();
assert_eq!(censored, "hello c*"); assert!(inappropriate); ```
### Iterators (`Iterator<Item = char>`)

```rust
use rustrict::CensorIter;

let censored: String = "hello crap".chars().censor().collect();

assert_eq!(censored, "hello c***");
```
By constructing a `Censor`, one can avoid scanning the text multiple times to get a censored `String` and/or answer multiple `is` queries. This also opens up more customization options (defaults are shown below).
```rust
use rustrict::{Censor, Type};

let (censored, analysis) = Censor::from_str("123 Crap")
    .with_censor_threshold(Type::INAPPROPRIATE)
    .with_censor_first_character_threshold(Type::OFFENSIVE & Type::SEVERE)
    .with_ignore_false_positives(false)
    .with_ignore_self_censoring(false)
    .with_censor_replacement('*')
    .censor_and_analyze();

assert_eq!(censored, "123 C***");
assert!(analysis.is(Type::INAPPROPRIATE));
assert!(analysis.isnt(Type::PROFANE & Type::SEVERE | Type::SEXUAL));
```
If you cannot afford to let anything slip through, or have reason to believe a particular user is trying to evade the filter, you can check whether their input matches a short list of safe strings:
```rust
use rustrict::{CensorStr, Type};

// Figure out if a user is trying to evade the filter.
assert!("pron".is(Type::EVASIVE));
assert!("porn".isnt(Type::EVASIVE));

// Only let safe messages through.
assert!("Hello there!".is(Type::SAFE));
assert!("nice work.".is(Type::SAFE));
assert!("yes".is(Type::SAFE));
assert!("NVM".is(Type::SAFE));
assert!("gtg".is(Type::SAFE));
assert!("not a common phrase".isnt(Type::SAFE));
```
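As an illustration of how these checks might be combined, here is a minimal sketch (the `allow_message` helper and its policy are hypothetical, not part of rustrict): a user flagged for evasion is restricted to messages classified as `Type::SAFE`, while everyone else only needs to avoid inappropriate content.

```rust
use rustrict::{CensorStr, Type};

// Hypothetical moderation gate: flagged users may only send messages that
// match rustrict's short list of safe strings.
fn allow_message(text: &str, user_is_flagged: bool) -> bool {
    if user_is_flagged {
        text.is(Type::SAFE)
    } else {
        !text.is_inappropriate()
    }
}

fn main() {
    assert!(allow_message("Hello there!", true));
    assert!(!allow_message("not a common phrase", true));
    assert!(allow_message("not a common phrase", false));
}
```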
If you want to add custom profanities, safe words, or characters to strip out, enable the "customize" feature.
```rust
#[cfg(feature = "customize")]
{
    use rustrict::{add_word, ban_character, CensorStr, Type};

    // You must take care not to call these when the crate is being
    // used in any other way (to avoid concurrent mutation).
    unsafe {
        add_word("reallyreallybadword", (Type::PROFANE & Type::SEVERE) | Type::MEAN);
        add_word("mybrandname", Type::SAFE);
        ban_character('ꙮ');
    }

    assert!("Reallllllyreallllllybaaaadword".is(Type::PROFANE));
    assert!("MyBrandName".is(Type::SAFE));
    assert_eq!("helloꙮꙮꙮ".censor(), "hello");
}
```
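Because these functions mutate global state, one way to satisfy the safety requirement is to do all customization once at startup, before any other thread touches the filter. The sketch below assumes the `customize` feature is enabled; `init_filter` and the word it adds are made up for illustration.

```rust
#[cfg(feature = "customize")]
fn init_filter() {
    use rustrict::{add_word, Type};

    // Sound only because nothing else is using rustrict yet.
    unsafe {
        add_word("examplebadword", Type::PROFANE);
    }
}

fn main() {
    #[cfg(feature = "customize")]
    init_filter();

    // ...spawn worker threads that censor user input after this point...
}
```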
## Comparison

To compare filters, the first 100,000 items of this list are used as a dataset. Positive accuracy is the percentage of profanity detected as profanity. Negative accuracy is the percentage of clean text detected as clean.
| Crate | Accuracy | Positive Accuracy | Negative Accuracy | Time |
|-------|----------|-------------------|-------------------|------|
| rustrict | 90.86% | 91.54% | 90.69% | 8s |
| censor | 76.16% | 72.76% | 77.01% | 23s |
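The positive and negative accuracy columns correspond, roughly, to the computation sketched below (this is not the actual benchmark harness; `accuracy` and the labeled `samples` slice are hypothetical):

```rust
use rustrict::CensorStr;

// Hypothetical accuracy computation: `samples` pairs each text with a
// human-provided label saying whether it is profane.
fn accuracy(samples: &[(&str, bool)]) -> (f64, f64) {
    let (mut true_pos, mut pos, mut true_neg, mut neg) = (0u32, 0u32, 0u32, 0u32);
    for &(text, labeled_profane) in samples {
        let detected = text.is_inappropriate();
        if labeled_profane {
            pos += 1;
            if detected {
                true_pos += 1; // profanity detected as profanity
            }
        } else {
            neg += 1;
            if !detected {
                true_neg += 1; // clean text detected as clean
            }
        }
    }
    // (positive accuracy, negative accuracy) as fractions of each class
    (
        f64::from(true_pos) / f64::from(pos),
        f64::from(true_neg) / f64::from(neg),
    )
}
```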
## Development

If you make an adjustment that would affect false positives, you will need to run `false_positive_finder`:

1. Run `./download.sh` to get the required word lists.
2. Run `cargo run --bin false_positive_finder --release --all-features`.
## License

Licensed under either of

* Apache License, Version 2.0
* MIT License

at your option.

### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.