# rustrict

rustrict is a sophisticated profanity filter for Rust.

## Features

## Limitations

## Usage

### Strings (`&str`)

```rust
use rustrict::CensorStr;

let censored: String = "hello crap".censor();
let inappropriate: bool = "f u c k".is_inappropriate();

assert_eq!(censored, "hello c***");
assert!(inappropriate);
```

### Iterators (`Iterator<Item = char>`)

```rust
use rustrict::CensorIter;

let censored: String = "hello crap".chars().censor().collect();

assert_eq!(censored, "hello c***");
```

### Advanced

By constructing a `Censor`, one can avoid scanning the text multiple times to get a censored `String` and/or answer multiple `is` queries. This also opens up more customization options (defaults are shown below).

```rust
use rustrict::{Censor, Type};

let (censored, analysis) = Censor::from_str("123 Crap")
    .with_censor_threshold(Type::INAPPROPRIATE)
    .with_censor_first_character_threshold(Type::OFFENSIVE & Type::SEVERE)
    .with_ignore_false_positives(false)
    .with_ignore_self_censoring(false)
    .with_censor_replacement('*')
    .censor_and_analyze();

assert_eq!(censored, "123 C***");
assert!(analysis.is(Type::INAPPROPRIATE));
assert!(analysis.isnt(Type::PROFANE & Type::SEVERE | Type::SEXUAL));
```

If you cannot afford to let anything slip through, or have reason to believe a particular user is trying to evade the filter, you can check if their input matches a short list of safe strings:

```rust
use rustrict::{CensorStr, Type};

// Figure out if a user is trying to evade the filter.
assert!("pron".is(Type::EVASIVE));
assert!("porn".isnt(Type::EVASIVE));

// Only let safe messages through.
assert!("Hello there!".is(Type::SAFE));
assert!("nice work.".is(Type::SAFE));
assert!("yes".is(Type::SAFE));
assert!("NVM".is(Type::SAFE));
assert!("gtg".is(Type::SAFE));
assert!("not a common phrase".isnt(Type::SAFE));
```
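For example, here is a minimal sketch of how these checks might be combined with ordinary censoring in a message pipeline. The `moderate` helper and its strict-mode policy are purely illustrative, not part of the crate:

```rust
use rustrict::{CensorStr, Type};

// Hypothetical helper: in strict mode (e.g. for a user suspected of trying to
// evade the filter), only allow messages that match the short safe list;
// otherwise, censor profanity and let the rest of the message through.
fn moderate(message: &str, strict: bool) -> Option<String> {
    if strict {
        if message.is(Type::SAFE) {
            Some(message.to_string())
        } else {
            // Not on the safe list: drop the message entirely.
            None
        }
    } else {
        Some(message.censor())
    }
}

fn main() {
    assert_eq!(moderate("Hello there!", true).as_deref(), Some("Hello there!"));
    assert_eq!(moderate("not a common phrase", true), None);
    // In normal mode the message passes through with profanity censored.
    assert!(moderate("hello crap", false).is_some());
}
```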

If you want to add custom profanities, safe words, or characters to strip out, enable the "customize" feature.

```rust
#[cfg(feature = "customize")]
{
    use rustrict::{add_word, ban_character, CensorStr, Type};

    // You must take care not to call these when the crate is being
    // used in any other way (to avoid concurrent mutation).
    unsafe {
        add_word("reallyreallybadword", (Type::PROFANE & Type::SEVERE) | Type::MEAN);
        add_word("mybrandname", Type::SAFE);
        ban_character('ꙮ');
    }

    assert!("Reallllllyreallllllybaaaadword".is(Type::PROFANE));
    assert!("MyBrandName".is(Type::SAFE));
    assert_eq!("helloꙮꙮꙮ".censor(), "hello");
}
```

## Comparison

To compare filters, the first 100,000 items of this list are used as a dataset. Positive accuracy is the percentage of profanity correctly detected as profanity. Negative accuracy is the percentage of clean text correctly detected as clean.

| Crate | Accuracy | Positive Accuracy | Negative Accuracy | Time |
|-------|----------|-------------------|-------------------|------|
| rustrict | 90.86% | 91.54% | 90.69% | 8s |
| censor | 76.16% | 72.76% | 77.01% | 23s |
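As a rough sketch of how these two metrics are defined, the following evaluation loop is illustrative only; the actual benchmark harness, dataset loading, and timing are not shown:

```rust
use rustrict::CensorStr;

// Illustrative only: evaluate the filter over a labeled dataset of
// (text, is_profane) pairs and report positive/negative accuracy.
fn evaluate(dataset: &[(&str, bool)]) -> (f32, f32) {
    let (mut true_pos, mut pos) = (0u32, 0u32);
    let (mut true_neg, mut neg) = (0u32, 0u32);

    for &(text, is_profane) in dataset {
        let detected = text.is_inappropriate();
        if is_profane {
            pos += 1;
            if detected {
                true_pos += 1;
            }
        } else {
            neg += 1;
            if !detected {
                true_neg += 1;
            }
        }
    }

    // Positive accuracy: share of profanity detected as profanity.
    // Negative accuracy: share of clean text detected as clean.
    (
        100.0 * true_pos as f32 / pos as f32,
        100.0 * true_neg as f32 / neg as f32,
    )
}

fn main() {
    let toy = [("f u c k", true), ("hello there", false)];
    let (pos_acc, neg_acc) = evaluate(&toy);
    println!("positive: {pos_acc}%, negative: {neg_acc}%");
}
```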

## Development

If you make an adjustment that would affect false positives, you will need to run `false_positive_finder`:

1. Run `./download.sh` to get the required word lists.
2. Run `cargo run --bin false_positive_finder --release --all-features`

## License

Licensed under either of

- Apache License, Version 2.0
- MIT license

at your option.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.