urlnorm

URL normalization library for Rust, mainly designed to normalize URLs for https://progscrape.com.

The normalization algorithm uses the following heuristics:

Usage

For long-term storage and clustering of URLs, use [UrlNormalizer::compute_normalization_string] to compute a representation of the URL that can be compared with standard string comparison operators.

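```
use url::Url;
use urlnorm::UrlNormalizer;

// Build a normalizer with the default options.
let norm = UrlNormalizer::default();

// The scheme and the leading "www." do not appear in the normalization string.
assert_eq!(norm.compute_normalization_string(&Url::parse("http://www.google.com").unwrap()), "google.com:");
```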
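Because the normalization string is a plain string, clustering comes down to grouping URLs under equal keys. The sketch below illustrates that idea using only the standard library. `toy_key` and `cluster_by_key` are hypothetical names introduced for this example, and `toy_key` is a deliberately simplified stand-in, not the algorithm this crate implements. In real code the key function would wrap `compute_normalization_string`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a normalization function.
/// NOT the crate's algorithm: it only lowercases the input and drops the
/// scheme, a leading "www." and any trailing slashes.
fn toy_key(url: &str) -> String {
    let lower = url.trim().to_ascii_lowercase();
    // Drop the scheme ("http://", "https://", ...) if one is present.
    let rest = match lower.split_once("://") {
        Some((_scheme, rest)) => rest,
        None => lower.as_str(),
    };
    let rest = rest.strip_prefix("www.").unwrap_or(rest);
    rest.trim_end_matches('/').to_string()
}

/// Groups URLs whose keys are equal under ordinary string comparison.
/// The key function is a parameter so a real normalizer can be plugged in.
fn cluster_by_key<F>(urls: &[&str], key: F) -> BTreeMap<String, Vec<String>>
where
    F: Fn(&str) -> String,
{
    let mut clusters: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &url in urls {
        clusters.entry(key(url)).or_default().push(url.to_string());
    }
    clusters
}
```

Calling `cluster_by_key(&urls, toy_key)` on a slice of URL strings returns a map from each key to the original URLs that share it. A `BTreeMap` keeps the keys sorted, which can be convenient when the keys are also written to long-term storage.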

For more advanced use cases, the [Options] type lets callers supply custom regular expressions for normalization.