A URL normalization library for Rust, mainly designed to normalize URLs for https://progscrape.com.
The normalization algorithm uses the following heuristics:
- `http://example.com` and `https://example.com` are considered equivalent (the scheme is ignored).
- The common host prefixes `www.` and `m.` are stripped, so `www.example.com` and `example.com` are considered equivalent.
- `http://example.com//foo/` and `http://example.com/foo` are considered equivalent (empty path segments and trailing slashes are stripped).
- Query parameters that are unlikely to identify distinct content are stripped (`utm_XYZ` and the like).
- Fragments are stripped, except those that look like client-side routes (`#/` and `#!`).
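To make the heuristics above concrete, here is a toy sketch in plain Rust. This is an illustration only, not this crate's actual implementation, and `toy_normalize` is a hypothetical function invented for this example:

```rust
// Toy illustration of the normalization heuristics above.
// Not this crate's implementation; real normalization should use a
// proper URL parser and configurable rules.
fn toy_normalize(url: &str) -> String {
    // Drop the scheme so http:// and https:// compare equal.
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    // Keep client-side route fragments ("#/" or "#!"), drop the rest.
    let (rest, route) = match rest.split_once('#') {
        Some((head, frag)) if frag.starts_with('/') || frag.starts_with('!') => {
            (head, Some(frag))
        }
        Some((head, _)) => (head, None),
        None => (rest, None),
    };
    // Separate the query string from the host and path.
    let (path_part, query) = match rest.split_once('?') {
        Some((p, q)) => (p, Some(q)),
        None => (rest, None),
    };
    let (host, path) = path_part.split_once('/').unwrap_or((path_part, ""));
    // Strip the common host prefixes "www." and "m.".
    let host = host
        .strip_prefix("www.")
        .or_else(|| host.strip_prefix("m."))
        .unwrap_or(host);
    // Collapse empty path segments and trailing slashes.
    let path: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    // Drop tracking query parameters such as utm_*.
    let query: Vec<&str> = query
        .map(|q| q.split('&').filter(|p| !p.starts_with("utm_")).collect())
        .unwrap_or_default();
    format!(
        "{}/{}?{}#{}",
        host,
        path.join("/"),
        query.join("&"),
        route.unwrap_or("")
    )
}

fn main() {
    // Scheme, www. prefix, empty segments, and utm_ parameters
    // all normalize away:
    assert_eq!(
        toy_normalize("http://example.com//foo/?utm_source=x"),
        toy_normalize("https://www.example.com/foo")
    );
}
```

Under these assumptions, URLs that differ only in scheme, `www.`/`m.` prefix, redundant slashes, tracking parameters, or plain fragments all map to the same string.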
For long-term storage and clustering of URLs, it is recommended that [UrlNormalizer::compute_normalization_string] is used to compute a representation of the URL that can be compared with standard string comparison operators.
```
use url::Url;
use urlnorm::UrlNormalizer;

let norm = UrlNormalizer::default();
assert_eq!(norm.compute_normalization_string(&Url::parse("http://www.google.com").unwrap()), "google.com:");
```
For more advanced use cases, the [Options] struct allows end-users to provide custom regular expressions for normalization.