Create dictionaries by scraping webpages.
Similar tools (some features inspired by them):

- CeWL
- CeWLeR
```bash
# build with nix and run the result
nix build .#
./result/bin/wdict --help

# run directly with nix
nix run .# -- --help
nix run github:pyqlsa/wdict -- --help

# install with cargo
cargo install wdict

# build from source in a nix dev shell
nix develop .#
cargo build
./target/debug/wdict --help

# release build
cargo build --release
./target/release/wdict --help
```
```bash
Create dictionaries by scraping webpages.

Usage: wdict [OPTIONS]

Options:
  -u, --url
          [default: https://www.quicksilver899.com/Tolkien/Tolkien_Dictionary.html]

  -d, --depth
          [default: 1]

  -m, --min-word-length
          [default: 3]

  -f, --file
          [default: wdict.txt]

      --filter <FILTER>
          Filter strategy for words

          [default: none]

          Possible values:
          - deunicode: Transform unicode according to https://github.com/kornelski/deunicode
          - none:      Leave the string as-is

      --site <SITE>
          Site policy for discovered links

          [default: same]

          Possible values:
          - same:      Allow crawling links, only if the domain exactly matches
          - subdomain: Allow crawling links if they are the same domain or subdomains
          - all:       Allow crawling all links, regardless of domain

  -h, --help     Print help (see a summary with '-h')
  -V, --version  Print version
```
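The sketch below shows one plausible invocation, combining only the options listed in the help output above; the start URL is the tool's default, and the output filename is an arbitrary example.

```bash
# crawl the default start page one level deep, keep words of at least 5
# characters, normalize unicode with the deunicode filter, stay on the
# exact same domain, and write the resulting wordlist to tolkien.txt
wdict \
  --url https://www.quicksilver899.com/Tolkien/Tolkien_Dictionary.html \
  --depth 1 \
  --min-word-length 5 \
  --filter deunicode \
  --site same \
  --file tolkien.txt
```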
Licensed under either of

- Apache License, Version 2.0
- MIT License

at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.