Spider CLI


A fast command line spider or crawler.

Dependencies

On Linux

Usage

The CLI is a standalone binary, so do not add it to your Cargo.toml file. Install it with cargo:

```sh
cargo install spider_cli
```

CLI

The crawler can also be run from the command line as follows. If you need logging, pass in the -v flag.

```sh
spider -v --domain https://choosealicense.com crawl
```

Crawl and output all links visited to a file.

```sh
spider --domain https://choosealicense.com crawl -o > spider_choosealicense.json
```
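Pages can also be excluded from a crawl with the `--blacklist-url` option documented in the help output. A sketch, assuming the example site and illustrative path fragments (`license`, `about`):

```sh
spider --domain https://choosealicense.com --blacklist-url license,about crawl
```

The option takes a comma separated list of pages to skip, or regular expressions when the regex feature is enabled.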

```sh
spider_cli 1.29.0
madeindjs <contact@rousseau-alexandre.fr>, j-mendez <jeff@a11ywatch.com>
Multithreaded web crawler written in Rust.

USAGE:
    spider [OPTIONS] --domain <DOMAIN> [SUBCOMMAND]

OPTIONS:
    -b, --blacklist-url <BLACKLIST_URL>
            Comma separated string list of pages to not crawl or regex with feature enabled

    -c, --concurrency <CONCURRENCY>
            How many requests can be run simultaneously

    -d, --domain <DOMAIN>
            Domain to crawl

    -D, --delay <DELAY>
            Polite crawling delay in milliseconds

    -h, --help
            Print help information

    -r, --respect-robots-txt
            Respect robots.txt file

    -u, --user-agent <USER_AGENT>
            User-Agent

    -v, --verbose
            Print page visited on standard output

    -V, --version
            Print version information

SUBCOMMANDS:
    crawl     Crawl the website extracting links
    help      Print this message or the help of the given subcommand(s)
    scrape    Scrape the website extracting html and links
```
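The options above can be combined with the `scrape` subcommand to extract HTML as well as links. A sketch of a polite scrape, assuming the example site and an illustrative 250 ms delay:

```sh
spider --domain https://choosealicense.com --delay 250 --respect-robots-txt scrape
```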

All features are available except the `Website` struct's `on_link_find_callback` configuration option.