Spider

Multithreaded Web spider crawler written in Rust.

Dependencies

~~~bash
$ apt install openssl libssl-dev
~~~

Usage

Add this dependency to your Cargo.toml file.

~~~toml
[dependencies]
spider = "1.0.2"
~~~

You can then use the library. Here is a simple example:

~~~rust
extern crate spider;

use spider::website::Website;

fn main() {
    // Create a crawler rooted at the given domain and run it.
    let mut website: Website = Website::new("https://choosealicense.com");
    website.crawl();

    // Print the URL of every page that was crawled.
    for page in website.get_pages() {
        println!("- {}", page.get_url());
    }
}
~~~

You can use the Configuration object to configure your crawler:

~~~rust
// ..
let mut website: Website = Website::new("https://choosealicense.com");
// Do not crawl this URL.
website.configuration.blacklist_url.push("https://choosealicense.com/licenses/".to_string());
// Honor the rules in the site's robots.txt.
website.configuration.respect_robots_txt = true;
// Log progress while crawling.
website.configuration.verbose = true;
website.crawl();
// ..
~~~
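To illustrate the idea behind a URL blacklist, here is a minimal standalone sketch using only the standard library. It is not the crate's implementation: the helper `is_blacklisted` is hypothetical, and prefix matching is an assumption made for the example. The crate may compare URLs differently.

~~~rust
/// Hypothetical helper (not part of the spider crate): returns true when
/// `url` starts with any of the blacklisted prefixes.
/// Prefix matching is an assumption made for this sketch.
fn is_blacklisted(url: &str, blacklist: &[String]) -> bool {
    blacklist
        .iter()
        .any(|prefix| url.starts_with(prefix.as_str()))
}

fn main() {
    let blacklist = vec!["https://choosealicense.com/licenses/".to_string()];
    let urls = [
        "https://choosealicense.com/about/",
        "https://choosealicense.com/licenses/mit/",
    ];

    // Report which URLs a crawler using this filter would skip.
    for url in urls.iter() {
        let status = if is_blacklisted(url, &blacklist) {
            "skip"
        } else {
            "crawl"
        };
        println!("{} {}", status, url);
    }
}
~~~

A filter like this could also be applied to the URLs returned by `get_pages()` if you want to narrow results after a crawl.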

TODO

Contribute

I am open to any contribution. Just fork the repository and commit on a separate branch.