Multithreaded web crawler (spider) written in Rust.
~~~bash
$ apt install openssl libssl-dev
~~~
Add this dependency to your Cargo.toml file.
~~~toml
[dependencies]
spider = "1.0.2"
~~~
Then you'll be able to use the library. Here is a simple example:
~~~rust
extern crate spider;

use spider::website::Website;

fn main() {
    // Crawl the site starting from its root URL.
    let mut website: Website = Website::new("https://choosealicense.com");
    website.crawl();

    // Print the URL of every page that was visited.
    for page in website.get_pages() {
        println!("- {}", page.get_url());
    }
}
~~~
You can use the `Configuration` object to configure your crawler:
~~~rust
// ..
let mut website: Website = Website::new("https://choosealicense.com");
// Skip URLs under this path.
website.configuration.blacklist_url.push("https://choosealicense.com/licenses/".to_string());
// Honor the site's robots.txt rules.
website.configuration.respect_robots_txt = true;
// Log progress while crawling.
website.configuration.verbose = true;
website.crawl();
// ..
~~~
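To give an intuition for what a URL blacklist does, here is a minimal standalone sketch. It is not the crate's implementation: `is_blacklisted` and `filter_allowed` are hypothetical helpers, and the sketch assumes a URL is skipped when it starts with a blacklisted entry. The matching rule spider actually applies may differ.

```rust
/// Hypothetical helper, not part of the spider crate: returns true when
/// `url` starts with any entry of `blacklist` (assumed prefix matching).
fn is_blacklisted(url: &str, blacklist: &[String]) -> bool {
    blacklist.iter().any(|prefix| url.starts_with(prefix.as_str()))
}

/// Hypothetical helper: keeps only the URLs that are not blacklisted,
/// preserving their original order.
fn filter_allowed<'a>(urls: &[&'a str], blacklist: &[String]) -> Vec<&'a str> {
    urls.iter()
        .copied()
        .filter(|url| !is_blacklisted(url, blacklist))
        .collect()
}
```

With `https://choosealicense.com/licenses/` on the blacklist, this sketch drops a URL such as `https://choosealicense.com/licenses/mit/` and keeps the site root.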
I am open to any contribution. Just fork the repository and commit your changes on a separate branch.