A lightweight and efficient web crawler in Rust, optimized for concurrent scraping while respecting robots.txt
rules.
- **robots.txt**: Automatically fetches and adheres to website scraping guidelines.

Add `crawly` to your `Cargo.toml`:
```toml
[dependencies]
crawly = "0.1.0"
```
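The usage examples below also rely on `anyhow` for error handling and a Tokio runtime to drive the async API; if you follow them verbatim, your `Cargo.toml` will likely need entries along these lines (the exact versions and feature flags here are illustrative, not prescribed by crawly):

```toml
[dependencies]
crawly = "0.1.0"
# Assumed companions for the examples in this README:
anyhow = "1"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```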
A simple usage example:
```rust
use anyhow::Result;
use crawly::Crawler;

#[tokio::main]
async fn main() -> Result<()> {
    let crawler = Crawler::new()?;
    let results = crawler.crawl_url("https://example.com").await?;

    for (url, content) in &results {
        println!("URL: {}\nContent: {}", url, content);
    }

    Ok(())
}
```
For more refined control over the crawler's behavior, the CrawlerBuilder comes in handy:
```rust
use anyhow::Result;
use crawly::CrawlerBuilder;

#[tokio::main]
async fn main() -> Result<()> {
    let crawler = CrawlerBuilder::new()
        .with_max_depth(10)
        .with_max_pages(100)
        .with_max_concurrent_requests(50)
        .with_rate_limit_wait_seconds(2)
        .build()?;

    let results = crawler.crawl_url("https://www.example.com").await?;

    for (url, content) in &results {
        println!("URL: {}\nContent: {}", url, content);
    }

    Ok(())
}
```
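As a rough sketch of working with the crawl results, the snippet below lists each discovered URL with the size of its fetched content instead of dumping full page bodies. The builder values are illustrative, and it assumes (based on the examples above) that the results iterate as `(url, content)` pairs with string-like content:

```rust
use anyhow::Result;
use crawly::CrawlerBuilder;

#[tokio::main]
async fn main() -> Result<()> {
    // A shallow, small crawl; the depth and page limits are illustrative values.
    let crawler = CrawlerBuilder::new()
        .with_max_depth(2)
        .with_max_pages(20)
        .build()?;

    let results = crawler.crawl_url("https://example.com").await?;

    // Summarize each page instead of printing its full body.
    // Assumes `content` is a String (or string-like) as suggested by the examples above.
    for (url, content) in &results {
        println!("{} ({} bytes)", url, content.len());
    }

    Ok(())
}
```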
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.
This project is MIT licensed.