# robotxt


Also check out other xwde projects here.

The implementation of the robots.txt (or URL exclusion) protocol in the Rust programming language, with support for the crawl-delay, sitemap, and universal `*` match extensions (according to the RFC 9309 specification).

## Features

## Examples

```rust
use robotxt::Robots;

fn main() {
    let txt = r#"
      User-Agent: foobot
      Disallow: *
      Allow: /example/
      Disallow: /example/nope.txt
    "#.as_bytes();

    let r = Robots::from_bytes(txt, "foobot");
    assert!(r.is_allowed("/example/yeah.txt"));
    assert!(!r.is_allowed("/example/nope.txt"));
    assert!(!r.is_allowed("/invalid/path.txt"));
}
```
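
The parser also handles the universal `*` match extension mentioned in the description above. Below is a minimal sketch using the same `is_allowed` API; the robots.txt rules and the checked paths are illustrative assumptions, and the expected outcomes assume longest-match precedence between allow and disallow rules:

```rust
use robotxt::Robots;

fn main() {
    // Illustrative rules: `*` matches any sequence of characters within a path.
    let txt = r#"
      User-Agent: foobot
      Disallow: /private/*
      Allow: /private/shared/
    "#.as_bytes();

    let r = Robots::from_bytes(txt, "foobot");
    // No rule matches: allowed by default.
    assert!(r.is_allowed("/public/page.html"));
    // Matches the wildcard disallow rule.
    assert!(!r.is_allowed("/private/secret.html"));
    // The longer (more specific) allow rule takes precedence over the wildcard.
    assert!(r.is_allowed("/private/shared/doc.html"));
}
```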

```rust
use robotxt::RobotsBuilder;

fn main() {
    let txt = RobotsBuilder::default()
        .header("Robots.txt: Start")
        .group(["foobot"], |u| {
            u.crawl_delay(5)
                .header("Rules for Foobot: Start")
                .allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
                .footer("Rules for Foobot: End")
        })
        .group(["barbot", "nombot"], |u| {
            u.crawl_delay(2)
                .disallow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .sitemap("https://example.com/sitemap_1.xml".try_into().unwrap())
        .sitemap("https://example.com/sitemap_2.xml".try_into().unwrap())
        .footer("Robots.txt: End");

    println!("{}", txt.to_string());
}
```
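
Since the builder renders to plain text (via `to_string`), its output can be written straight to a robots.txt file. A minimal sketch using only the standard library; the output path and the rules are illustrative assumptions:

```rust
use robotxt::RobotsBuilder;

fn main() -> std::io::Result<()> {
    // Build a small ruleset; the group name and path are illustrative.
    let txt = RobotsBuilder::default()
        .group(["foobot"], |u| u.disallow("/private/"))
        .footer("Generated by robotxt");

    // Persist the rendered file; "robots.txt" is an arbitrary output path.
    std::fs::write("robots.txt", txt.to_string())?;
    Ok(())
}
```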

## Links

## Notes