# robotxt


Also check out other xwde projects here.

An implementation of the robots.txt (or URL exclusion) protocol in the Rust programming language, with support for the crawl-delay, sitemap, and universal `*` match extensions (according to the RFC specification).

## Examples

```rust
use robotxt::Robots;

fn main() {
    let txt = r#"
        User-Agent: foobot
        Disallow: *
        Allow: /example/
        Disallow: /example/nope.txt
    "#
    .as_bytes();

    let r = Robots::from_bytes(txt, "foobot");
    assert!(r.is_allowed("/example/yeah.txt"));
    assert!(!r.is_allowed("/example/nope.txt"));
    assert!(!r.is_allowed("/invalid/path.txt"));
}
```
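
The rules above target `foobot` by name. Under the universal `*` match extension mentioned in the description, a crawler without a dedicated `User-Agent` group is expected to fall back to the `*` group. The sketch below reuses only the `Robots::from_bytes` and `is_allowed` calls from the example above; the exact fallback behavior is an assumption based on the robots.txt specification rather than a documented guarantee quoted here.

```rust
use robotxt::Robots;

fn main() {
    // A file with only a universal group: no crawler is listed by name.
    let txt = r#"
        User-Agent: *
        Disallow: /private/
    "#
    .as_bytes();

    // `quuxbot` has no dedicated group, so the `*` rules are assumed to apply.
    let r = Robots::from_bytes(txt, "quuxbot");
    assert!(r.is_allowed("/public/index.html"));
    assert!(!r.is_allowed("/private/secret.html"));
}
```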

```rust
use url::Url;
use robotxt::Factory;

fn main() {
    let txt = Factory::default()
        .header("Robots.txt Header")
        .group(["foobot"], |u| {
            u.crawldelay(5)
                .header("Rules for Foobot: Start")
                .allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
                .footer("Rules for Foobot: End")
        })
        .group(["barbot", "nombot"], |u| {
            u.crawldelay(2)
                .disallow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .sitemap("https://example.com/sitemap1.xml").unwrap()
        .sitemap("https://example.com/sitemap2.xml").unwrap()
        .footer("Robots.txt Footer");

    println!("{}", txt.to_string());
}
```
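
Since the builder renders plain robots.txt text via `to_string()`, its output can be fed straight back into the parser from the first example. The round-trip below is an illustrative sketch that combines the `Factory` and `Robots` calls already shown above; the combination itself is not taken from the crate's documentation.

```rust
use robotxt::{Factory, Robots};

fn main() {
    // Build a file for `foobot`, then parse it back on behalf of the same bot.
    let txt = Factory::default()
        .group(["foobot"], |u| {
            u.allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .to_string();

    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(r.is_allowed("/example/yeah.txt"));
    assert!(!r.is_allowed("/example/nope.txt"));
}
```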

## Links

## Notes