Also check out other xwde projects here.
The implementation of the robots.txt (or URL exclusion) protocol in the Rust programming language, with support for the crawl-delay, sitemap, and universal `*` match extensions (according to the RFC specification).
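For context, the universal `*` match means a rule pattern may contain `*` to match any run of characters, while a trailing `$` anchors the pattern to the end of the path; everything else is an ordinary prefix match. The sketch below only illustrates those RFC 9309 semantics; the `matches_pattern` helper is hypothetical and is not taken from this crate's internals:

```rust
/// Illustrative only: `*` matches any run of characters (including none),
/// a trailing `$` anchors the pattern to the end of the path, and every
/// other byte must match literally as a prefix. Not the crate's matcher.
fn matches_pattern(pattern: &str, path: &str) -> bool {
    fn go(pat: &[u8], path: &[u8]) -> bool {
        match pat.split_first() {
            // Pattern exhausted: a prefix match is enough.
            None => true,
            // `$` at the very end of the pattern requires the path to end here.
            Some((&b'$', rest)) if rest.is_empty() => path.is_empty(),
            // `*` absorbs zero or more characters of the path.
            Some((&b'*', rest)) => (0..=path.len()).any(|i| go(rest, &path[i..])),
            // Any other byte must match the path literally.
            Some((&c, rest)) => path.first() == Some(&c) && go(rest, &path[1..]),
        }
    }
    go(pattern.as_bytes(), path.as_bytes())
}

fn main() {
    assert!(matches_pattern("*", "/example/nope.txt"));
    assert!(matches_pattern("/example/", "/example/yeah.txt"));
    assert!(matches_pattern("/*.txt$", "/example/nope.txt"));
    assert!(!matches_pattern("/*.txt$", "/example/nope.txt?x=1"));
}
```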
Parse the rules that apply to the specified `user-agent` in the provided `robots.txt` file:

```rust
use robotxt::Robots;

fn main() {
    let txt = r#"
        User-Agent: foobot
        Disallow: *
        Allow: /example/
        Disallow: /example/nope.txt
    "#
    .as_bytes();

    let r = Robots::from_bytes(txt, "foobot");
    assert!(r.is_allowed("/example/yeah.txt"));
    assert!(!r.is_allowed("/example/nope.txt"));
    assert!(!r.is_allowed("/invalid/path.txt"));
}
```
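In this example, `Disallow: *` blocks every path by default, `Allow: /example/` is a longer and therefore more specific match that re-opens the `/example/` prefix, and `Disallow: /example/nope.txt` is more specific still. That is why only `nope.txt` stays blocked under `/example/`, while `/invalid/path.txt` falls back to the universal disallow.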
Build a new `robots.txt` file from the provided directives:

```rust
use url::Url;
use robotxt::Factory;

fn main() {
    let txt = Factory::default()
        .header("Robots.txt Header")
        .group(["foobot"], |u| {
            u.crawldelay(5)
                .header("Rules for Foobot: Start")
                .allow("/example/yeah.txt")
                .disallow("/example/nope.txt")
                .footer("Rules for Foobot: End")
        })
        .group(["barbot", "nombot"], |u| {
            u.crawldelay(2)
                .disallow("/example/yeah.txt")
                .disallow("/example/nope.txt")
        })
        .sitemap("https://example.com/sitemap1.xml").unwrap()
        .sitemap("https://example.com/sitemap2.xml").unwrap()
        .footer("Robots.txt Footer");

    println!("{}", txt.to_string());
}
```
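As a quick sanity check, the rendered output can be fed straight back into the parser shown above. This is only a sketch, assuming the builder's `to_string()` result parses back with the same crate:

```rust
use robotxt::{Factory, Robots};

fn main() {
    // Render a minimal file with the builder, then parse it back and check
    // that the rules survive serialization. Illustrative sketch only.
    let txt = Factory::default()
        .group(["foobot"], |u| {
            u.allow("/example/yeah.txt").disallow("/example/nope.txt")
        })
        .to_string();

    let r = Robots::from_bytes(txt.as_bytes(), "foobot");
    assert!(r.is_allowed("/example/yeah.txt"));
    assert!(!r.is_allowed("/example/nope.txt"));
}
```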
Note: the `Host` directive is not supported.