Rust library to scrape HTML documents with XPath expressions.
Skyscraper has its own HTML parser implementation. The parser outputs a tree structure that can be traversed manually with parent/child relationships.
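```rust
use skyscraper::html::{self, parse::ParseError};

// Placeholder markup; any HTML text can be parsed the same way.
let html_text = r##"
<html>
    <body>
        <div>Hello world</div>
    </body>
</html>"##;

// `?` assumes an enclosing function that returns `Result<_, ParseError>`.
let document = html::parse(html_text)?;
```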
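```rust
use skyscraper::html::{self, DocumentNode};

// Parse the HTML text into a document.
// Placeholder markup: a root element with two children.
let text = r#"<parent><child/><child/></parent>"#;
let document = html::parse(text)?;

// Get the children of the root node
let parent_node: DocumentNode = document.root_node;
let children: Vec<DocumentNode> = parent_node.children(&document).collect();

// Get the parent of both child nodes
let parent_of_child0: DocumentNode = children[0]
    .parent(&document)
    .expect("parent of child 0 missing");
let parent_of_child1: DocumentNode = children[1]
    .parent(&document)
    .expect("parent of child 1 missing");

assert_eq!(parent_node, parent_of_child0);
assert_eq!(parent_node, parent_of_child1);
```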
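Node methods such as `parent(&document)` take the document as an argument, which is typical of trees where a node handle is a lightweight key and the document owns the actual data. The sketch below illustrates that general pattern using only the standard library. It is not Skyscraper's implementation: `NodeId`, `NodeData`, `Document`, and `add_node` are hypothetical names invented for this illustration.

```rust
/// Hypothetical node handle: just an index into the document's node list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(usize);

/// Hypothetical per-node record: a tag name plus links to relatives.
#[derive(Debug)]
struct NodeData {
    tag: String,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// Hypothetical document that owns every node in one flat vector.
#[derive(Debug, Default)]
pub struct Document {
    nodes: Vec<NodeData>,
}

impl Document {
    /// Add a node under `parent`, or as a root when `parent` is `None`.
    pub fn add_node(&mut self, tag: &str, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(NodeData {
            tag: tag.to_string(),
            parent,
            children: Vec::new(),
        });
        if let Some(p) = parent {
            // Register the new node with its parent so both directions stay in sync.
            self.nodes[p.0].children.push(id);
        }
        id
    }
}

impl NodeId {
    /// Look up this node's parent; the handle alone carries no tree data.
    pub fn parent(self, doc: &Document) -> Option<NodeId> {
        doc.nodes[self.0].parent
    }

    /// Iterate over this node's direct children in insertion order.
    pub fn children(self, doc: &Document) -> impl Iterator<Item = NodeId> + '_ {
        doc.nodes[self.0].children.iter().copied()
    }

    /// Read this node's tag name from the document.
    pub fn tag(self, doc: &Document) -> &str {
        &doc.nodes[self.0].tag
    }
}
```

Because handles are plain `Copy` values, they can be stored, compared with `==`, and passed around freely without borrowing the document. The trade-off is that every lookup needs the document passed in explicitly.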
Skyscraper is capable of parsing XPath strings and applying them to HTML documents.
```rust
use skyscraper::{html, xpath};

// Parse the html text into a document.
let html_text = r##"
<div>
    <div class="foo">
        <span some_attr="value">yes</span>
    </div>
    <div class="bar">
        <span>no</span>
    </div>
</div>
"##;
let document = html::parse(html_text)?;

// Parse and apply the xpath.
let expr = xpath::parse("//div[@class='foo']/span")?;
let results = expr.apply(&document)?;
assert_eq!(1, results.len());

// Get text from the node
let text = results[0].get_text(&document).expect("text missing");
assert_eq!("yes", text);

// Get attributes from the node
let attributes = results[0].get_attributes(&document).expect("no attributes");
assert_eq!("value", attributes["some_attr"]);
```
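To make the meaning of `//div[@class='foo']/span` concrete, the sketch below hand-codes that one expression against a toy tree using only the standard library. It is an illustration of what the expression selects, not Skyscraper's XPath engine: `Element`, `descendants_or_self`, and `foo_spans` are hypothetical names, and a real engine parses arbitrary expressions rather than hard-coding one.

```rust
use std::collections::HashMap;

/// Hypothetical toy element: a name, attributes, text, and owned children.
#[derive(Debug)]
pub struct Element {
    pub name: String,
    pub attributes: HashMap<String, String>,
    pub text: String,
    pub children: Vec<Element>,
}

impl Element {
    /// Convenience constructor for building small trees by hand.
    pub fn new(name: &str, attrs: &[(&str, &str)], text: &str, children: Vec<Element>) -> Self {
        Element {
            name: name.to_string(),
            attributes: attrs
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
            text: text.to_string(),
            children,
        }
    }
}

/// Collect `root` and all of its descendants in document order (the `//` step).
fn descendants_or_self<'a>(root: &'a Element, out: &mut Vec<&'a Element>) {
    out.push(root);
    for child in &root.children {
        descendants_or_self(child, out);
    }
}

/// Hand-written equivalent of `//div[@class='foo']/span` on the toy tree.
pub fn foo_spans(root: &Element) -> Vec<&Element> {
    let mut all = Vec::new();
    descendants_or_self(root, &mut all);
    all.into_iter()
        // `div[@class='foo']`: keep divs whose class attribute is exactly "foo".
        .filter(|e| e.name == "div" && e.attributes.get("class").map(String::as_str) == Some("foo"))
        // `/span`: step to direct children only, then keep the spans.
        .flat_map(|div| div.children.iter())
        .filter(|child| child.name == "span")
        .collect()
}
```

Applied to a tree shaped like the HTML in the example above, `foo_spans` returns only the span inside the `foo` div. The span under any other div is skipped because the predicate filters on the parent before the child step runs.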