Skyscraper - HTML scraping with XPath

Rust library to scrape HTML documents with XPath expressions.

HTML Parsing

Skyscraper has its own HTML parser implementation. The parser outputs a tree structure that can be traversed manually with parent/child relationships.

Example: Simple HTML Parsing

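```rust
use skyscraper::html::{self, parse::ParseError};

fn main() -> Result<(), ParseError> {
    // Minimal markup wrapping the text "Hello world".
    let html_text = r##"
<html>
    <body>
        <div>Hello world</div>
    </body>
</html>"##;

    // Parse the HTML text into a document tree.
    let document = html::parse(html_text)?;
    Ok(())
}
```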

Example: Traversing Parent/Child Relationships

```rust
use skyscraper::html::{self, parse::ParseError, DocumentNode};

fn main() -> Result<(), ParseError> {
    // Parse the HTML text into a document: one root element with two children.
    let text = r#"<parent><child></child><child></child></parent>"#;
    let document = html::parse(text)?;

    // Get the children of the root node
    let parent_node: DocumentNode = document.root_node;
    let children: Vec<DocumentNode> = parent_node.children(&document).collect();
    assert_eq!(2, children.len());

    // Get the parent of both child nodes
    let parent_of_child0: DocumentNode = children[0]
        .parent(&document)
        .expect("parent of child 0 missing");
    let parent_of_child1: DocumentNode = children[1]
        .parent(&document)
        .expect("parent of child 1 missing");

    // Both children report the root node as their parent.
    assert_eq!(parent_node, parent_of_child0);
    assert_eq!(parent_node, parent_of_child1);
    Ok(())
}
```
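Both `children` and `parent` take a reference to the document, which suggests that a node is a lightweight handle and the document owns the tree data. The sketch below shows one common way to build that kind of API, an index-based arena. It does not use Skyscraper at all: `Tree`, `Node`, `NodeId`, `add`, `tag`, and `build_sample` are hypothetical names invented for illustration, and nothing here should be read as a description of Skyscraper's internals.

```rust
/// Hypothetical node handle: just an index into the owning tree's storage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(usize);

/// Hypothetical node record: a tag name plus links to its parent and children.
#[derive(Debug)]
struct Node {
    tag: String,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// Hypothetical document type that owns every node in one flat vector.
#[derive(Debug, Default)]
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// Adds a node under `parent` (or as a root when `parent` is `None`)
    /// and returns its handle.
    fn add(&mut self, tag: &str, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            tag: tag.to_string(),
            parent,
            children: Vec::new(),
        });
        if let Some(p) = parent {
            self.nodes[p.0].children.push(id);
        }
        id
    }
}

impl NodeId {
    /// Looks up the parent through the tree, same call shape as `node.parent(&document)`.
    fn parent(self, tree: &Tree) -> Option<NodeId> {
        tree.nodes[self.0].parent
    }

    /// Iterates over the children through the tree, same call shape as `node.children(&document)`.
    fn children(self, tree: &Tree) -> impl Iterator<Item = NodeId> + '_ {
        tree.nodes[self.0].children.iter().copied()
    }

    /// Returns the tag name stored for this node.
    fn tag(self, tree: &Tree) -> &str {
        &tree.nodes[self.0].tag
    }
}

/// Builds a root element with two children and returns the tree and the root handle.
fn build_sample() -> (Tree, NodeId) {
    let mut tree = Tree::default();
    let root = tree.add("parent", None);
    tree.add("child", Some(root));
    tree.add("child", Some(root));
    (tree, root)
}

fn main() {
    let (tree, root) = build_sample();

    // Same checks as the traversal example, against the sketch types.
    let children: Vec<NodeId> = root.children(&tree).collect();
    assert_eq!(2, children.len());
    for child in &children {
        assert_eq!(Some(root), child.parent(&tree));
        assert_eq!("child", child.tag(&tree));
    }
}
```

Because a handle in this sketch is only an index, it is cheap to copy and compare, which is why handles can be checked for equality with `assert_eq!` and why every lookup needs the owning tree passed in.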

XPath Expressions

Skyscraper can parse XPath strings and apply them to HTML documents, returning the nodes that match.

```rust
use skyscraper::{html, xpath};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Parse the html text into a document.
    let html_text = r##"
<div>
    <div class="foo">
        <span some_attr="value">yes</span>
    </div>
</div>
"##;
    let document = html::parse(html_text)?;

    // Parse and apply the xpath.
    let expr = xpath::parse("//div[@class='foo']/span")?;
    let results = expr.apply(&document)?;
    assert_eq!(1, results.len());

    // Get text from the node
    let text = results[0].get_text(&document).expect("text missing");
    assert_eq!("yes", text);

    // Get attributes from the node
    let attributes = results[0].get_attributes(&document).expect("no attributes");
    assert_eq!("value", attributes["some_attr"]);
    Ok(())
}
```