parsercher

Parses and searches tag documents (e.g. HTML, XML).

parsercher parses documents written in tags such as HTML and XML.

- Create a Dom structure tree from the tag document.
- Search for tags and text from the Dom structure tree.
- Search subtrees from the Dom structure tree.

Usage

Add this to your Cargo.toml:

    [dependencies]
    parsercher = "3.1.2"

License

MIT OR Apache-2.0

Examples

Example of getting text from HTML.
Create a Dom structure tree from HTML and get the text of the li tags whose class attribute is target.

```rust
use parsercher;
use parsercher::dom::Tag;

let html = r#"
<html>
  <head>
    <title>sample html</title>
  </head>
  <body>
    <ol>
      <li class="target">first</li>
      <li>second</li>
      <li class="target">third</li>
    </ol>
  </body>
</html>
"#;

if let Ok(root_dom) = parsercher::parse(&html) {
    // Match li tags whose class attribute is "target".
    let mut needle = Tag::new("li");
    needle.set_attr("class", "target");

    if let Some(texts) = parsercher::search_text_from_tag_children(&root_dom, &needle) {
        assert_eq!(texts.len(), 2);
        assert_eq!(texts[0], "first".to_string());
        assert_eq!(texts[1], "third".to_string());
    }
}
```

Example of searching a subtree from the Dom structure tree.

Find a subtree that has a ul tag whose class attribute is targetList, with two li tags under it. The class attributes of the li tags must be key1 and key2, respectively.

Looking for:

```text
<ul class="targetList">
  <li class="key1"></li>
  <li class="key2"></li>
</ul>
```
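To make the matching rule described above more concrete, here is a minimal, self-contained sketch of one way such a subtree match could work. It is not parsercher's implementation: the `Node` type and the `matches` and `find_all` functions are hypothetical, and the rule that the needle's children must be matched in order by (not necessarily adjacent) children of the candidate is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// A toy element node: tag name, attributes, and child elements.
/// Hypothetical stand-in for illustration; not parsercher's `Dom` type.
#[derive(Debug, Clone)]
struct Node {
    name: String,
    attrs: HashMap<String, String>,
    children: Vec<Node>,
}

impl Node {
    fn new(name: &str, attrs: &[(&str, &str)], children: Vec<Node>) -> Self {
        Node {
            name: name.to_string(),
            attrs: attrs
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
            children,
        }
    }
}

/// Returns true if `hay` satisfies `needle`: same tag name, every needle
/// attribute present with the same value, and each needle child matched,
/// in order, by some child of `hay` (assumed rule, see the lead-in).
fn matches(hay: &Node, needle: &Node) -> bool {
    if hay.name != needle.name {
        return false;
    }
    let attrs_ok = needle
        .attrs
        .iter()
        .all(|(k, v)| hay.attrs.get(k) == Some(v));
    if !attrs_ok {
        return false;
    }
    // Greedy in-order scan: each needle child consumes candidates up to
    // and including the first child of `hay` that matches it.
    let mut candidates = hay.children.iter();
    needle
        .children
        .iter()
        .all(|n| candidates.any(|h| matches(h, n)))
}

/// Walks the tree under `root` and collects every node that satisfies `needle`.
fn find_all<'a>(root: &'a Node, needle: &Node, out: &mut Vec<&'a Node>) {
    if matches(root, needle) {
        out.push(root);
    }
    for child in &root.children {
        find_all(child, needle, out);
    }
}
```

Under these assumptions, a ul with class targetList that contains li tags with classes key1, item, and key2 is accepted, because the extra item tag does not prevent key1 and key2 from being found in order. A ul without the class attribute is rejected even if its li tags match.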

```rust
use parsercher;

let doc = r#"

<ul id="list2">
  <li class="key1">2-1</li>
  <li>2-2</li>
</ul>

<div>
  <div>
    <ul class="targetList">
      <ul id="list3" class="targetList">
        <li class="key1">3-1</li>
        <li class="item">3-2</li>
        <li class="key2">3-3</li>
      </ul>
    </ul>
  </div>
</div>

<ul id="list4">
  <li class="key1">4-1</li>
  <li class="key2">4-2</li>
</ul>

"#;

let root_dom = parsercher::parse(&doc).unwrap();

let needle = r#"
<ul class="targetList">
  <li class="key1"></li>
  <li class="key2"></li>
</ul>
"#;
let needle_dom = parsercher::parse(&needle).unwrap();
// Remove the root dom of needle_dom
let needle_dom = needle_dom.get_children().unwrap().get(0).unwrap();

if let Some(dom) = parsercher::search_dom(&root_dom, &needle_dom) {
    parsercher::print_dom_tree(&dom);
}
```

output: