roxmltree

Minimum supported Rust version: 1.18.

Represents an XML 1.0 document as a read-only tree.

```rust
// Find element by id.
let doc = roxmltree::Document::parse("<rect id='rect1'/>").unwrap();
let elem = doc.descendants().find(|n| n.attribute("id") == Some("rect1")).unwrap();
assert!(elem.has_tag_name("rect"));
```

Why read-only?

Because in some cases all you need is to retrieve data from an XML document, and in such cases a read-only tree allows many optimizations.

roxmltree is fast not only because it is read-only, but also because it uses [xmlparser], which is many times faster than [xml-rs] (roughly 9 to 18 times in the parser benchmarks). See the Performance section for details.
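The sketch below makes the read-only argument concrete. It shows two optimizations that a tree which never changes can afford:

- All nodes live in a single `Vec` and refer to each other by index.
- Tag names are `&str` slices borrowed from the input rather than newly allocated `String`s.

The `Document`, `NodeData` and `build` names are hypothetical, and this is not claimed to be roxmltree's internal layout. The tag ranges are hard-coded instead of being found by a parser.

```rust
/// One node of a hypothetical read-only tree. Nodes refer to each other by
/// index into a single `Vec` (an arena) instead of through pointers.
struct NodeData<'input> {
    tag: &'input str,      // borrowed from the input: no String allocation
    parent: Option<usize>, // index of the parent node, if any
    children: Vec<usize>,  // indices of child nodes, in document order
}

/// The whole tree. After `build` returns it is only used through `&self`
/// methods, so node indices can never be invalidated.
struct Document<'input> {
    nodes: Vec<NodeData<'input>>,
}

impl<'input> Document<'input> {
    /// Appends a node and links it to its parent (used only while building).
    fn push(&mut self, tag: &'input str, parent: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(NodeData { tag, parent, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    fn tag(&self, id: usize) -> &'input str {
        self.nodes[id].tag
    }

    fn parent(&self, id: usize) -> Option<usize> {
        self.nodes[id].parent
    }

    fn children(&self, id: usize) -> &[usize] {
        &self.nodes[id].children
    }
}

/// Builds the tree for `<svg><rect/><circle/></svg>`. A real parser would
/// locate the tag names itself; here the byte ranges are hard-coded.
fn build(input: &str) -> Document<'_> {
    let mut doc = Document { nodes: Vec::new() };
    let svg = doc.push(&input[1..4], None);
    doc.push(&input[6..10], Some(svg));
    doc.push(&input[13..19], Some(svg));
    doc
}

fn main() {
    let input = "<svg><rect/><circle/></svg>";
    let doc = build(input);
    for &child in doc.children(0) {
        let parent = doc.parent(child).unwrap();
        println!("{} (parent: {})", doc.tag(child), doc.tag(parent));
    }
}
```

Nothing can insert or remove nodes after `build` returns. An index handed out once therefore stays valid for the lifetime of the document, and the sketch needs no reference counting or interior mutability.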

Parsing behavior

Sadly, XML can be parsed in many different ways. roxmltree tries to mimic the behavior of Python's lxml. But unlike lxml, roxmltree does support comments outside the root element.

For more details, see docs/parsing.md.

Alternatives

| Feature/Crate                   | roxmltree        | [libxml2]           | [xmltree]        | [elementtree]    | [sxd-document]   | [treexml]        |
| ------------------------------- | :--------------: | :-----------------: | :--------------: | :--------------: | :--------------: | :--------------: |
| Element namespace resolving     | ✔                | ✔                   | ✔                | ✔                | ~¹               |                  |
| Attribute namespace resolving   | ✔                | ✔                   |                  |                  | ✔                |                  |
| [Entity references]             | ✔                | ✔                   | ⚠                | ⚠                | ⚠                | ⚠                |
| [Character references]          | ✔                | ✔                   | ✔                | ✔                | ✔                | ✔                |
| [Attribute-Value normalization] | ✔                | ✔                   |                  |                  |                  |                  |
| Comments                        | ✔                | ✔                   |                  |                  | ✔                |                  |
| Processing instructions         | ✔                | ✔                   | ⚠                |                  | ✔                |                  |
| UTF-8 BOM                       | ✔                | ✔                   | ⚠                | ⚠                | ⚠                | ⚠                |
| Non UTF-8 input                 |                  | ✔                   |                  |                  |                  |                  |
| Complete DTD support            |                  | ✔                   |                  |                  |                  |                  |
| Position preserving²            | ✔                | ✔                   |                  |                  |                  |                  |
| HTML support                    |                  | ✔                   |                  |                  |                  |                  |
| Tree modification               |                  | ✔                   | ✔                | ✔                | ✔                | ✔                |
| Writing                         |                  | ✔                   | ✔                | ✔                | ✔                | ✔                |
| No unsafe                       | ✔                |                     | ✔                | ~³               |                  | ✔                |
| Language                        | Rust             | C                   | Rust             | Rust             | Rust             | Rust             |
| Size overhead⁴                  | ~73KiB           | ~1.4MiB⁵            | ~80KiB           | ~96KiB           | ~135KiB          | ~110KiB          |
| Dependencies                    | 1                | ?⁵                  | 2                | 18               | 2                | 14               |
| Tested version                  | 0.8.0            | 2.9.8               | 0.9.0            | 0.5.0            | 0.3.0            | 0.7.0            |
| License                         | MIT / Apache-2.0 | MIT                 | MIT              | BSD-3-Clause     | MIT              | MIT              |

Legend:

Notes:

  1. No default namespace propagation.
  2. roxmltree keeps the positions of all nodes and attributes in the original document, so you can easily retrieve them if needed. See examples/print_pos.rs for details.
  3. In the string_cache crate.
  4. Binary size overhead according to cargo-bloat.
  5. Depends on build flags.
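The entity reference, character reference and attribute-value normalization rows refer to rules from the XML 1.0 specification. As a rough illustration, the sketch below applies the CDATA attribute-value normalization algorithm (XML 1.0, section 3.3.3) to a raw attribute value:

- Literal whitespace characters each become a space.
- A character reference such as `&#10;` produces the exact character it names.

`normalize_attr` and `resolve_ref` are hypothetical helpers, not roxmltree API. Only the five predefined entities are recognized, and well-formedness checks (for example, rejecting a literal `<`) are omitted.

```rust
/// Hypothetical helper: normalizes a CDATA attribute value following the
/// XML 1.0 algorithm (section 3.3.3). Only predefined entities are known.
fn normalize_attr(raw: &str) -> Result<String, String> {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Line-end handling (section 2.11): "\r\n" or a lone '\r' is one
            // line break, which normalization then turns into one space.
            '\r' => {
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push(' ');
            }
            // Every other literal whitespace character becomes a space.
            '\n' | '\t' | ' ' => out.push(' '),
            // A reference: collect its name up to ';' and append the
            // character it refers to, without further normalization.
            '&' => {
                let mut name = String::new();
                loop {
                    match chars.next() {
                        Some(';') => break,
                        Some(ch) => name.push(ch),
                        None => return Err(format!("unterminated reference: &{}", name)),
                    }
                }
                out.push(resolve_ref(&name)?);
            }
            _ => out.push(c),
        }
    }
    Ok(out)
}

/// Resolves `#xHEX`, `#DEC` or one of the five predefined entity names.
fn resolve_ref(name: &str) -> Result<char, String> {
    let code = if let Some(hex) = name.strip_prefix("#x") {
        u32::from_str_radix(hex, 16).ok()
    } else if let Some(dec) = name.strip_prefix('#') {
        dec.parse::<u32>().ok()
    } else {
        return match name {
            "lt" => Ok('<'),
            "gt" => Ok('>'),
            "amp" => Ok('&'),
            "apos" => Ok('\''),
            "quot" => Ok('"'),
            _ => Err(format!("unknown entity: &{};", name)),
        };
    };
    code.and_then(char::from_u32)
        .ok_or_else(|| format!("invalid character reference: &{};", name))
}

fn main() {
    // The literal newline becomes a space; the `&#10;` reference stays a newline.
    println!("{:?}", normalize_attr("line1\nline2&#10;end"));
}
```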

Performance

```text
test largeroxmltree ... bench: 3,976,162 ns/iter (+/- 16,229)
test largesdxdocument ... bench: 7,501,511 ns/iter (+/- 33,603)
test largexmltree ... bench: 20,821,266 ns/iter (+/- 80,124)
test largeelementtree ... bench: 21,388,702 ns/iter (+/- 115,590)
test largetreexml ... bench: 21,469,671 ns/iter (+/- 192,099)

test mediumroxmltree ... bench: 732,136 ns/iter (+/- 6,410)
test mediumsdxdocument ... bench: 2,548,236 ns/iter (+/- 14,502)
test mediumelementtree ... bench: 8,505,173 ns/iter (+/- 26,123)
test mediumtreexml ... bench: 8,146,522 ns/iter (+/- 19,378)
test mediumxmltree ... bench: 8,217,647 ns/iter (+/- 22,061)

test tinyroxmltree ... bench: 5,039 ns/iter (+/- 46)
test tinysdxdocument ... bench: 18,204 ns/iter (+/- 145)
test tinyelementtree ... bench: 30,865 ns/iter (+/- 280)
test tinytreexml ... bench: 30,698 ns/iter (+/- 468)
test tinyxmltree ... bench: 30,338 ns/iter (+/- 231)
```

roxmltree uses [xmlparser] internally, while sxd-document uses its own parser, and xmltree, elementtree and treexml use the [xml-rs] crate. Here is a comparison of xmlparser, xml-rs and quick-xml:

```text
test largequickxml ... bench: 1,220,067 ns/iter (+/- 20,723)
test largexmlparser ... bench: 2,079,871 ns/iter (+/- 12,220)
test largexmlrs ... bench: 19,628,313 ns/iter (+/- 241,729)

test mediumquickxml ... bench: 246,421 ns/iter (+/- 17,438)
test mediumxmlparser ... bench: 408,831 ns/iter (+/- 4,351)
test mediumxmlrs ... bench: 7,430,009 ns/iter (+/- 40,350)

test tinyquickxml ... bench: 2,329 ns/iter (+/- 67)
test tinyxmlparser ... bench: 3,313 ns/iter (+/- 22)
test tinyxmlrs ... bench: 28,511 ns/iter (+/- 232)
```

You can reproduce these results by running `cargo bench` in the `benches` directory.
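To read the benchmark output as relative speed, divide each crate's ns/iter by the baseline's. The sketch below does this for the large-document results, with the numbers copied from the output above. Some assumptions apply:

- `relative_to_first` is a hypothetical helper.
- The `sdxdocument` benchmark is assumed to measure sxd-document.
- The ratios only describe the machine these benchmarks were run on.

```rust
/// Large-document results from the tree benchmarks above (ns/iter).
/// The first entry is the baseline.
const LARGE: [(&str, f64); 5] = [
    ("roxmltree", 3_976_162.0),
    ("sxd-document", 7_501_511.0), // reported as "sdxdocument" in the output
    ("xmltree", 20_821_266.0),
    ("elementtree", 21_388_702.0),
    ("treexml", 21_469_671.0),
];

/// Large-document results from the parser benchmarks above (ns/iter).
const LARGE_PARSERS: [(&str, f64); 3] = [
    ("xmlparser", 2_079_871.0),
    ("quick-xml", 1_220_067.0),
    ("xml-rs", 19_628_313.0),
];

/// Hypothetical helper: how many times longer each entry takes than the first.
fn relative_to_first<'a>(results: &[(&'a str, f64)]) -> Vec<(&'a str, f64)> {
    let base = results[0].1;
    results.iter().map(|&(name, ns)| (name, ns / base)).collect()
}

fn main() {
    for (name, ratio) in relative_to_first(&LARGE) {
        println!("{:<13} {:.2}x", name, ratio);
    }
    for (name, ratio) in relative_to_first(&LARGE_PARSERS) {
        println!("{:<13} {:.2}x", name, ratio);
    }
}
```

Relative to roxmltree on the large file, this gives:

- about 1.89x for sxd-document;
- about 5.2x to 5.4x for xmltree, elementtree and treexml.

Relative to xmlparser, it gives about 9.4x for xml-rs.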

Notes:

Safety

Non-goals

API

This library provides an idiomatic Rust API based on iterators. If you are more familiar with browser/JS DOM APIs, see tests/dom-api.rs for examples of how DOM calls translate into the Rust API.
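The example at the top of this page already uses `descendants()` together with `find`. The sketch below shows how such an API can work, using a hypothetical `Element` type rather than roxmltree's `Node`:

- A pre-order descendants iterator is built with an explicit stack.
- Two common DOM lookups, `getElementsByTagName` and `getElementById`, then reduce to ordinary iterator adapters.

In this sketch the iterator yields the starting element first.

```rust
/// A hypothetical element tree in which each node owns its children.
struct Element {
    tag: String,
    id: Option<String>,
    children: Vec<Element>,
}

/// Pre-order (document-order) iterator over an element and its descendants.
struct Descendants<'a> {
    stack: Vec<&'a Element>,
}

impl<'a> Iterator for Descendants<'a> {
    type Item = &'a Element;

    fn next(&mut self) -> Option<&'a Element> {
        let node = self.stack.pop()?;
        // Push children in reverse so that the first child is visited next.
        self.stack.extend(node.children.iter().rev());
        Some(node)
    }
}

impl Element {
    fn new(tag: &str, id: Option<&str>, children: Vec<Element>) -> Element {
        Element { tag: tag.to_string(), id: id.map(str::to_string), children }
    }

    fn descendants(&self) -> Descendants<'_> {
        Descendants { stack: vec![self] }
    }
}

/// <svg><g><rect id="r1"/></g><rect id="r2"/></svg>
fn sample() -> Element {
    Element::new("svg", None, vec![
        Element::new("g", None, vec![Element::new("rect", Some("r1"), vec![])]),
        Element::new("rect", Some("r2"), vec![]),
    ])
}

fn main() {
    let svg = sample();

    // DOM: document.getElementsByTagName("rect")
    let rect_ids: Vec<&str> = svg
        .descendants()
        .filter(|e| e.tag == "rect")
        .filter_map(|e| e.id.as_deref())
        .collect();
    println!("{:?}", rect_ids);

    // DOM: document.getElementById("r2")
    let r2 = svg.descendants().find(|e| e.id.as_deref() == Some("r2"));
    println!("{}", r2.map(|e| e.tag.as_str()).unwrap_or("not found"));
}
```

Because the iterator is lazy, `find` stops at the first match, and no intermediate node list is allocated unless you `collect` one.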

License

Licensed under either of the Apache License, Version 2.0, or the MIT license, at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.