Represents an XML 1.0 document as a read-only tree.
```rust
// Find an element by its `id` attribute.
let doc = roxmltree::Document::parse("<rect id='rect1'/>").unwrap();
let elem = doc.descendants().find(|n| n.attribute("id") == Some("rect1")).unwrap();
assert!(elem.has_tag_name("rect"));
```
The tree is read-only because in many cases all you need is to retrieve some data from an XML document, and in such cases we can make a lot of optimizations.
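To illustrate the kind of optimization that immutability allows, here is a minimal sketch of a read-only tree. It is not roxmltree's actual internals, and the `Node`, `Tree`, `descendants` and `ancestors` names are hypothetical. Every node lives in a single `Vec` in document order and stores its parent index plus the index just past the end of its subtree. Because nodes are never inserted or removed, these indices never go stale, and a node's descendants are simply a contiguous slice.

```rust
/// Hypothetical read-only node: all nodes live in one `Vec` in document order.
struct Node {
    name: &'static str,
    parent: Option<usize>,
    /// Index one past the last node of this node's subtree.
    subtree_end: usize,
}

struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// The tree never changes, so a subtree is a contiguous slice:
    /// iterating all descendants needs no stack and no allocation.
    fn descendants(&self, id: usize) -> impl Iterator<Item = &Node> + '_ {
        self.nodes[id + 1..self.nodes[id].subtree_end].iter()
    }

    /// Walks parent indices up to the root (the node itself is excluded).
    fn ancestors(&self, id: usize) -> impl Iterator<Item = &Node> + '_ {
        std::iter::successors(self.nodes[id].parent, move |&p| self.nodes[p].parent)
            .map(move |p| &self.nodes[p])
    }
}
```

In a mutable tree, inserting a single node would shift every later index and invalidate `subtree_end` values, which is one reason such a flat layout fits a read-only design.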
As for roxmltree, it's fast not only because it's read-only, but also because it uses [xmlparser], which is many times faster than [xml-rs]. See the Performance section for details.
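The text above does not say what makes xmlparser fast. One general technique used by low-level, pull-based tokenizers is to hand out tokens that borrow slices of the input instead of allocating new `String`s. The toy `Tokenizer` below is a hypothetical illustration of that zero-copy style only: it is not xmlparser's API and handles just a tiny subset of XML (no attributes, comments, or references).

```rust
/// Hypothetical token type: every variant borrows a slice of the input,
/// so producing a token never allocates.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    StartTag(&'a str),
    EndTag(&'a str),
    Text(&'a str),
}

/// Toy pull-based tokenizer for a tiny XML subset.
/// Malformed input (a '<' with no closing '>') simply ends the iteration.
struct Tokenizer<'a> {
    rest: &'a str,
}

impl<'a> Iterator for Tokenizer<'a> {
    type Item = Token<'a>;

    fn next(&mut self) -> Option<Token<'a>> {
        let rest: &'a str = self.rest;
        if rest.is_empty() {
            return None;
        }
        if let Some(after_lt) = rest.strip_prefix('<') {
            // A tag: everything up to the next '>'.
            let end = after_lt.find('>')?;
            let tag = &after_lt[..end];
            self.rest = &after_lt[end + 1..];
            Some(match tag.strip_prefix('/') {
                Some(name) => Token::EndTag(name),
                None => Token::StartTag(tag),
            })
        } else {
            // Text: everything up to the next '<' (or the end of input).
            let end = rest.find('<').unwrap_or(rest.len());
            let (text, tail) = rest.split_at(end);
            self.rest = tail;
            Some(Token::Text(text))
        }
    }
}
```

The caller pulls tokens one at a time (for example with `collect()` or a `for` loop), and each token points directly into the original buffer.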
Sadly, XML can be parsed in many different ways. roxmltree tries to mimic the behavior of Python's lxml. But unlike lxml, roxmltree does support comments outside the root element.
For more details see docs/parsing.md.
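Attribute-value normalization, one of the features in the comparison table, is a concrete example of such a parsing detail. XML 1.0 (section 3.3.3) says that, after line ends have been normalized (section 2.11), each literal whitespace character in an attribute value is replaced by a space, while a character written as a character reference is appended unchanged. The following is a minimal sketch of those rules for CDATA attributes (the type a non-validating parser should assume for undeclared attributes). `normalize_attr_value` is a hypothetical helper, not part of roxmltree, and it assumes the value contains no entity references.

```rust
/// Hypothetical helper: attribute-value normalization for a CDATA attribute,
/// following XML 1.0 section 3.3.3.
/// Assumption: no entity references (such as `&amp;`); only numeric character
/// references (`&#65;`, `&#x41;`) are handled. Returns `None` otherwise.
fn normalize_attr_value(raw: &str) -> Option<String> {
    // Line-end handling (section 2.11): "\r\n" and a lone "\r" become "\n".
    let text = raw.replace("\r\n", "\n").replace('\r', "\n");

    let mut out = String::with_capacity(text.len());
    let mut rest = text.as_str();
    while let Some(c) = rest.chars().next() {
        if c == '&' {
            // Character reference: decode it and append the character as-is,
            // so `&#xA;` stays a real line feed instead of becoming a space.
            let end = rest.find(';')?;
            let digits = rest[1..end].strip_prefix('#')?;
            let code = match digits.strip_prefix('x') {
                Some(hex) => u32::from_str_radix(hex, 16).ok()?,
                None => digits.parse::<u32>().ok()?,
            };
            out.push(char::from_u32(code)?);
            rest = &rest[end + 1..];
        } else {
            // Each literal whitespace character is replaced by a single space.
            out.push(if matches!(c, ' ' | '\t' | '\n') { ' ' } else { c });
            rest = &rest[c.len_utf8()..];
        }
    }
    Some(out)
}
```

Note the order of the steps: because line ends are normalized first, a literal `\r\n` pair turns into one space rather than two.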
| Feature/Crate                   | roxmltree        | [libxml2]           | [xmltree]        | [sxd-document]   | [minidom]        |
| ------------------------------- | :--------------: | :-----------------: | :--------------: | :--------------: | :--------------: |
| Element namespace resolving     | ✓                | ✓                   | ✓                | ~¹               | ✓                |
| Attribute namespace resolving   | ✓                | ✓                   |                  | ✓                | ✓                |
| [Entity references]             | ✓                | ✓                   | ×                | ×                | ×                |
| [Character references]          | ✓                | ✓                   | ✓                | ✓                | ✓                |
| [Attribute-Value normalization] | ✓                | ✓                   |                  |                  |                  |
| Comments                        | ✓                | ✓                   |                  | ✓                |                  |
| Processing instructions         | ✓                | ✓                   | ✓                | ✓                |                  |
| UTF-8 BOM                       | ✓                | ✓                   | ×                | ×                | ✓                |
| Non UTF-8 input                 |                  | ✓                   |                  |                  |                  |
| Complete DTD support            |                  | ✓                   |                  |                  |                  |
| Position preserving²            | ✓                | ✓                   |                  |                  |                  |
| HTML support                    |                  | ✓                   |                  |                  |                  |
| Tree modification               |                  | ✓                   | ✓                | ✓                | ✓                |
| Writing                         |                  | ✓                   | ✓                | ✓                | ✓                |
| No unsafe                       | ✓                |                     | ✓                |                  | ~³               |
| Language                        | Rust             | C                   | Rust             | Rust             | Rust             |
| Size overhead⁴                  | ~64KiB           | ~1.4MiB⁵            | ~118KiB          | ~138KiB          | ~62KiB           |
| Dependencies                    | 1                | ?⁵                  | 2                | 2                | 2                |
| Tested version                  | 0.11.0           | 2.9.8               | 0.10.0           | 0.3.2            | 0.12.0           |
| License                         | MIT / Apache-2.0 | MIT                 | MIT              | MIT              | MIT              |
Legend:
Notes:
`memchr` crate.

There are also the elementtree and treexml crates, but they have been abandoned for a long time.
```text
test large_roxmltree     ... bench:  3,123,941 ns/iter (+/- 19,992)
test large_minidom       ... bench:  4,969,218 ns/iter (+/- 163,727)
test large_sdx_document  ... bench:  7,266,856 ns/iter (+/- 26,998)
test large_xmltree       ... bench: 21,354,608 ns/iter (+/- 136,311)

test medium_roxmltree    ... bench:    547,522 ns/iter (+/- 5,956)
test medium_minidom      ... bench:  1,223,620 ns/iter (+/- 16,180)
test medium_sdx_document ... bench:  2,470,063 ns/iter (+/- 24,159)
test medium_xmltree      ... bench:  8,083,860 ns/iter (+/- 25,363)

test tiny_roxmltree      ... bench:      4,170 ns/iter (+/- 41)
test tiny_minidom        ... bench:      7,495 ns/iter (+/- 81)
test tiny_sdx_document   ... bench:     17,411 ns/iter (+/- 203)
test tiny_xmltree        ... bench:     29,522 ns/iter (+/- 223)
```
roxmltree uses [xmlparser] internally, while sxd-document uses its own implementation, xmltree uses [xml-rs], and minidom uses [quick-xml]. Here is a comparison between xmlparser, xml-rs and quick-xml:
```text
test large_quick_xml  ... bench:  1,286,273 ns/iter (+/- 27,174)
test large_xmlparser  ... bench:  1,742,202 ns/iter (+/- 11,616)
test large_xmlrs      ... bench: 19,615,797 ns/iter (+/- 105,848)

test medium_quick_xml ... bench:    248,169 ns/iter (+/- 3,885)
test medium_xmlparser ... bench:    386,658 ns/iter (+/- 1,721)
test medium_xmlrs     ... bench:  7,387,753 ns/iter (+/- 18,668)

test tiny_quick_xml   ... bench:      2,382 ns/iter (+/- 29)
test tiny_xmlparser   ... bench:      2,788 ns/iter (+/- 20)
test tiny_xmlrs       ... bench:     27,619 ns/iter (+/- 262)
```
```text
test xmltree_iter_descendants_expensive     ... bench: 436,684 ns/iter (+/- 7,851)
test roxmltree_iter_descendants_expensive   ... bench: 470,459 ns/iter (+/- 6,233)
test minidom_iter_descendants_expensive     ... bench: 785,847 ns/iter (+/- 51,495)

test roxmltree_iter_descendants_inexpensive ... bench:  36,759 ns/iter (+/- 684)
test xmltree_iter_descendants_inexpensive   ... bench: 168,541 ns/iter (+/- 1,885)
test minidom_iter_descendants_inexpensive   ... bench: 215,615 ns/iter (+/- 38,101)
```
Here, expensive and inexpensive describe the matching done on each element: the expensive benchmarks search for any node in the document that contains a given string, while the inexpensive ones search for any element with a particular name.
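Both kinds of search walk the same descendants iterator and differ only in the predicate evaluated for each node. A minimal standard-library sketch of the two predicates follows; the flattened `Node` type and both function names are hypothetical stand-ins, not roxmltree's API, and "contains a string" is assumed here to mean the node's text.

```rust
/// Hypothetical flattened node, standing in for what a descendants
/// iterator yields; this is not roxmltree's `Node` type.
struct Node<'a> {
    tag: &'a str,
    text: &'a str,
}

/// "Inexpensive" search: one short string comparison per node.
fn has_element_named(nodes: &[Node<'_>], name: &str) -> bool {
    nodes.iter().any(|n| n.tag == name)
}

/// "Expensive" search: a substring scan over each node's text.
fn has_node_containing(nodes: &[Node<'_>], needle: &str) -> bool {
    nodes.iter().any(|n| n.text.contains(needle))
}
```

Since the traversal is identical, the expensive variant spends a larger share of its time inside the predicate itself, so differences in raw iteration speed between crates show up more clearly in the inexpensive benchmarks.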
You can try running the benchmarks yourself by running `cargo bench` in the `benches` directory.
Note that libxml2's `xmlReadFile()` parses only the XML structure, without attribute-value normalization and similar processing, so it's hard to compare directly. We also have to use a separate benchmark utility for it.

This library doesn't use any `unsafe` code.

This library provides an idiomatic Rust API based on iterators. If you are more familiar with browser/JS DOM APIs, check out tests/dom-api.rs to see how DOM-style code can be converted into Rust.
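For example, the DOM's `getElementsByTagName` and `getElementById` map naturally onto the `filter` and `find` iterator adapters applied to a descendants iterator. The sketch below illustrates that mapping with the standard library only: the owned `Element` type, the stack-based `Descendants` iterator and both helper functions are hypothetical. With roxmltree itself, the same pattern is a `descendants()` call followed by these adapters, as in the example at the top of this document.

```rust
/// Hypothetical owned element type, used only for this illustration.
struct Element {
    tag: String,
    id: Option<String>,
    children: Vec<Element>,
}

/// Pre-order traversal of an element and everything below it
/// (in this sketch the starting element itself is included).
struct Descendants<'a> {
    stack: Vec<&'a Element>,
}

impl<'a> Iterator for Descendants<'a> {
    type Item = &'a Element;

    fn next(&mut self) -> Option<&'a Element> {
        let elem = self.stack.pop()?;
        // Push children in reverse so the first child is visited next.
        self.stack.extend(elem.children.iter().rev());
        Some(elem)
    }
}

impl Element {
    fn descendants(&self) -> Descendants<'_> {
        Descendants { stack: vec![self] }
    }
}

// DOM: document.getElementsByTagName(tag)
fn get_elements_by_tag_name<'a>(
    root: &'a Element,
    tag: &'a str,
) -> impl Iterator<Item = &'a Element> + 'a {
    root.descendants().filter(move |e| e.tag == tag)
}

// DOM: document.getElementById(id)
fn get_element_by_id<'a>(root: &'a Element, id: &str) -> Option<&'a Element> {
    root.descendants().find(|e| e.id.as_deref() == Some(id))
}
```

Unlike a DOM `HTMLCollection`, the filtered iterator is lazy: no intermediate list is built unless the caller collects it.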
Licensed under either of

- Apache License, Version 2.0
- MIT license

at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.