Represents an XML 1.0 document as a read-only tree.
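```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Find element by id.
    let doc = roxmltree::Document::parse("<rect id='rect1'/>")?;
    let elem = doc.descendants().find(|n| n.attribute("id") == Some("rect1")).ok_or("no element with id 'rect1'")?;
    assert!(elem.has_tag_name("rect"));
    Ok(())
}
```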
The tree is read-only because in many cases all you need is to retrieve some data from an XML document, and for such cases a lot of optimizations become possible.
roxmltree is fast not only because it's read-only, but also because it uses [xmlparser], which is roughly 17 to 24 times faster than [xml-rs] in the parsing benchmarks. See the Performance section for details.
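To make "a lot of optimizations" more concrete, here is a minimal sketch of one layout that only works because the tree never changes after parsing. It is a hypothetical illustration, not roxmltree's actual internal representation, and the `Tree`, `NodeData` and `sample_tree` names are made up: nodes are stored in document order, each node records where its subtree ends, and names are slices of the input text. Walking all descendants then becomes a slice access, and jumping to the next sibling is a single index lookup.

```rust
// Hypothetical read-only tree layout, for illustration only.
// Nodes are stored in document order, so the descendants of any node
// form one contiguous run directly after it in `nodes`.
struct NodeData<'input> {
    name: &'input str,  // a slice of the input text, never copied
    subtree_end: usize, // index one past this node's last descendant
}

struct Tree<'input> {
    nodes: Vec<NodeData<'input>>,
}

impl<'input> Tree<'input> {
    /// All descendants of `id` as a plain slice: no stack, no allocation.
    fn descendants(&self, id: usize) -> &[NodeData<'input>] {
        &self.nodes[id + 1..self.nodes[id].subtree_end]
    }

    /// Direct children of `id`: the first child follows its parent, and
    /// each next sibling starts where the previous child's subtree ends.
    fn children(&self, id: usize) -> Vec<usize> {
        let mut kids = Vec::new();
        let mut child = id + 1;
        while child < self.nodes[id].subtree_end {
            kids.push(child);
            child = self.nodes[child].subtree_end; // jump over the whole subtree
        }
        kids
    }
}

/// Hand-built tree for `<svg><g><rect/></g><circle/></svg>`;
/// a real parser would compute these slices and indices itself.
fn sample_tree(input: &str) -> Tree<'_> {
    Tree {
        nodes: vec![
            NodeData { name: &input[1..4], subtree_end: 4 },   // svg
            NodeData { name: &input[6..7], subtree_end: 3 },   // g
            NodeData { name: &input[9..13], subtree_end: 3 },  // rect
            NodeData { name: &input[20..26], subtree_end: 4 }, // circle
        ],
    }
}

fn main() {
    let input = "<svg><g><rect/></g><circle/></svg>";
    let tree = sample_tree(input);
    let all: Vec<&str> = tree.descendants(0).iter().map(|n| n.name).collect();
    let kids: Vec<&str> = tree.children(0).iter().map(|&i| tree.nodes[i].name).collect();
    assert_eq!(all, ["g", "rect", "circle"]);
    assert_eq!(kids, ["g", "circle"]);
}
```

If the tree could be modified, inserting a single node would shift every index after it, which is why a packed layout like this suits a read-only tree.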
Sadly, XML can be parsed in many different ways. roxmltree tries to mimic the behavior of Python's lxml, but unlike lxml, it does support comments outside the root element.
For more details see docs/parsing.md.
| Feature/Crate                   | roxmltree        | [libxml2]           | [xmltree]        | [sxd-document]   |
| ------------------------------- | :--------------: | :-----------------: | :--------------: | :--------------: |
| Element namespace resolving     | ✓                | ✓                   | ✓                | ~<sup>1</sup>    |
| Attribute namespace resolving   | ✓                | ✓                   |                  | ✓                |
| [Entity references]             | ✓                | ✓                   | ×                | ×                |
| [Character references]          | ✓                | ✓                   | ✓                | ✓                |
| [Attribute-Value normalization] | ✓                | ✓                   |                  |                  |
| Comments                        | ✓                | ✓                   |                  | ✓                |
| Processing instructions         | ✓                | ✓                   | ✓                | ✓                |
| UTF-8 BOM                       | ✓                | ✓                   | ×                | ×                |
| Non UTF-8 input                 |                  | ✓                   |                  |                  |
| Complete DTD support            |                  | ✓                   |                  |                  |
| Position preserving<sup>2</sup> | ✓                | ✓                   |                  |                  |
| HTML support                    |                  | ✓                   |                  |                  |
| Tree modification               |                  | ✓                   | ✓                | ✓                |
| Writing                         |                  | ✓                   | ✓                | ✓                |
| No unsafe                       | ✓                |                     | ✓                |                  |
| Language                        | Rust             | C                   | Rust             | Rust             |
| Size overhead<sup>4</sup>       | ~55KiB           | ~1.4MiB<sup>5</sup> | ~78KiB           | ~102KiB          |
| Dependencies                    | 1                | ?<sup>5</sup>       | 2                | 2                |
| Tested version                  | 0.18.0           | 2.9.8               | 0.10.2           | 0.3.2            |
| License                         | MIT / Apache-2.0 | MIT                 | MIT              | MIT              |
Legend:
Notes:
`memchr` crate.

There are also the `elementtree` and `treexml` crates, but they have been abandoned for a long time.
```text
test huge_roxmltree ... bench: 3,152,020 ns/iter (+/- 38,556)
test huge_libxml ... bench: 6,779,906 ns/iter (+/- 184,744)
test huge_sdx_document ... bench: 8,289,337 ns/iter (+/- 378,131)
test huge_xmltree ... bench: 45,309,549 ns/iter (+/- 1,591,562)

test large_roxmltree ... bench: 1,568,688 ns/iter (+/- 9,956)
test large_libxml ... bench: 3,199,587 ns/iter (+/- 139,486)
test large_sdx_document ... bench: 3,731,708 ns/iter (+/- 92,787)
test large_xmltree ... bench: 15,605,566 ns/iter (+/- 331,504)

test medium_roxmltree ... bench: 430,778 ns/iter (+/- 18,070)
test medium_libxml ... bench: 932,408 ns/iter (+/- 8,763)
test medium_sdx_document ... bench: 1,452,152 ns/iter (+/- 54,983)
test medium_xmltree ... bench: 4,903,558 ns/iter (+/- 116,875)

test tiny_roxmltree ... bench: 2,630 ns/iter (+/- 41)
test tiny_libxml ... bench: 9,113 ns/iter (+/- 183)
test tiny_sdx_document ... bench: 10,388 ns/iter (+/- 116)
test tiny_xmltree ... bench: 22,067 ns/iter (+/- 228)
```
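The raw timings are easier to compare as ratios. This small sketch only recomputes, from the mean ns/iter values above, how many times as long each other crate takes as roxmltree on the same input; `RESULTS` and `slowdown` are names introduced here for illustration, and the +/- deviations are ignored.

```rust
// Mean ns/iter values copied from the parsing benchmarks above.
// Column order: roxmltree, libxml, sxd-document, xmltree.
const RESULTS: [(&str, [u64; 4]); 4] = [
    ("huge", [3_152_020, 6_779_906, 8_289_337, 45_309_549]),
    ("large", [1_568_688, 3_199_587, 3_731_708, 15_605_566]),
    ("medium", [430_778, 932_408, 1_452_152, 4_903_558]),
    ("tiny", [2_630, 9_113, 10_388, 22_067]),
];

/// How many times as long `other` takes compared to `baseline`.
/// The +/- deviations are ignored, so treat the result as approximate.
fn slowdown(baseline: u64, other: u64) -> f64 {
    other as f64 / baseline as f64
}

fn main() {
    let others = ["libxml", "sxd-document", "xmltree"];
    for (file, times) in RESULTS {
        for (name, &t) in others.iter().zip(&times[1..]) {
            println!("{file}: {name} takes {:.1}x as long as roxmltree", slowdown(times[0], t));
        }
    }
}
```

With these numbers, libxml takes about 2.0 to 3.5 times as long as roxmltree, and xmltree about 8.4 to 14.4 times.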
roxmltree uses [xmlparser] internally, while sxd-document uses its own implementation and xmltree uses [xml-rs]. Here is a comparison between xmlparser, xml-rs and quick-xml:
```text
test huge_xmlparser ... bench: 1,744,585 ns/iter (+/- 28,509)
test huge_quick_xml ... bench: 2,818,954 ns/iter (+/- 66,923)
test huge_xmlrs ... bench: 41,072,412 ns/iter (+/- 519,803)

test large_xmlparser ... bench: 756,125 ns/iter (+/- 13,995)
test large_quick_xml ... bench: 1,401,189 ns/iter (+/- 28,295)
test large_xmlrs ... bench: 12,920,333 ns/iter (+/- 143,508)

test medium_quick_xml ... bench: 216,080 ns/iter (+/- 5,479)
test medium_xmlparser ... bench: 258,587 ns/iter (+/- 3,684)
test medium_xmlrs ... bench: 4,629,016 ns/iter (+/- 109,023)

test tiny_xmlparser ... bench: 1,087 ns/iter (+/- 16)
test tiny_quick_xml ... bench: 2,420 ns/iter (+/- 51)
test tiny_xmlrs ... bench: 18,974 ns/iter (+/- 162)
```
```text
test roxmltree_iter_descendants_expensive ... bench: 255,261 ns/iter (+/- 1,424)
test xmltree_iter_descendants_expensive ... bench: 354,316 ns/iter (+/- 3,383)

test roxmltree_iter_descendants_inexpensive ... bench: 20,736 ns/iter (+/- 218)
test xmltree_iter_descendants_inexpensive ... bench: 125,849 ns/iter (+/- 1,200)

test roxmltree_iter_children ... bench: 1,409 ns/iter (+/- 54)
```
Here, *expensive* and *inexpensive* refer to the matching done on each element: the expensive benchmarks search for any node in the document that contains a given string, while the inexpensive ones search for any element with a particular name.
The benchmarks were taken on an Apple M1 Pro.
You can run the benchmarks yourself by running `cargo bench` in the `benches` directory.
roxmltree tries to use as little memory as possible to allow parsing very large (multi-GB) XML files.
The peak memory usage doesn't directly correlate with the file size, but rather with the number of nodes and attributes the file has, how many attributes had to be normalized (i.e. allocated), and how many text nodes had to be preprocessed (i.e. allocated).
roxmltree never allocates element and attribute names, processing instructions and comments.
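A minimal sketch of what "never allocates names" and "normalized (i.e. allocated)" mean in practice, using a hypothetical `Attribute` type and `normalize_attr_value` helper (neither is roxmltree's API): the name is always a slice of the input, and the value stays borrowed unless normalization actually changes it, in which case an owned `String` is created. The normalization here is deliberately simplified to mapping tab, CR and LF to a space; the full rules in section 3.3.3 of the XML 1.0 specification also expand entity and character references.

```rust
use std::borrow::Cow;

/// Hypothetical attribute type: the name is always a slice of the input,
/// and the value stays borrowed unless normalization had to change it.
struct Attribute<'input> {
    name: &'input str,
    value: Cow<'input, str>,
}

/// Characters this simplified normalization replaces with a space.
fn needs_replacing(c: char) -> bool {
    matches!(c, '\t' | '\r' | '\n')
}

/// Simplified attribute-value normalization (XML 1.0, section 3.3.3).
/// Entity/character references and CR LF line-end handling are left out.
fn normalize_attr_value(raw: &str) -> Cow<'_, str> {
    if raw.contains(needs_replacing) {
        // Slow path: the value changes, so an owned String is allocated.
        Cow::Owned(raw.replace(needs_replacing, " "))
    } else {
        // Fast path: nothing to change, nothing allocated.
        Cow::Borrowed(raw)
    }
}

/// Builds an attribute from name/value slices of the input text.
fn make_attr<'input>(name: &'input str, raw_value: &'input str) -> Attribute<'input> {
    Attribute { name, value: normalize_attr_value(raw_value) }
}

fn main() {
    let input = "id=rect1 class=a\tb";
    let id = make_attr(&input[0..2], &input[3..8]);
    let class = make_attr(&input[9..14], &input[15..18]);

    // The name points straight into `input`: no copy was made.
    assert!(std::ptr::eq(id.name.as_ptr(), input.as_ptr()));
    assert!(matches!(id.value, Cow::Borrowed("rect1")));

    // The tab forced an allocation.
    assert_eq!(class.name, "class");
    assert!(matches!(class.value, Cow::Owned(_)));
    assert_eq!(class.value, "a b");
}
```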
By disabling the `positions` feature, you can shave 8 bytes off each node and attribute.
On average, the overhead is around 6-8x the file size. For example, our 1.1GB sample XML file peaks at 7.6GB of RAM with default features enabled and at 6.8GB when `positions` is disabled.
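The figures above can be checked with a little arithmetic. This sketch recomputes the overhead ratios and, under two loudly stated assumptions (1 GB = 10^9 bytes, and the whole 0.8GB difference comes from the 8 bytes saved per node and attribute), estimates how many nodes plus attributes the sample file contains. `overhead_ratio` and `implied_items` are hypothetical helpers, and the estimate is an order-of-magnitude figure, not a measurement.

```rust
/// Peak memory as a multiple of the input file size.
fn overhead_ratio(peak_gb: f64, file_gb: f64) -> f64 {
    peak_gb / file_gb
}

/// Estimated number of nodes plus attributes, ASSUMING the whole memory
/// difference comes from the 8 bytes each node/attribute saves without
/// `positions`, and that 1 GB = 10^9 bytes.
fn implied_items(default_gb: f64, no_positions_gb: f64) -> f64 {
    (default_gb - no_positions_gb) * 1e9 / 8.0
}

fn main() {
    // Figures for the 1.1GB sample file quoted above.
    println!("default features:  {:.1}x", overhead_ratio(7.6, 1.1)); // 6.9x
    println!("without positions: {:.1}x", overhead_ratio(6.8, 1.1)); // 6.2x
    let millions = implied_items(7.6, 6.8) / 1e6;
    println!("roughly {millions:.0} million nodes and attributes"); // roughly 100 million
}
```

Both ratios (about 6.9x and 6.2x) fall within the quoted 6-8x range.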
This library forbids `unsafe` code.

This library uses Rust's idiomatic API based on iterators. If you are more familiar with browser/JS DOM APIs, you can check out tests/dom-api.rs to see how they translate into Rust.
Licensed under either of

- Apache License, Version 2.0
- MIT license

at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.