Represent an XML 1.0 document as a read-only tree.
In many cases, all you need is to retrieve some data from an XML document, and a read-only tree allows a lot of optimizations for such cases.
As for roxmltree, it's fast not only because it's read-only, but also because it uses [xmlparser], which is many times faster than [xml-rs]. See the Performance section for details.
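To illustrate the kind of optimization a read-only tree permits, here is a minimal sketch. It is not roxmltree's actual implementation, and the `Tree`, `Node`, `NodeId` and `build` names are hypothetical. Because the tree never changes after it is built, all nodes can live in one flat `Vec`, and names and text can be borrowed directly from the input string instead of being copied.

```rust
/// Index of a node inside `Tree::nodes` (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeId(usize);

/// A node that borrows its name and text from the input string.
#[derive(Debug)]
struct Node<'input> {
    name: &'input str,
    text: &'input str,
    parent: Option<NodeId>,
}

/// All nodes live in a single vector; nothing is removed or reordered
/// once the tree is built.
struct Tree<'input> {
    nodes: Vec<Node<'input>>,
}

impl<'input> Tree<'input> {
    fn new() -> Self {
        Tree { nodes: Vec::new() }
    }

    /// Appends a node and returns its index. Used only while building.
    fn push(
        &mut self,
        name: &'input str,
        text: &'input str,
        parent: Option<NodeId>,
    ) -> NodeId {
        self.nodes.push(Node { name, text, parent });
        NodeId(self.nodes.len() - 1)
    }

    /// Looks up a node by index.
    fn get(&self, id: NodeId) -> &Node<'input> {
        &self.nodes[id.0]
    }
}

/// Builds a two-node tree by hand for the input `<a><b>hi</b></a>`.
/// The byte ranges are hard-coded here; a real parser would find them.
fn build<'a>(input: &'a str) -> Tree<'a> {
    let mut tree = Tree::new();
    let a = tree.push(&input[1..2], "", None);
    tree.push(&input[4..5], &input[6..8], Some(a));
    tree
}
```

Calling `build("<a><b>hi</b></a>")` yields two nodes whose `name` and `text` fields point into the original string, so building the tree copies no text. A mutable tree would typically need owned strings and a more flexible node layout.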
Sadly, XML can be parsed in many different ways. roxmltree tries to mimic the behavior of Python's lxml.
Unlike lxml, roxmltree does support comments outside the root element.
For more details, see docs/parsing.md.
* Rust-based for now
| Feature/Crate | roxmltree | [xmltree] | [elementtree] | [sxd-document] | [treexml] |
| ------------------------------- | :--------------: | :--------------: | :--------------: | :--------------: | :--------------: |
| Element namespace resolving | ✔ | ✔ | ✔ | ~<sup>1</sup> | |
| Attribute namespace resolving | ✔ | | | ✔ | |
| [Entity references] | ✔<sup>2</sup> | ⚠ | ⚠ | ⚠ | ⚠ |
| [Character references] | ✔ | ✔ | ✔ | ✔ | ✔ |
| [Attribute-Value normalization] | ✔ | | | | |
| Comments | ✔ | | | ✔ | |
| Processing instructions | ✔ | ⚠ | | ✔ | |
| UTF-8 BOM | ✔ | ⚠ | ⚠ | ⚠ | ⚠ |
| Non UTF-8 input | | | | | |
| Complete DTD support | | | | | |
| Position preserving<sup>3</sup> | ✔ | | | | |
| xml:space | | | | | |
| Tree modifications | | ✔ | ✔ | ✔ | ✔ |
| Writing | | ✔ | ✔ | ✔ | ✔ |
| No unsafe | ✔ | ✔ | ~<sup>4</sup> | | ✔ |
| Size overhead<sup>5</sup> | ~60KiB | ~80KiB | ~96KiB | ~130KiB | ~110KiB |
| Dependencies | 1 | 2 | 18 | 2 | 14 |
| Tested version | 0.1.0 | 0.8.0 | 0.5.0 | 0.2.6 | 0.7.0 |
| License | MIT / Apache-2.0 | MIT | BSD-3-Clause | MIT | MIT |
Legend:
Notes:
string_cache crate.

```text
test largeroxmltree   ... bench:  8,807,741 ns/iter (+/- 70,532)
test largesdxdocument ... bench:  9,777,811 ns/iter (+/- 242,912)
test largexmltree     ... bench: 31,041,407 ns/iter (+/- 27,171)
test largetreexml     ... bench: 32,048,129 ns/iter (+/- 29,860)
test largeelementtree ... bench: 32,073,296 ns/iter (+/- 68,433)

test mediumroxmltree   ... bench:  1,735,369 ns/iter (+/- 3,218)
test mediumsdxdocument ... bench:  3,569,814 ns/iter (+/- 10,518)
test mediumtreexml     ... bench: 11,163,737 ns/iter (+/- 26,084)
test mediumxmltree     ... bench: 11,267,754 ns/iter (+/- 70,971)
test mediumelementtree ... bench: 11,629,513 ns/iter (+/- 27,055)
```
roxmltree uses [xmlparser] internally, while sxd-document uses its own parser, and xmltree, elementtree and treexml use the [xml-rs] crate. Here is a comparison between xmlparser and xml-rs:
```text
test largexmlparser ... bench:  2,019,245 ns/iter (+/- 693)
test largexmlrs     ... bench: 29,086,480 ns/iter (+/- 22,741)

test mediumxmlparser ... bench:    434,140 ns/iter (+/- 231)
test mediumxmlrs     ... bench: 10,391,411 ns/iter (+/- 24,738)
```
Note: tree crates may use different xml-rs crate versions.
You can try it yourself using `cargo bench --features benchmark`.
The library exposes an idiomatic Rust API based on iterators. If you are more familiar with the browser/JS DOM API, check out tests/dom-api.rs to see how DOM-style code can be converted into Rust.
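As a rough illustration of that kind of conversion, the sketch below wraps DOM-style `firstChild`/`nextSibling` walking in a Rust iterator. It is self-contained and does not use roxmltree's API; `Node`, `Tree`, `Children` and the helper functions are hypothetical names chosen for this example.

```rust
/// A toy read-only node (hypothetical; not roxmltree's types).
struct Node {
    name: &'static str,
    first_child: Option<usize>,
    next_sibling: Option<usize>,
}

struct Tree {
    nodes: Vec<Node>,
}

/// Iterator over the direct children of a node.
struct Children<'a> {
    tree: &'a Tree,
    next: Option<usize>,
}

impl<'a> Iterator for Children<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<&'a Node> {
        let tree: &'a Tree = self.tree;
        match self.next {
            Some(id) => {
                let node = &tree.nodes[id];
                // Advance to the following sibling, like `n = n.nextSibling`.
                self.next = node.next_sibling;
                Some(node)
            }
            None => None,
        }
    }
}

impl Tree {
    /// Returns an iterator over the children of the node at `parent`.
    fn children<'a>(&'a self, parent: usize) -> Children<'a> {
        Children { tree: self, next: self.nodes[parent].first_child }
    }
}

/// DOM-style loop: `for (n = p.firstChild; n; n = n.nextSibling)`.
fn child_names_dom_style(tree: &Tree, parent: usize) -> Vec<&'static str> {
    let mut names = Vec::new();
    let mut cur = tree.nodes[parent].first_child;
    while let Some(id) = cur {
        names.push(tree.nodes[id].name);
        cur = tree.nodes[id].next_sibling;
    }
    names
}

/// The same traversal expressed with iterator adaptors.
fn child_names_iter_style(tree: &Tree, parent: usize) -> Vec<&'static str> {
    tree.children(parent).map(|n| n.name).collect()
}

/// Builds the shape of `<svg><rect/><circle/><rect/></svg>` by hand.
fn sample_tree() -> Tree {
    Tree {
        nodes: vec![
            Node { name: "svg", first_child: Some(1), next_sibling: None },
            Node { name: "rect", first_child: None, next_sibling: Some(2) },
            Node { name: "circle", first_child: None, next_sibling: Some(3) },
            Node { name: "rect", first_child: None, next_sibling: None },
        ],
    }
}
```

Once child traversal is an iterator, filtering and counting come from the standard adaptors. For example, `tree.children(0).filter(|n| n.name == "rect").count()` replaces a hand-written loop with a counter.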
Rust >= 1.18
Licensed under either of

* Apache License, Version 2.0
* MIT license

at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.