High performance XML pull reader/writer.

The reader:
- is almost zero-copy (uses `Cow` whenever possible)
- is easy on memory allocation (the API provides a way to reuse buffers)
- supports various encodings (with the `encoding` feature), namespace resolution, and special characters.

Syntax is inspired by xml-rs.
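
To illustrate what "almost zero-copy" with `Cow` means for the reader, here is a minimal std-only sketch. It is not quick-xml's implementation: `unescape_sketch` is a hypothetical helper that handles only the five predefined XML entities. It returns a borrowed slice when the input needs no changes and allocates only when an entity has to be replaced.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of quick-xml): replaces the five predefined
/// XML entities, borrowing the input when there is nothing to replace.
fn unescape_sketch(input: &str) -> Cow<'_, str> {
    // Fast path: no '&' means no entity, so return the input slice untouched.
    if !input.contains('&') {
        return Cow::Borrowed(input);
    }

    // Slow path: allocate once and copy, substituting known entities.
    let known = [
        ("&lt;", '<'),
        ("&gt;", '>'),
        ("&amp;", '&'),
        ("&apos;", '\''),
        ("&quot;", '"'),
    ];
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        match known.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            // Unknown or malformed entity: this sketch keeps the '&' literally.
            None => {
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

Calling `unescape_sketch("plain text")` yields a `Cow::Borrowed`, while `unescape_sketch("a &lt; b")` yields a `Cow::Owned`. Code that only reads the text pays for a copy only when the input actually contains an entity.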
```rust
use quick_xml::events::Event;
use quick_xml::reader::Reader;

let xml = r#"

let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();

// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
    // NOTE: this is the generic case when we don't know about the input BufRead.
    // When the input is a &str or a &[u8], we don't actually need to use another
    // buffer, we could directly call `reader.read_event()`
    match reader.read_event_into(&mut buf) {
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
        // exits the loop when reaching end of file
        Ok(Event::Eof) => break,

        Ok(Event::Start(e)) => {
            match e.name().as_ref() {
                b"tag1" => println!("attributes values: {:?}",
                                    e.attributes().map(|a| a.unwrap().value)
                                    .collect::<Vec<_>>()),
                b"tag2" => count += 1,
                _ => (),
            }
        }
        Ok(Event::Text(e)) => txt.push(e.unescape().unwrap().into_owned()),

        // There are several other `Event`s we do not consider here
        _ => (),
    }
    // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
    buf.clear();
}
```
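
The buffer handling in the reader loop can be illustrated with the standard library alone. The sketch below is not quick-xml's `Reader`: `TagReader`, `next_tag` and `count_tags` are hypothetical names, and the tag splitting is deliberately naive (it ignores comments, CDATA and `>` inside attribute values). It shows why a method that returns data borrowed from a caller-owned buffer cannot be exposed through `Iterator`, and how clearing one buffer keeps allocations low.

```rust
use std::io::{self, BufRead};

/// Hypothetical mini-reader (not quick-xml's `Reader`): yields the raw bytes
/// of each `<...>` tag, borrowed from a buffer the caller owns.
struct TagReader<R> {
    input: R,
}

impl<R: BufRead> TagReader<R> {
    /// Returns the bytes between the next `<` and `>`, or `None` at end of
    /// input. The slice borrows from `buf`, so it has to be dropped before
    /// the next call, which is what rules out an `Iterator` implementation.
    fn next_tag<'b>(&mut self, buf: &'b mut Vec<u8>) -> io::Result<Option<&'b [u8]>> {
        // Skip everything up to and including the next '<'.
        buf.clear();
        self.input.read_until(b'<', buf)?;
        if buf.last() != Some(&b'<') {
            return Ok(None); // no more tags
        }
        // Read the tag body; `clear` keeps the capacity, so the same
        // allocation is reused for every tag.
        buf.clear();
        self.input.read_until(b'>', buf)?;
        if buf.last() == Some(&b'>') {
            buf.pop();
        }
        Ok(Some(&buf[..]))
    }
}

/// Counts tags while allocating a single buffer for the whole input.
fn count_tags<R: BufRead>(input: R) -> io::Result<usize> {
    let mut reader = TagReader { input };
    let mut buf = Vec::new();
    let mut count = 0;
    while reader.next_tag(&mut buf)?.is_some() {
        count += 1;
    }
    Ok(count)
}
```

Passing the buffer in on every call leaves the caller in control of its lifetime and capacity, which is the same trade-off the `buf.clear()` call in the example reflects.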
```rust
use quick_xml::events::{Event, BytesEnd, BytesStart};
use quick_xml::reader::Reader;
use quick_xml::writer::Writer;
use std::io::Cursor;

let xml = r#"

            // creates a new element ... alternatively we could reuse `e` by calling
            // `e.into_owned()`
            let mut elem = BytesStart::new("my_elem");

            // collect existing attributes
            elem.extend_attributes(e.attributes().map(|attr| attr.unwrap()));

            // add a new my-key="some value" attribute
            elem.push_attribute(("my-key", "some value"));

            // writes the event to the writer
            assert!(writer.write_event(Event::Start(elem)).is_ok());
        },
        Ok(Event::End(e)) if e.name().as_ref() == b"this_tag" => {
            assert!(writer.write_event(Event::End(BytesEnd::new("my_elem"))).is_ok());
        },
        Ok(Event::Eof) => break,
        // we can either move or borrow the event to write, depending on your use-case
        Ok(e) => assert!(writer.write_event(e).is_ok()),
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
    }
}

let result = writer.into_inner().into_inner();
let expected = r#"
```

When using the `serialize` feature, quick-xml can be used with serde's `Serialize`/`Deserialize` traits.

This has largely been inspired by serde-xml-rs.
quick-xml follows its convention for deserialization, including the `$value` special name.

If you have an input of the form `<foo abc="xyz">bar</foo>`, and you want to get at the `bar`, you can use either the special name `$text`, or the special name `$value`:

```rust,ignore
struct Foo {
    pub abc: String,
    #[serde(rename = "$text")]
    pub body: String,
}
```

Read about the difference in the documentation.
Note that despite not focusing on performance (there are several unnecessary copies), it remains about 10x faster than serde-xml-rs.

- `encoding`: support non-UTF-8 XMLs
- `serialize`: support serde `Serialize`/`Deserialize`

Benchmarking is hard and the results depend on your input file and your machine.
Here, on my particular file, quick-xml is around 50 times faster than the xml-rs crate.

```
// quick-xml benches
test bench_quick_xml            ... bench:     198,866 ns/iter (+/- 9,663)
test bench_quick_xml_escaped    ... bench:     282,740 ns/iter (+/- 61,625)
test bench_quick_xml_namespaced ... bench:     389,977 ns/iter (+/- 32,045)

// same bench with xml-rs
test bench_xml_rs               ... bench:  14,468,930 ns/iter (+/- 321,171)

// serde-xml-rs vs serialize feature
test bench_serde_quick_xml      ... bench:   1,181,198 ns/iter (+/- 138,290)
test bench_serde_xml_rs         ... bench:  15,039,564 ns/iter (+/- 783,485)
```
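
As a quick sanity check on the "around 50 times" and "about 10x" figures, the ratios can be recomputed from the timings quoted above. This is only arithmetic on those numbers, not a new measurement. `speedup` and `quoted_speedups` are hypothetical helper names, and because the text does not say which quick-xml bench the xml-rs run mirrors, all three pairings are shown.

```rust
/// How many times slower `slow_ns` is than `fast_ns` (both in ns/iter).
fn speedup(slow_ns: u64, fast_ns: u64) -> f64 {
    slow_ns as f64 / fast_ns as f64
}

/// Ratios derived from the benchmark timings quoted in this README.
fn quoted_speedups() -> [(&'static str, f64); 4] {
    [
        // Assumption: the xml-rs bench is compared against each quick-xml bench in turn.
        ("xml-rs vs quick-xml (plain)", speedup(14_468_930, 198_866)),
        ("xml-rs vs quick-xml (escaped)", speedup(14_468_930, 282_740)),
        ("xml-rs vs quick-xml (namespaced)", speedup(14_468_930, 389_977)),
        ("serde-xml-rs vs serialize feature", speedup(15_039_564, 1_181_198)),
    ]
}
```

With these inputs the ratios come out at roughly 73x, 51x, 37x and 13x respectively, so the quoted orders of magnitude hold for this particular file and machine.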
For a feature and performance comparison, you can also have a look at RazrFalcon's parser comparison table.
Any PR is welcome!
License: MIT