sgmlish

![Build status] ![Version badge] ![Docs badge]

sgmlish is a library for parsing, manipulating and deserializing SGML.

It's not intended to be a full-featured implementation of the SGML spec; in particular, DTDs are not supported. That means case normalization and entities must be configured before parsing, and any desired validation or normalization, like inserting omitted tags, must be either performed through a built-in transform or implemented manually.

Still, its support is complete enough to successfully parse SGML documents for common applications, like [OFX] 1.x, and with little extra work it's ready to delegate to [Serde].

Non-goals

Usage

This is a quick guide on deriving deserialization of data structures with [Serde].

First, add sgmlish and serde to your dependencies:

```toml
# Cargo.toml
[dependencies]
serde = { version = "1.0", features = ["derive"] }
sgmlish = "0.2"
```

Defining your data structures is similar to using any other Serde library:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct Example {
    name: String,
    version: Option<String>,
}
```

Usage is typically performed in three steps:

```rust
let input = r##"
    <CRATE>
        <NAME>sgmlish
        <VERSION>0.2
    </CRATE>
"##;
// Step 1: configure parser, then parse string
let sgml = sgmlish::Parser::build()
    .lowercase_names()
    .parse(input)?;
// Step 2: normalization/validation
let sgml = sgmlish::transforms::normalize_end_tags(sgml)?;
// Step 3: deserialize into the desired type
let example = sgmlish::from_fragment::<Example>(sgml)?;
```

  1. Parsing: configure a [sgmlish::Parser] as desired — for example, by normalizing tag names or defining how entities (&example;) should be resolved. Once it's configured, feed it the SGML string.

  2. Normalization/validation: as the parser is not aware of DTDs, it does not know how to insert implied end tags, if those are accepted in your use case, or how to handle other more esoteric SGML features, like empty tags. This must be fixed before proceeding with deserialization.

    A normalization transform is offered with this library: [normalize_end_tags]. It assumes end tags are only omitted when the element cannot contain child elements. This algorithm is good enough for many SGML applications, like [OFX].

  3. Deserialization: once the event stream is normalized, pass it on to Serde and let it do its magic.
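
To make the assumption behind step 2 more concrete, here is a minimal, standalone sketch of how end tags can be inferred when they are only ever omitted on text-only elements. It is not the library's implementation and does not use sgmlish's types: the `Event` enum and the `insert_omitted_end_tags` function are hypothetical names invented for this illustration, and the sketch decides eagerly (text followed by a new start tag closes the element), which is a simplification.

```rust
/// A toy event stream. These types are illustrative only and are
/// NOT the types used by sgmlish.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Open(String),
    Text(String),
    Close(String),
}

/// Inserts omitted end tags, assuming an end tag is only ever omitted
/// on an element that holds text and no child elements.
/// (Hypothetical helper, written for this example.)
fn insert_omitted_end_tags(events: Vec<Event>) -> Result<Vec<Event>, String> {
    let mut out = Vec::with_capacity(events.len());
    // Currently open elements, innermost last: (name, has seen non-blank text).
    let mut open: Vec<(String, bool)> = Vec::new();

    for event in events {
        match event {
            Event::Open(name) => {
                // Text followed by a new start tag: under our assumption the
                // innermost element is a leaf whose end tag was omitted.
                if matches!(open.last(), Some((_, true))) {
                    let (leaf, _) = open.pop().unwrap();
                    out.push(Event::Close(leaf));
                }
                open.push((name.clone(), false));
                out.push(Event::Open(name));
            }
            Event::Text(text) => {
                // Whitespace between tags does not make an element a leaf.
                if !text.trim().is_empty() {
                    match open.last_mut() {
                        Some(top) => top.1 = true,
                        None => return Err("text outside of any element".to_string()),
                    }
                }
                out.push(Event::Text(text));
            }
            Event::Close(name) => {
                // An end tag for an outer element implicitly closes a
                // text-only leaf that is still open.
                if matches!(open.last(), Some((top, true)) if *top != name) {
                    let (leaf, _) = open.pop().unwrap();
                    out.push(Event::Close(leaf));
                }
                match open.pop() {
                    Some((top, _)) if top == name => out.push(Event::Close(name)),
                    _ => return Err(format!("unexpected end tag </{}>", name)),
                }
            }
        }
    }

    // A text-only leaf may still be open when the input ends.
    if matches!(open.last(), Some((_, true))) {
        let (leaf, _) = open.pop().unwrap();
        out.push(Event::Close(leaf));
    }
    match open.last() {
        Some((name, _)) => Err(format!("missing end tag for <{}>", name)),
        None => Ok(out),
    }
}
```

Feeding this function the events for the `<CRATE>` example above (with names already lowercased) yields the same stream with `</name>` and `</version>` inserted right after their text. An end tag that matches no open element is reported as an error rather than guessed at.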

Interpretation when deserializing

Crate features