sgmlish

This is a library for handling SGML. It's not intended to be a full-featured implementation of the SGML spec; rather, it's meant to successfully parse common SGML uses, and then apply a number of normalization passes, such as inserting implied end tags, to make the result suitable for deserialization.

In particular, DTDs are not supported. That means any desired validation or normalization must be performed either manually or through the built-in transforms.
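
To give a rough idea of what a normalization pass like "inserting implied end tags" involves, here is a minimal, self-contained sketch. It is not sgmlish's implementation: the `Token` type and the `insert_implied_end_tags` function are hypothetical names made up for this illustration. The sketch only covers end tags implied by an enclosing element being closed or by the end of input; deciding that a start tag implicitly closes a sibling generally needs knowledge of the content model, which is out of reach without a DTD.

```rust
/// Hypothetical token type, for illustration only (not sgmlish's API).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Start(String),
    Text(String),
    End(String),
}

/// Inserts the end tags that are implied when an enclosing element is
/// closed, or when the input ends, while inner elements are still open.
fn insert_implied_end_tags(tokens: Vec<Token>) -> Vec<Token> {
    // Names of the elements that are currently open, outermost first.
    let mut open: Vec<String> = Vec::new();
    let mut out = Vec::with_capacity(tokens.len());

    for token in tokens {
        match token {
            Token::Start(name) => {
                open.push(name.clone());
                out.push(Token::Start(name));
            }
            Token::End(name) => {
                // Find the innermost open element with this name.
                if let Some(pos) = open.iter().rposition(|n| *n == name) {
                    // Close everything opened after it, innermost first.
                    while open.len() > pos + 1 {
                        if let Some(implied) = open.pop() {
                            out.push(Token::End(implied));
                        }
                    }
                    open.pop();
                }
                // A stray end tag is passed through untouched, so that a
                // later validation step can still report it.
                out.push(Token::End(name));
            }
            text => out.push(text),
        }
    }

    // Whatever is still open at the end of input is closed implicitly.
    while let Some(name) = open.pop() {
        out.push(Token::End(name));
    }
    out
}
```

With this sketch, the token sequence for `<CRATE><NAME>sgmlish</CRATE>` gains a `</NAME>` end tag right before `</CRATE>`.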

Goals

Non-goals

Usage

This is a quick guide on deriving deserialization of data structures with [Serde].

First, add sgmlish and serde to your dependencies:

```toml
# Cargo.toml
[dependencies]
serde = { version = "1.0", features = ["derive"] }
sgmlish = "0.1"
```

Defining your data structures is similar to using any other Serde library:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct Example {
    name: String,
    version: Option<String>,
}
```

Usage deviates a bit from other deserializers. The process is usually split into three phases:

```rust
let input = r##"
    <CRATE>
        <NAME>sgmlish</NAME>
        <VERSION>0.1</VERSION>
    </CRATE>
"##;
let sgml =
    // Phase 1: tokenization
    sgmlish::parse(input)?
    // Phase 2: normalization
    .trim_spaces()
    .lowercase_identifiers();
// Phase 3: deserialization
let example = sgmlish::from_fragment::<Example>(sgml)?;
```

  1. Tokenization: sgmlish::parse() is invoked on an input string, producing a fragment, which is a series of events.

  2. Normalization: because SGML is so flexible, you'll almost certainly want to apply a few normalization passes to the data before deserializing.

    Some passes of interest include trim_spaces() and lowercase_identifiers(), both used in the example above.

    A very important rule: before proceeding with deserialization, all start tags must have a matching end tag with identical case, in a consistent hierarchy.

  3. Deserialization: once the event stream is normalized, pass on to Serde and let it do its magic.
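
The rule about matching end tags can be pictured with a small, self-contained sketch. This is not sgmlish's code: the `Event` type and both functions below are hypothetical stand-ins, written only to show why lowercasing identifiers matters and what "a consistent hierarchy" means in practice.

```rust
/// Hypothetical event type, for illustration only; the real event type
/// used by sgmlish may look different.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    Text(String),
    End(String),
}

/// Toy normalization pass: lowercases tag names so that start and end
/// tags written in different cases compare equal afterwards.
fn lowercase_identifiers(events: Vec<Event>) -> Vec<Event> {
    events
        .into_iter()
        .map(|event| match event {
            Event::Start(name) => Event::Start(name.to_lowercase()),
            Event::End(name) => Event::End(name.to_lowercase()),
            text => text,
        })
        .collect()
}

/// Checks that every start tag is closed by an end tag with the exact
/// same name (case included), and that elements are properly nested.
fn check_balanced(events: &[Event]) -> Result<(), String> {
    // Names of the elements that are currently open, outermost first.
    let mut open: Vec<&str> = Vec::new();
    for event in events {
        match event {
            Event::Start(name) => open.push(name.as_str()),
            Event::End(name) => match open.pop() {
                Some(expected) if expected == name.as_str() => {}
                Some(expected) => {
                    return Err(format!("expected </{}>, found </{}>", expected, name));
                }
                None => return Err(format!("unexpected </{}>", name)),
            },
            Event::Text(_) => {}
        }
    }
    match open.pop() {
        Some(name) => Err(format!("<{}> is never closed", name)),
        None => Ok(()),
    }
}
```

In this sketch, a stream such as `<CRATE><NAME>sgmlish</name></crate>` fails the check as written, because `NAME` and `name` differ in case, and passes once the lowercasing pass has run.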

Interpretation when deserializing

Crate features