litua

author:
tajpulo
version:
2.0.0

Read a text document, receive its tree in Lua, and manipulate it before representing it as a string.

What is it about?

The input

Text documents occur in many contexts. We like them as a simple means to document ideas and concepts; they help us communicate. But sometimes we want to transform them into other text formats or process their content. litua helps with that in a particular way.

You can write a text document like this:

In olden times when wishing still helped one, there lived a king whose daughters were all beautiful; and the youngest was so beautiful that the sun itself, which has seen so much, was astonished whenever it shone in her face.

But this text is boring. You usually care about markup. Markup consists of special instructions that annotate text:

In olden times when wishing still helped one, there lived a {bold king} whose daughters were all {italic beautiful}; and the youngest was so beautiful that the sun itself, which has seen so much, was astonished whenever it shone in her face.

In this case, {bold X} and {italic Y} have a special meaning. For example, they could mean that the text is rendered in a special style (e.g. X in a bold font and Y in cursive script). In general, we define litua input syntax in the following manner:

{element[attr1=value1][attribute2=val2] text content of element}

And finally, I will tell you a secret: value1, val2, and text content of element need not be text, but can also be an element itself. Thus, the following is permitted in litua input syntax:

{bold[font-face=Bullshit Sans] {italic Blockchain managed information density}}

In this sense, litua input syntax is very similar to XML (<element attr1="value1" attribute2="val2">text content of element</element>), LISP (e.g. (element :attr1 "value1" :attribute2 "val2" "text content of element")), and markup languages in general. By the way, if you literally need a { or } in your document, you can escape these semantics by writing {left-curly-brace} or {right-curly-brace} respectively instead. litua input syntax files must always be encoded in UTF-8.
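The escaping rule can be applied mechanically when plain text that contains braces has to be pasted into a litua document. The following is a minimal sketch in plain Lua; the helper name escape_braces is made up for illustration and is not part of litua.

```lua
-- Hypothetical helper (not part of litua): escape literal braces in plain
-- text so that it can be pasted into a litua input document.
local function escape_braces(text)
  local replacements = {
    ["{"] = "{left-curly-brace}",
    ["}"] = "{right-curly-brace}",
  }
  -- A single pass over the input avoids re-escaping the braces
  -- that the replacements themselves contain.
  local escaped = text:gsub("[{}]", replacements)
  return escaped
end

print(escape_braces("set = {1, 2}"))
-- set = {left-curly-brace}1, 2{right-curly-brace}
```

Replacing "{" and "}" in two separate passes would not work, because the second pass would also rewrite the braces introduced by the first one.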

Processing the document

Let us put the element-example in litua input syntax into a text document (doc.lit). Then we can invoke litua:

bash$ litua doc.lit

The output is written to a file with the extension out: doc.out. And it is super boring: it is exactly the input:

bash$ cat doc.out
{element[attr1=value1][attribute2=val2] text content of element}

It becomes interesting once I tell you that there is a representation of this element in Lua:

```lua
local node = {
  -- the string giving the node type
  ["call"] = "element",
  -- the key-value pairs of arguments.
  -- values are sequences of strings or nodes
  ["args"] = {
    ["attr1"] = { [1] = "value1" },
    ["attribute2"] = { [1] = "val2" }
  },
  -- the sequence of elements occurring in the body of a node.
  -- the items of content can be strings or nodes themselves
  ["content"] = {
    [1] = "text content of element"
  },
}
```

For example, node.call allows you to access the name of the markup element. node.content[1] allows you to access the string which is the first and only content member of element in Lua. Remember that in Lua, the first element in a collection type is stored at index 1 (not 0 as in the majority of programming languages).
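Since content items and argument values can be nodes themselves, the tree can be walked recursively. The sketch below writes out by hand how the nested {bold[font-face=Bullshit Sans] {italic …}} example might look as a Lua table. Two things are assumptions here: that an element without arguments carries an empty args table, and that no surrounding whitespace ends up in the content strings. The function plain_text is a made-up helper, not part of litua.

```lua
-- Assumed representation of
-- {bold[font-face=Bullshit Sans] {italic Blockchain managed information density}}
local node = {
  call = "bold",
  args = { ["font-face"] = { "Bullshit Sans" } },
  content = {
    {
      call = "italic",
      args = {},  -- assumption: no arguments means an empty table
      content = { "Blockchain managed information density" },
    },
  },
}

-- Hypothetical helper: recursively concatenate all strings
-- found in the content of a node.
local function plain_text(item)
  if type(item) == "string" then
    return item
  end
  local parts = {}
  for _, child in ipairs(item.content) do
    parts[#parts + 1] = plain_text(child)
  end
  return table.concat(parts)
end

print(node.call)                  -- bold
print(node.args["font-face"][1])  -- Bullshit Sans
print(plain_text(node))           -- Blockchain managed information density
```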

Now create a Lua file hooks.lua in the same directory (the name must start with hooks and must end with .lua) with the following content:

```lua
Litua.convert_node_to_string("element", function (node)
  return "The " .. tostring(node.call) .. " said: " .. tostring(node.content[1])
end)
```

Now let us invoke litua again:

bash$ litua doc.lit
[…]
bash$ cat doc.out
The element said: text content of element

Wow, we just changed how the document is processed 😍
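Because the function passed to convert_node_to_string is an ordinary Lua function, it can be tried out in a plain Lua interpreter before running litua. The following is a minimal sketch under that assumption: the node is written by hand following the structure shown above, and the name element_to_string is made up for illustration.

```lua
-- The same function body as in hooks.lua, bound to a local name.
local function element_to_string(node)
  return "The " .. tostring(node.call) .. " said: " .. tostring(node.content[1])
end

-- Hand-written node following the structure shown above.
local node = {
  call = "element",
  args = { attr1 = { "value1" }, attribute2 = { "val2" } },
  content = { "text content of element" },
}

print(element_to_string(node))
-- The element said: text content of element

-- Inside hooks.lua, where litua provides the Litua table,
-- the registration would then read:
-- Litua.convert_node_to_string("element", element_to_string)
```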

Hooks

In fact, we used a concept called a hook to modify the behavior. We register a hook with convert_node_to_string so that it is triggered whenever litua tries to convert a node to a string. A hook is a Lua function. Let us read the Lua syntax:

```lua
Litua.convert_node_to_string("element", function (node)
  return "The " .. tostring(node.call) .. " said: " .. tostring(node.content[1])
end)
```

The complete set of hooks is given here:

Be aware that the document always lives within one invisible top-level node called document. So if you use a document element in your input file and define a hook for the element document as well, don't be surprised about the additional invocation of this hook.

Examples

I highly recommend going through the examples in this order to get an idea of how to use the hooks:

  1. enumeration – replace a call with an incrementing counter
  2. replacements – first define substitution pairs and then apply them
  3. literate-programming – define documentation and code block and write them to different files
  4. markup – serialize the tree to HTML5
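To give a flavor of the first item, here is a minimal sketch of an incrementing counter written as a Lua closure. It is not the code of the enumeration example itself; the factory make_counter and the call name "item" are made up for illustration, and the sketch runs in a plain Lua interpreter without litua.

```lua
-- Hypothetical factory: returns a node-to-string function that ignores the
-- node and yields "1", "2", "3", ... on successive invocations.
local function make_counter()
  local count = 0
  return function (node)
    count = count + 1
    return tostring(count)
  end
end

local next_number = make_counter()
print(next_number({ call = "item", args = {}, content = {} }))  -- 1
print(next_number({ call = "item", args = {}, content = {} }))  -- 2

-- Inside hooks.lua, such a function could be registered for a call name
-- (here the made-up name "item"):
-- Litua.convert_node_to_string("item", make_counter())
```

Keeping the counter in a closure rather than in a global variable means that each registered call name can have its own independent numbering.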

Why should I use it?

Litua is a simple text processing utility for text documents with a hierarchical structure. It is reminiscent of tools like XSLT, but people often complain that XSLT is too foreign to common programming languages. As an alternative, I provide litua with a parser for the litua input syntax, a mapping of data from Rust to Lua, a runtime in Lua, and a writer for text files.

How to install

This is a single static executable. It only depends on basic system libraries like pthread, math and libc. It ships the entire Lua 5.4 interpreter with the executable. I expect it to work out-of-the-box on your operating system.

How to run

Call the litua executable with -h to get information about additional arguments:

litua -h

Litua input specification

The following document defines the syntax (see also design/litua-lexer-state-diagram.jpg):

```
Node       = (Text | RawString | Function){0,…}
Text       = (NOT the symbols "{" or "}"){1,…}
RawString  = "{<" Whitespace (NOT the string Whitespace-and-">}") Whitespace ">}"
           | "{<<" Whitespace (NOT the string Whitespace-and-">>}") Whitespace ">>}"
           | "{<<<" Whitespace (NOT the string Whitespace-and-">>>}") Whitespace ">>>}"
           … continue up to 126 "<" characters
Function   = "{" Call "}"
           | "{" Call Whitespace "}"
           | "{" Call Whitespace Node "}"
           | "{" Call ( "[" Key "=" Node "]" ){1,…} "}"
           | "{" Call ( "[" Key "=" Node "]" ){1,…} Whitespace "}"
           | "{" Call ( "[" Key "=" Node "]" ){1,…} Whitespace Node "}"

Call       = (NOT the symbols "}", "[" or "<")(NOT the symbols "[" or "<"){0,…}
Key        = (NOT the symbol "="){1,…}
Whitespace = any of the 25 Unicode Whitespace characters
```

In essence, don't use "<" or "[" in function call names, or "=" in argument keys. Keep the number of opening and closing braces balanced (though this is not enforced by the syntax).
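Since balanced braces are not enforced by the syntax, a quick check before invoking litua can catch obvious mistakes. The following is a rough sketch in plain Lua, not a feature of litua; the helper name brace_balance is made up. It only compares character counts and does not treat raw strings specially, so braces inside raw strings are counted as well.

```lua
-- Hypothetical pre-flight check (not part of litua): number of "{" minus
-- number of "}" in a text. Zero means the counts match.
local function brace_balance(text)
  -- gsub returns the number of substitutions as its second result.
  local _, opening = text:gsub("{", "")
  local _, closing = text:gsub("}", "")
  return opening - closing
end

print(brace_balance("{bold king}"))           -- 0
print(brace_balance("{bold {italic king}"))   -- 1
```

A result of zero does not prove that the document is well-formed (for example, "}{" also yields zero), so this is only a first sanity check.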

Improvements

The following parts can be improved:

Source Code

The source code is available on GitHub.

License

See the LICENSE file (Hint: MIT license).

Changelog

0.9
first public release with raw strings and four examples
1.0.0
improves stdout/stderr, improved documentation, CI builds, upload to crates.io
1.1.0
bugfix third argument of modify-node hook, modify-hook may now also return strings
1.1.1
bugfix: interrupted '>' sequences inside raw string content can be used again, removed hook checks from testsuite
2.0
improved docs, require whitespace before ">" in raw strings

Issues

Please report any issues on the GitHub issues page.