```toml
[dependencies]
lexpr = "0.2.1"
```
S-expressions are the human-readable, textual representation of code and data in the Lisp family of languages. `lexpr` aims to provide the tools to:
- Embed S-expression data into Rust programs using the `sexp!` macro:

  ```rust
  use lexpr::sexp;

  let address = sexp!(((name . "Jane Doe") (street . "4026 Poe Lane")));
  ```

- Construct and destructure S-expression data using a full-featured API:

  ```rust
  use lexpr::Value;

  let names = Value::list(vec!["Alice", "Bob", "Mallory"]);
  println!("The bad guy is {}", names[2].as_str().unwrap());
  ```

- Parse and serialize S-expression data from and to its textual representation.
To get a better idea of the direction `lexpr` is headed, you may want to take a look at the TODO or the "why" document.
Currently, `lexpr` focuses on Scheme, mostly based on R6RS and R7RS syntax, with some extensions, and Emacs Lisp. The following features, common across dialects, are not yet implemented:
- `lexpr` is for data exchange between Lisp and Rust programs.
- `quote`, `quasiquote`, `unquote` and `unquote-splicing`. Again, these are not usually important when using S-expressions as a data exchange format.

Further dialect-specific omissions, both ones that are planned to be fixed in the future and deliberate ones, are listed below.
`#!fold-case` and `#!no-fold-case` are not implemented. It's not clear if these will be implemented at all.

Strings in Emacs Lisp are somewhat difficult to deal with, for the following reasons:
- They can be either "unibyte" strings, which correspond to byte vectors in Scheme, or "multibyte" strings, which can handle Unicode. Whether a string is considered unibyte or multibyte depends on its contents; see Section 2.3.8.2, "Non-ASCII Characters in Strings" in the Emacs Lisp manual for details.
- Whether a string is considered unibyte or multibyte depends not only on its contents, but also on the source it is read from.
- A multibyte string can include characters outside of the Unicode codepoint range. This happens, for instance, when the string includes a hexadecimal or octal escape interpreted as a single byte, potentially violating the encoding rules of the multibyte source.
- Emacs Lisp string syntax supports a multitude of escaping modes, some of which originate from representing keyboard event sequences in strings. Using these "keyboard-oriented" escapes inside strings is explicitly discouraged in the Emacs Lisp manual.
The way `lexpr` deals with this complexity is the following:

- The input source is always considered to be "multibyte" using the UTF-8 encoding; other encodings are not supported.
- Mixing non-ASCII UTF-8 characters, either directly part of the input or represented using escape sequences, and hexadecimal or octal escape sequences resulting in a single byte outside of the ASCII range will result in a parse error. For instance, the string `"\xFC\N{U+203D}"` cannot be parsed by `lexpr`. Emacs, however, would parse this as a string containing the "character" sequence `#x3ffffc`, `#x203d`. Note that the first "character" is not a valid Unicode codepoint.
- Strings containing only ASCII characters and at least one single-byte hexadecimal or octal escape will be parsed as byte vectors instead of strings. This mirrors the Emacs Lisp rules for when a string will be considered to be "unibyte".
- When producing S-expression text, byte vectors will always be represented as a sequence of octal-escaped bytes.
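The rules above can be modelled in a few lines of plain Rust. The sketch below is not `lexpr`'s implementation: the `Piece` and `Parsed` types and the `classify` and `print_bytes` functions are hypothetical names, the input is assumed to be already split into literal characters and escapes, and the fixed three-digit octal width is an assumption made to keep the output unambiguous.

```rust
/// One already-tokenized piece of an Emacs Lisp string literal (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Piece {
    Literal(char),       // a character written directly in the source
    UnicodeEscape(char), // e.g. \u203D or \N{U+203D}
    ByteEscape(u8),      // a single-byte \xN... or octal \N... escape
}

/// Outcome of applying the rules described above (hypothetical).
#[derive(Debug, PartialEq)]
enum Parsed {
    Text(String),
    Bytes(Vec<u8>),
    Error,
}

fn classify(pieces: &[Piece]) -> Parsed {
    let has_non_ascii = pieces.iter().any(|p| match p {
        Piece::Literal(c) | Piece::UnicodeEscape(c) => !c.is_ascii(),
        Piece::ByteEscape(_) => false,
    });
    let has_byte_escape = pieces.iter().any(|p| matches!(p, Piece::ByteEscape(_)));
    let has_high_byte = pieces
        .iter()
        .any(|p| matches!(p, Piece::ByteEscape(b) if *b > 0x7F));

    // Non-ASCII characters mixed with a non-ASCII single byte: parse error.
    if has_non_ascii && has_high_byte {
        return Parsed::Error;
    }
    // ASCII-only characters plus at least one byte escape: byte vector.
    if !has_non_ascii && has_byte_escape {
        let bytes = pieces
            .iter()
            .map(|p| match p {
                Piece::Literal(c) | Piece::UnicodeEscape(c) => *c as u8, // ASCII here
                Piece::ByteEscape(b) => *b,
            })
            .collect();
        return Parsed::Bytes(bytes);
    }
    // Everything else is an ordinary string; any byte escapes are ASCII here.
    let text = pieces
        .iter()
        .map(|p| match p {
            Piece::Literal(c) | Piece::UnicodeEscape(c) => *c,
            Piece::ByteEscape(b) => *b as char,
        })
        .collect();
    Parsed::Text(text)
}

/// Render a byte vector as a string literal of octal-escaped bytes.
fn print_bytes(bytes: &[u8]) -> String {
    let mut out = String::from("\"");
    for b in bytes {
        out.push_str(&format!("\\{:03o}", b));
    }
    out.push('"');
    out
}

fn main() {
    // Models "\xFC\N{U+203D}", the string that cannot be parsed.
    let mixed = [Piece::ByteEscape(0xFC), Piece::UnicodeEscape('\u{203D}')];
    println!("{:?}", classify(&mixed));
    println!("{}", print_bytes(&[0xFC, 0x41]));
}
```

How a non-ASCII string with a byte escape inside the ASCII range is treated is not spelled out above; the sketch assumes it stays an ordinary string.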
The escaping styles supported by `lexpr` are:

- Hexadecimal (`\xN...`) and octal (`\N...`)
- Unicode escapes (`\uNNNN`, `\U00NNNNNN`)
- Unicode codepoint escapes (`\N{U+X...}`). Note that the syntax that refers to codepoints using their full name (e.g. `\N{LATIN SMALL LETTER A WITH GRAVE}`) is deliberately not supported.

It is expected that these restrictions will not be an impediment when using S-expressions as a data exchange format between Emacs Lisp and Rust programs. In short, S-expressions produced by Rust should always be parsable by Emacs, and the other direction should work as long as no strings with non-Unicode "characters" are involved.
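To make the Unicode escape styles listed earlier concrete, the following sketch spells out one character in each of them. The function `unicode_escapes` is a hypothetical helper, not part of `lexpr`, and the choice of uppercase hex digits is an assumption.

```rust
/// Return the Emacs Lisp escape spellings for `c` in the styles
/// `\uNNNN` (only for codepoints up to U+FFFF), `\U00NNNNNN` and `\N{U+X...}`.
fn unicode_escapes(c: char) -> Vec<String> {
    let cp = c as u32;
    let mut forms = Vec::new();
    if cp <= 0xFFFF {
        // Exactly four hex digits.
        forms.push(format!("\\u{:04X}", cp));
    }
    // Exactly eight hex digits.
    forms.push(format!("\\U{:08X}", cp));
    // Variable number of hex digits inside the braces.
    forms.push(format!("\\N{{U+{:X}}}", cp));
    forms
}

fn main() {
    for form in unicode_escapes('\u{203D}') {
        println!("{}", form);
    }
}
```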
The code and documentation in the `lexpr` crate are free software, dual-licensed under the MIT or Apache-2.0 license, at your choosing.
The `lexpr` repository contains code and documentation adapted from the following projects:

- `serde_json`, also dual-licensed under MIT/Apache-2.0 licenses.
- `sexpr`, Copyright 2017 Zephyr Pellerin, dual-licensed under the same licenses.