STAM Tools

A collection of command-line tools for working with STAM.

Various tools are grouped under the stam tool and invoked with a subcommand.

For many of these, you can set --verbose for extra details in the output.

Installation

From source

$ cargo install stam-tools

Usage

Add the --help flag after the subcommand for extensive usage instructions.

Most tools take as input a STAM JSON file containing an annotation store. Any files mentioned via the @include mechanism are loaded automatically.

Instead of passing STAM JSON files, you can read from stdin and/or write to stdout by setting the filename to -. This works in many places.

These tools also support reading and writing STAM CSV.

Tools

stam tag

The stam tag tool matches regular expressions in text and subsequently associates annotations with the found results. It can be used, for example, for tokenization or other tagging tasks.

The stam tag command takes a TSV file (example) containing regular expression rules for the tagger. The file contains the following columns:

  1. The regular expression, which follows this syntax. The expression may contain one or more capture groups containing the items that will be tagged; in that case anything else is considered context and will not be tagged.
  2. The ID of the annotation data set
  3. The ID of the data key
  4. The value to set. If this follows the syntax $1, $2, etc., it will be assigned the value of that capture group (1-indexed).

Example:

```tsv
#EXPRESSION	#ANNOTATIONSET	#DATAKEY	#DATAVALUE
\w+(?:[-_]\w+)*	simpletokens	type	word
[.\?,/]+	simpletokens	type	punctuation
[0-9]+(?:[,.][0-9]+)	simpletokens	type	number
```
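
To make the column semantics concrete, here is a minimal Python sketch of how such rules could be applied to a text. It is not how stam tag is implemented: it uses Python's re module instead of the tool's own regex engine, reports character offsets, and assumes that lines starting with # are headers to be skipped. The function names, the third rule (user:(\w+)) and the sample text are hypothetical and exist only to illustrate capture groups and the $1 value.

```python
import re

# Hypothetical rule rows in the four-column layout described above.
ROWS = [
    ["#EXPRESSION", "#ANNOTATIONSET", "#DATAKEY", "#DATAVALUE"],
    [r"\w+(?:[-_]\w+)*", "simpletokens", "type", "word"],
    [r"[.\?,/]+", "simpletokens", "type", "punctuation"],
    [r"user:(\w+)", "demo", "username", "$1"],  # illustrative rule, not from the README
]
RULES_TSV = "\n".join("\t".join(row) for row in ROWS)


def load_rules(tsv_text):
    """Parse tab-separated rules into (compiled regex, set, key, value) tuples."""
    rules = []
    for line in tsv_text.splitlines():
        # Assumption: empty lines and lines starting with '#' carry no rule.
        if not line.strip() or line.startswith("#"):
            continue
        expression, dataset, key, value = line.split("\t")
        rules.append((re.compile(expression), dataset, key, value))
    return rules


def tag(text, rules):
    """Return (begin, end, set, key, value) tuples for every tagged span."""
    results = []
    for pattern, dataset, key, value in rules:
        for match in pattern.finditer(text):
            # With capture groups only the groups are tagged and the rest of
            # the match is context; without groups the whole match is tagged.
            groups = range(1, pattern.groups + 1) if pattern.groups else [0]
            # A value such as $1 is replaced by the text of that capture group.
            ref = re.fullmatch(r"\$(\d+)", value)
            resolved = match.group(int(ref.group(1))) if ref else value
            for group in groups:
                if match.start(group) < 0:
                    continue  # this group did not take part in the match
                results.append(
                    (match.start(group), match.end(group), dataset, key, resolved)
                )
    return results


TEXT = "Hello, world. user:alice"
results = tag(TEXT, load_rules(RULES_TSV))
for begin, end, dataset, key, value in results:
    print(begin, end, repr(TEXT[begin:end]), dataset, key, value)
```

In this sketch the first two rules tag whole matches, while the third tags only the span of its capture group (alice) and uses the captured text as the value. The user: prefix serves purely as context.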