Rust Tree-sitter

Rust bindings to the Tree-sitter parsing library.

Basic Usage

First, create a parser:

```rust
use tree_sitter::{Language, Parser};

let mut parser = Parser::new();
```

Tree-sitter languages consist of generated C code. To make sure they're properly compiled and linked, you can create a build script like the following (assuming tree-sitter-javascript is in your root directory):

```rust
use std::path::PathBuf;

fn main() {
    // Directory holding the generated parser sources.
    let dir: PathBuf = ["tree-sitter-javascript", "src"].iter().collect();

    // Compile the generated C code and link it into the crate.
    cc::Build::new()
        .include(&dir)
        .file(dir.join("parser.c"))
        .file(dir.join("scanner.c"))
        .compile("tree-sitter-javascript");
}
```

Add the cc crate to your Cargo.toml under [build-dependencies]:

```toml
[build-dependencies]
cc = "*"
```
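The build script above assumes the grammar ships both parser.c and scanner.c. Some grammars have no external scanner, so a build script may want to check before adding the file. The following is a minimal sketch using only the standard library; `grammar_sources` and `rerun_directives` are hypothetical helper names, not part of the cc crate or of Tree-sitter.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: list the C sources to compile for a grammar's `src` directory.
/// `parser.c` is always listed; `scanner.c` is listed only if the file exists.
fn grammar_sources(src_dir: &Path) -> Vec<PathBuf> {
    let mut sources = vec![src_dir.join("parser.c")];
    let scanner = src_dir.join("scanner.c");
    if scanner.exists() {
        sources.push(scanner);
    }
    sources
}

/// Hypothetical helper: build the `cargo:rerun-if-changed` lines for those sources,
/// so Cargo reruns the build script when a grammar file changes.
fn rerun_directives(sources: &[PathBuf]) -> Vec<String> {
    sources
        .iter()
        .map(|path| format!("cargo:rerun-if-changed={}", path.display()))
        .collect()
}
```

In a build script, each path returned by `grammar_sources` could be passed to `cc::Build::file`, and each string from `rerun_directives` printed with `println!`.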

To use these languages from Rust, declare them as extern "C" functions and call them inside an unsafe block. You can then assign the resulting language to the parser.

```rust
// Each grammar exposes a C function named tree_sitter_<language>.
extern "C" { fn tree_sitter_c() -> Language; }
extern "C" { fn tree_sitter_rust() -> Language; }
extern "C" { fn tree_sitter_javascript() -> Language; }

let language = unsafe { tree_sitter_rust() };
parser.set_language(language).unwrap();
```

Now you can parse source code:

```rust
let source_code = "fn test() {}";
let tree = parser.parse(source_code, None).unwrap();
let root_node = tree.root_node();

assert_eq!(root_node.kind(), "source_file");
assert_eq!(root_node.start_position().column, 0);
assert_eq!(root_node.end_position().column, 12);
```

Editing

Once you have a syntax tree, you can update it when your source code changes. First edit the old tree to describe the change, then pass it to parse; reusing the previously edited tree makes parsing much faster:

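```rust
use tree_sitter::{InputEdit, Point};

let new_source_code = "fn test(a: u32) {}";

// Describe the insertion of "a: u32" at byte 8.
// Note: `tree` must be declared with `let mut` for `edit` to compile.
tree.edit(&InputEdit {
    start_byte: 8,
    old_end_byte: 8,
    new_end_byte: 14,
    start_position: Point::new(0, 8),
    old_end_position: Point::new(0, 8),
    new_end_position: Point::new(0, 14),
});

let new_tree = parser.parse(new_source_code, Some(&tree));
```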
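The byte offsets and row/column positions in the edit above were written by hand. As a rough sketch of how they could be derived, the code below compares the old and new text and reports the changed span by trimming the common prefix and suffix. `Pos`, `Edit`, `position_at`, and `diff_edit` are hypothetical stand-ins defined here so the example needs only the standard library; they mirror the fields of `Point` and `InputEdit` but are not part of the tree_sitter crate. Columns are counted in bytes, which is an assumption about how positions are measured.

```rust
/// Hypothetical stand-in mirroring the fields of `Point`.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Pos {
    row: usize,
    column: usize,
}

/// Hypothetical stand-in mirroring the fields of `InputEdit`.
#[derive(Debug, PartialEq)]
struct Edit {
    start_byte: usize,
    old_end_byte: usize,
    new_end_byte: usize,
    start_position: Pos,
    old_end_position: Pos,
    new_end_position: Pos,
}

/// Row and column of a byte offset. The column is the number of bytes
/// since the last newline (assumption: columns are measured in bytes).
fn position_at(text: &str, byte: usize) -> Pos {
    let before = &text.as_bytes()[..byte];
    let row = before.iter().filter(|&&b| b == b'\n').count();
    let line_start = before
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    Pos { row, column: byte - line_start }
}

/// Describe the change from `old` to `new` as a single edit by trimming
/// the common prefix and suffix. The comparison is byte-wise, so with
/// multi-byte characters the offsets may fall inside a character.
fn diff_edit(old: &str, new: &str) -> Edit {
    let (o, n) = (old.as_bytes(), new.as_bytes());
    let prefix = o.iter().zip(n).take_while(|(a, b)| a == b).count();
    // The suffix must not overlap the prefix in either string.
    let max_suffix = o.len().min(n.len()) - prefix;
    let suffix = o
        .iter()
        .rev()
        .zip(n.iter().rev())
        .take(max_suffix)
        .take_while(|(a, b)| a == b)
        .count();
    let old_end = o.len() - suffix;
    let new_end = n.len() - suffix;
    Edit {
        start_byte: prefix,
        old_end_byte: old_end,
        new_end_byte: new_end,
        start_position: position_at(old, prefix),
        old_end_position: position_at(old, old_end),
        new_end_position: position_at(new, new_end),
    }
}
```

For the two strings used above, `diff_edit("fn test() {}", "fn test(a: u32) {}")` yields the same numbers as the hand-written edit: start byte 8, old end byte 8, new end byte 14.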

Text Input

The source code to parse can be provided as a string, a slice, a vector, or a function that returns a slice. The text can be encoded as either UTF-8 or UTF-16:

```rust
// Store some source code in an array of lines.
let lines = &[
    "pub fn foo() {",
    " 1",
    "}",
];

// Parse the source code using a custom callback. The callback is called
// with both a byte offset and a row/column offset.
let tree = parser.parse_with(&mut |_byte: u32, position: Point| -> &[u8] {
    let row = position.row as usize;
    let column = position.column as usize;
    if row < lines.len() {
        if column < lines[row].as_bytes().len() {
            // Return the rest of the current line.
            &lines[row].as_bytes()[column..]
        } else {
            // At the end of a line, supply the newline that the array omits.
            "\n".as_bytes()
        }
    } else {
        // Past the last line: an empty slice signals the end of input.
        &[]
    }
}, None).unwrap();

assert_eq!(
    tree.root_node().to_sexp(),
    "(source_file (function_item (visibility_modifier) (identifier) (parameters) (block (number_literal))))"
);
```
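To see what text the callback above actually delivers, the lookup can be pulled into a plain function and driven by a small loop. This is only a sketch: `chunk_at` repeats the callback's logic, while `read_all` is a hypothetical driver that imitates a consumer by advancing the row and column over each returned chunk. It does not reproduce how the parser really schedules its reads.

```rust
/// Same lookup as the callback: the rest of the current line, a newline
/// at the end of a line, and an empty slice past the last line.
fn chunk_at<'a>(lines: &[&'a str], row: usize, column: usize) -> &'a [u8] {
    if row < lines.len() {
        let line = lines[row].as_bytes();
        if column < line.len() {
            &line[column..]
        } else {
            "\n".as_bytes()
        }
    } else {
        &[]
    }
}

/// Hypothetical driver: request chunks until an empty one is returned,
/// advancing the row/column position over every byte received.
fn read_all(lines: &[&str]) -> Vec<u8> {
    let (mut row, mut column) = (0, 0);
    let mut out = Vec::new();
    loop {
        let chunk = chunk_at(lines, row, column);
        if chunk.is_empty() {
            break;
        }
        for &b in chunk {
            if b == b'\n' {
                row += 1;
                column = 0;
            } else {
                column += 1;
            }
        }
        out.extend_from_slice(chunk);
    }
    out
}
```

Driving it over the three lines from the example reassembles them into a single buffer with a newline after each line, which is the text the parser would see under these assumptions.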