syntect


syntect is a syntax highlighting library for Rust that uses Sublime Text syntax definitions. It aims to be a good solution for any Rust project that needs syntax highlighting, including deep integration with text editors written in Rust. It's used in production by at least two companies, and by many open source projects.

If you are writing a text editor (or something else needing highlighting) in Rust and this library doesn't fit your needs, I consider that a bug and you should file an issue or email me.

Note: I consider this project "done" in the sense that it works quite well for its intended purpose, accomplishes the major goals I had, and I'm unlikely to make any sweeping changes. I won't be committing much anymore because the marginal return on additional work isn't very high. Rest assured if you submit PRs I will review them and likely merge promptly. I'll also quite possibly still fix issues and definitely offer advice and knowledge on how the library works. Basically I'll be maintaining the library but not developing it further. I've spent months working on, tweaking, optimizing, documenting and testing this library. If you still have any reasons you don't think it fits your needs, file an issue or email me.

Rendered docs: https://docs.rs/syntect

Getting Started

syntect is available on crates.io. You can install it by adding this line to your Cargo.toml:

syntect = "1.8"

After that take a look at the documentation and the examples.

Note: with stable Rust on Linux you might have to add ./target/debug/build/onig_sys-*/out/lib/ to your LD_LIBRARY_PATH environment variable. I don't know why this happens, or even whether it happens anywhere other than Travis, but see travis.yml for what it does to make it work. Do this if you see the error libonig.so: cannot open shared object file.

If you've cloned this repository, be sure to run

git submodule update --init

to fetch all the required dependencies for running the tests.

Feature Flags

Syntect makes heavy use of cargo features, to support users who require only a subset of functionality. In particular, it is possible to use the highlighting component of syntect without the parser (for instance when hand-rolling a higher performance parser for a particular language), by adding default-features = false to the syntect entry in your Cargo.toml.

For more information on available features, see the features section in Cargo.toml.

Features/Goals

Screenshots

There's currently an example program called syncat that prints one of the source files with hard-coded themes and syntaxes, using 24-bit terminal escape sequences supported by many newer terminals. These screenshots don't look as good as they could for two reasons: first, the sRGB colours aren't corrected properly, and second, the Rust syntax definition uses some fancy labels that these themes don't have highlighting for.

[Screenshots: Nested languages; Base 16 Ocean Dark; Solarized Light; InspiredGithub]

Roadmap

Performance

Currently syntect is one of the faster syntax highlighting engines, but not the fastest. The following perf features are done and to-be-done:

The current perf numbers are below. These numbers may get better if more of the things above are implemented, but they're already better than those of many text editors. All measurements were taken on a mid-2012 15" Retina MacBook Pro.

Caching

Because syntect's API exposes internal cacheable data structures, there is a caching strategy that text editors can use that allows the text on screen to be re-rendered instantaneously regardless of the file size when a change is made after the initial highlight.

Basically, on the initial parse, copy the parse state into a side buffer every 1000 lines or so, keyed by the line it belongs to. When a change is made to the text, because of the way Sublime Text grammars work (and languages in general), only the highlighting after that change can be affected. Thus when a change is made to the text, search backwards in the parse state cache for the last state before the edit, then kick off a background task to start re-highlighting from there. Once the background task highlights past the end of the current editor viewport, render the new changes and continue re-highlighting the rest of the file in the background.

This way, from the time the edit happens to the time the new colouring gets rendered, in the worst case only 999 + (length of viewport) lines must be re-highlighted. Given the speed of syntect, even with a long file and the most complicated syntax and theme this should take less than 100ms. That is enough to re-highlight on every keystroke of the world's fastest typist in the worst possible case. You can also reduce this asymptotically to the length of the viewport by caching parse states more often, at the cost of more memory.
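The checkpointing scheme described above can be sketched with the standard library alone. This is only an illustration under assumptions, not syntect's API: `ParseState`, `parse_line`, `StateCache` and `worst_case_lines` are hypothetical names, and the "parser" is a toy that tracks block-comment nesting. A real editor would store the highlighter's own state type instead.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a real parse state: block-comment nesting depth.
#[derive(Clone, Debug, PartialEq)]
struct ParseState {
    comment_depth: u32,
}

/// Toy "parser" that advances the state across one line of text.
fn parse_line(state: &mut ParseState, line: &str) {
    state.comment_depth += line.matches("/*").count() as u32;
    let closes = line.matches("*/").count() as u32;
    state.comment_depth = state.comment_depth.saturating_sub(closes);
}

/// Checkpoints of the parse state, keyed by the line each state applies *before*.
struct StateCache {
    interval: usize,
    checkpoints: BTreeMap<usize, ParseState>,
}

impl StateCache {
    fn new(interval: usize) -> Self {
        StateCache { interval, checkpoints: BTreeMap::new() }
    }

    /// Parse `lines[start..]` from `state`, saving a checkpoint every `interval` lines.
    fn parse_from(&mut self, lines: &[&str], start: usize, mut state: ParseState) -> ParseState {
        for (i, line) in lines.iter().enumerate().skip(start) {
            if i % self.interval == 0 {
                self.checkpoints.insert(i, state.clone());
            }
            parse_line(&mut state, line);
        }
        state
    }

    /// After an edit on `edited_line`, drop the checkpoints that come after it
    /// and return the line and state from which re-highlighting should resume.
    fn resume_point(&mut self, edited_line: usize) -> (usize, ParseState) {
        // A checkpoint keyed at `edited_line` only depends on earlier lines,
        // so it stays valid; everything keyed later is discarded.
        let _stale = self.checkpoints.split_off(&(edited_line + 1));
        match self.checkpoints.iter().next_back() {
            Some((&line, state)) => (line, state.clone()),
            None => (0, ParseState { comment_depth: 0 }),
        }
    }
}

/// Worst-case number of lines to re-highlight before the viewport is current,
/// following the "999 + length of viewport" reasoning in the text.
fn worst_case_lines(interval: usize, viewport: usize) -> usize {
    (interval - 1) + viewport
}

fn main() {
    let lines = ["a", "/* open", "b", "still */", "c", "d"];
    let mut cache = StateCache::new(2);
    cache.parse_from(&lines, 0, ParseState { comment_depth: 0 });

    // Simulate an edit on line 3 and find where to resume from.
    let (line, state) = cache.resume_point(3);
    println!("resume at line {} with {:?}", line, state);
    println!("worst case: {} lines", worst_case_lines(1000, 50));
}
```

After `resume_point`, calling `parse_from` again with the returned line and state refills the discarded checkpoints as it goes. A smaller `interval` shortens the worst case at the cost of storing more states.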

Any time the file is changed the latest cached state is found, the cache is cleared after that point, and a background job is started. Any already running jobs are stopped because they would be working on old state. This way you can just have one thread dedicated to highlighting that is always doing the most up-to-date work, or sleeping.
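One way to stop jobs that are working on old state is a shared generation counter that every edit bumps and every job checks between lines. The sketch below is an assumption about how an editor might do this, not something syntect provides; `Generation` and `run_job` are hypothetical names, and the example runs on a single thread for brevity even though the atomic counter is safe to share with a background thread (for example through an `Arc`).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counter bumped on every edit; a job is stale once its number no longer matches.
struct Generation(AtomicU64);

impl Generation {
    fn new() -> Self {
        Generation(AtomicU64::new(0))
    }

    /// Called on every edit; returns the number the new job should carry.
    fn bump(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst) + 1
    }

    fn is_current(&self, job: u64) -> bool {
        self.0.load(Ordering::SeqCst) == job
    }
}

/// Re-highlight lines `start..end`, calling `highlight_line` for each one.
/// Returns how many lines were completed; stops early if a newer edit arrived.
fn run_job<F: FnMut(usize)>(
    current: &Generation,
    job: u64,
    start: usize,
    end: usize,
    mut highlight_line: F,
) -> usize {
    let mut done = 0;
    for line in start..end {
        if !current.is_current(job) {
            break; // a newer edit superseded this job
        }
        highlight_line(line);
        done += 1;
    }
    done
}

fn main() {
    let current = Generation::new();

    // First job; an edit arrives while it is processing line 2.
    let first = current.bump();
    let done = run_job(&current, first, 0, 10, |line| {
        if line == 2 {
            current.bump();
        }
    });
    println!("first job stopped after {} lines", done);

    // The job started for the latest edit runs to completion.
    let latest = current.bump();
    let done = run_job(&current, latest, 0, 10, |_| {});
    println!("latest job finished {} lines", done);
}
```

Checking once per line keeps the cancellation latency to the cost of highlighting a single line, which fits the single dedicated highlighting thread described above.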

Parallelizing

syntect doesn't provide any built-in facilities to enable highlighting in parallel. Some of the important data structures are not thread-safe, either, most notably SyntaxSet. However, if you find yourself in need of highlighting lots of files in parallel, the recommendation is to use some sort of thread pooling, along with the thread_local! macro from libstd, so that each thread that needs, say, a SyntaxSet, will have one, while minimizing the amount of them that need to be initialized. For adding parallelism to a previously single-threaded program, the recommended thread pooling is rayon. However, if you're working in an already-threaded context where there might be more threads than you want (such as writing a handler for an Iron request), the recommendation is to force all highlighting to be done within a fixed-size thread pool using scoped-thread-pool. An example of the former is in examples/parsyncat.rs.

See #20 and #78 for more detail and discussion about why syntect doesn't provide parallelism by default.
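The thread_local! recommendation can be illustrated without any external crates. In this sketch, `FakeSyntaxSet` is a hypothetical stand-in for an expensive structure that cannot be shared between threads (it holds an `Rc` to make that explicit), and a fixed number of plain `std::thread` workers stands in for a rayon or scoped-thread-pool pool. It assumes `workers` is at least 1. For real code using syntect and rayon, see examples/parsyncat.rs.

```rust
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Counts how often the expensive structure is built (for illustration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a structure like SyntaxSet that is costly to
/// build and cannot be sent to or shared with other threads.
struct FakeSyntaxSet {
    keywords: Rc<Vec<&'static str>>,
}

impl FakeSyntaxSet {
    fn load() -> Self {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        FakeSyntaxSet { keywords: Rc::new(vec!["fn", "let", "struct"]) }
    }

    /// Toy "highlighting": count the keywords in a source string.
    fn count_keywords(&self, source: &str) -> usize {
        source
            .split_whitespace()
            .filter(|word| self.keywords.iter().any(|k| k == word))
            .count()
    }
}

thread_local! {
    // Built lazily, once per thread, the first time that thread uses it.
    static SYNTAX_SET: FakeSyntaxSet = FakeSyntaxSet::load();
}

/// Process all files on a fixed number of worker threads.
fn highlight_all(files: Vec<String>, workers: usize) -> Vec<usize> {
    let files = Arc::new(files);
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let files = Arc::clone(&files);
            thread::spawn(move || {
                // Worker `w` takes every `workers`-th file, starting at index `w`.
                let mut results = Vec::new();
                let mut i = w;
                while i < files.len() {
                    let n = SYNTAX_SET.with(|set| set.count_keywords(&files[i]));
                    results.push((i, n));
                    i += workers;
                }
                results
            })
        })
        .collect();

    let mut counts = vec![0; files.len()];
    for handle in handles {
        for (i, n) in handle.join().expect("worker thread panicked") {
            counts[i] = n;
        }
    }
    counts
}

fn main() {
    let files = vec![
        "fn a let b".to_string(),
        "struct S".to_string(),
        "no keywords here".to_string(),
        "let let let".to_string(),
    ];
    let counts = highlight_all(files, 2);
    println!("keyword counts: {:?}", counts);
    println!("sets built: {}", INIT_COUNT.load(Ordering::SeqCst));
}
```

Because the set is initialized on first use in each thread, the number of sets built is bounded by the size of the pool rather than by the number of files.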

Examples Available

There are a number of examples of programs that use syntect in the examples folder and some code outside the repo:

Here are the stats that synstats extracts from syntect's codebase (not including examples and test data) as of this commit:

```
############ Stats
File count:                               19
Total characters:                     155504

Function count:                          165
Type count (structs, enums, classes):     64

Code lines (traditional SLOC):          2960
Total lines (w/ comments & blanks):     4011
Comment lines (comment but no code):     736
Blank lines (lines-blank-comment):       315

Lines with a documentation comment:      646
Total words written in doc comments:    4734
Total words written in all comments:    5145
Characters of comment:                 41099
```

Projects using Syntect

Below is a list of projects using Syntect, in approximate order by how long they've been using syntect (feel free to send PRs to add to this list):

License and Acknowledgements

Thanks to TextMate 2 and @defuz's sublimate for the existing open source code I used as inspiration and, in the case of sublimate's tmTheme loader, copy-pasted. All code (including defuz's sublimate code) is released under the MIT license.