regex_dfa

A crate for compiling regular expressions down to deterministic finite automata.

Build status Coverage Status

Documentation

Why?

Some regular expression implementations (e.g. rust's regex library) are based on simulating non-deterministic finite automata (NFAs). By turning NFAs into DFAs, we can get a speed boost, at the cost of some compilation time. Preliminary benchmarks show a substantial speed improvement over rust's default regex crate.

Limitations

Some regex features don't map well to DFAs. Specifically, this crate does not support (nor does it plan to support) lazy repetition or subgroup captures. But even if those features are needed, a DFA can be used to accelerate a more fully-featured regex implementation: run a DFA to find the beginning of a match, and then run the full engine to either refine the match or extract the capture groups. My experimental fork of the regex crate takes this approach.

regex_dfa currently only works on nightly rust.

Status/TODO

License

regex_dfa is distributed under the MIT license and the Apache license (version 2.0). See LICENSE-APACHE and LICENSE-MIT for details.