A crate for creating and manipulating deterministic finite automata (DFAs). Currently, the implementation is somewhat biased towards building DFAs from regular expressions.
Some regular expression implementations (e.g. Rust's regex library) are based on non-deterministic finite automata (NFAs). By turning NFAs into DFAs, we can sometimes get a speed boost, at the cost of some compilation time. Preliminary benchmarks show that we can get a 5x improvement over the regex crate for very simple regexes.
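To make the NFA-to-DFA idea concrete, here is a minimal sketch of the textbook subset construction, followed by a table-driven matcher. It is not regex_dfa's API or its actual implementation: the Nfa and Dfa types, determinize, is_match, and example_nfa are all hypothetical names made up for this illustration, and the NFA is assumed to have no epsilon transitions. The example automaton recognizes (a|b)*ab.

```rust
use std::collections::{BTreeSet, HashMap};

/// A tiny NFA over bytes. Assumption for this sketch: no epsilon transitions.
struct Nfa {
    /// transitions[state] lists (input byte, target state) pairs.
    transitions: Vec<Vec<(u8, usize)>>,
    start: usize,
    accepting: Vec<bool>,
}

/// A DFA stored as a dense table: one row of 256 entries per state.
struct Dfa {
    table: Vec<[Option<usize>; 256]>,
    accepting: Vec<bool>,
}

/// Subset construction: every DFA state stands for a set of NFA states.
fn determinize(nfa: &Nfa) -> Dfa {
    let mut start = BTreeSet::new();
    start.insert(nfa.start);
    let mut ids: HashMap<BTreeSet<usize>, usize> = HashMap::new();
    ids.insert(start.clone(), 0);
    let mut sets = vec![start];
    let mut table: Vec<[Option<usize>; 256]> = Vec::new();
    let mut i = 0;
    while i < sets.len() {
        let current = sets[i].clone();
        let mut row: [Option<usize>; 256] = [None; 256];
        for byte in 0..=255u8 {
            // All NFA states reachable from the current set on this byte.
            let next: BTreeSet<usize> = current
                .iter()
                .flat_map(|&s| nfa.transitions[s].iter())
                .filter(|&&(b, _)| b == byte)
                .map(|&(_, t)| t)
                .collect();
            if next.is_empty() {
                continue; // no transition: the DFA rejects on this byte
            }
            let fresh = sets.len();
            let id = *ids.entry(next.clone()).or_insert(fresh);
            if id == fresh {
                sets.push(next); // first time we see this set of NFA states
            }
            row[byte as usize] = Some(id);
        }
        table.push(row);
        i += 1;
    }
    // A DFA state accepts if any NFA state in its set accepts.
    let accepting = sets
        .iter()
        .map(|set| set.iter().any(|&s| nfa.accepting[s]))
        .collect();
    Dfa { table: table, accepting: accepting }
}

impl Dfa {
    /// Anchored, whole-input match: exactly one table lookup per input byte.
    fn is_match(&self, input: &[u8]) -> bool {
        let mut state = 0;
        for &byte in input {
            match self.table[state][byte as usize] {
                Some(next) => state = next,
                None => return false,
            }
        }
        self.accepting[state]
    }
}

/// NFA for (a|b)*ab: state 0 loops on a and b, and also guesses the final "ab".
fn example_nfa() -> Nfa {
    Nfa {
        transitions: vec![
            vec![(b'a', 0), (b'b', 0), (b'a', 1)],
            vec![(b'b', 2)],
            vec![],
        ],
        start: 0,
        accepting: vec![false, false, true],
    }
}

fn main() {
    let dfa = determinize(&example_nfa());
    println!("DFA states: {}", dfa.table.len());
    for input in ["ab", "aab", "aba"].iter() {
        println!("{:?} -> {}", input, dfa.is_match(input.as_bytes()));
    }
}
```

The construction work happens once, up front, which is the compilation cost mentioned above. Matching then never has to track several NFA states at once, which is where the potential speed boost comes from.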
For a favorable example, compare the results of regex_dfa::Dfa:
test bench::anchored_literal_long_match ... bench: 51 ns/iter (+/- 2)
test bench::anchored_literal_long_non_match ... bench: 31 ns/iter (+/- 1)
test bench::anchored_literal_short_match ... bench: 51 ns/iter (+/- 1)
test bench::anchored_literal_short_non_match ... bench: 31 ns/iter (+/- 1)
to the same results for regex::Regex:
test bench::anchored_literal_long_match ... bench: 353 ns/iter (+/- 7)
test bench::anchored_literal_long_non_match ... bench: 230 ns/iter (+/- 13)
test bench::anchored_literal_short_match ... bench: 329 ns/iter (+/- 24)
test bench::anchored_literal_short_non_match ... bench: 206 ns/iter (+/- 9)
There are also less favorable examples, however: bench::medium_1K has a throughput of 271 MB/s with regex_dfa, but 325 MB/s with regex.
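For a quick sense of scale, the ratios implied by the figures quoted above can be worked out directly. The snippet below is only arithmetic on those published numbers, not a new measurement, and the speedup helper is a name introduced here for illustration.

```rust
/// Speedup of a candidate over a baseline, given time per iteration for each.
fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

fn main() {
    // (benchmark name, regex_dfa ns/iter, regex ns/iter), copied from the listings above.
    let rows = [
        ("anchored_literal_long_match", 51.0, 353.0),
        ("anchored_literal_long_non_match", 31.0, 230.0),
        ("anchored_literal_short_match", 51.0, 329.0),
        ("anchored_literal_short_non_match", 31.0, 206.0),
    ];
    for &(name, dfa_ns, regex_ns) in rows.iter() {
        println!("{:<34} {:.1}x", name, speedup(regex_ns, dfa_ns));
    }
    // Throughput is "higher is better", so the ratio is regex_dfa over regex.
    println!("medium_1K throughput ratio: {:.2}", 271.0 / 325.0);
}
```

On these numbers the anchored-literal cases come out at roughly 6.5x to 7.4x in favor of regex_dfa, while on medium_1K regex_dfa reaches about 83% of the throughput of regex.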
Some regex features don't map well to DFAs. Specifically, this crate does not support (nor does it plan to support) lazy repetition or subgroup captures.
regex_dfa is not universally faster than the implementation in the regex crate. For example, the character matching code is faster in regex than in regex_dfa (see bench::match_class_unicode for an example). Moreover, regex has a much more sophisticated optimization for regexes that begin with one of a small set of strings (e.g. the regex-dna benchmark in the benchmark shootout game).
regex_dfa currently only works on nightly Rust.
regex crate. We should split off a better implementation into a separate crate.
regex-dfa to find matches and then running regex on those matches in order to extract groups).
regex_dfa is distributed under the MIT license and the Apache license (version 2.0).
See LICENSE-APACHE and LICENSE-MIT for details.