The goal of this crate is simple: give everybody the power of regular expressions without having to learn the complicated syntax. It is inspired by ReadableRegex.jl. This crate is a wrapper around the core Rust regex library.
If you want to match a date of the format `2021-10-30`, you could use the following code to generate a regex:
```rust
use human_regex::{beginning, digit, exactly, text, end};
let regex_string = beginning()
    + exactly(4, digit())
    + text("-")
    + exactly(2, digit())
    + text("-")
    + exactly(2, digit())
    + end();
assert!(regex_string.to_regex().is_match("2014-01-01"));
```
The `to_regex()` method returns a standard Rust regex. We can do this another way with slightly less repetition though!
```rust
use human_regex::{beginning, digit, exactly, text, end};
let first_regex_string = text("-") + exactly(2, digit());
let second_regex_string = beginning()
    + exactly(4, digit())
    + exactly(2, first_regex_string)
    + end();
assert!(second_regex_string.to_regex().is_match("2014-01-01"));
```
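
To make the composition idea concrete, here is a minimal, standard-library-only sketch of how `+` and `exactly` could assemble a pattern string. It is not the crate's implementation: the `Pattern` type, the `date_pattern` helper, and the choice to wrap every repeated fragment in a non-capturing group are assumptions made for illustration, and the function names merely mirror the ones used above.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a builder type: it only stores the pattern
/// text assembled so far.
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

impl Add for Pattern {
    type Output = Pattern;

    // `a + b` concatenates the two pattern fragments.
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

fn beginning() -> Pattern {
    Pattern("^".to_string())
}

fn end() -> Pattern {
    Pattern("$".to_string())
}

fn digit() -> Pattern {
    Pattern(r"\d".to_string())
}

/// Literal text. This sketch escapes only a small set of metacharacters.
fn text(s: &str) -> Pattern {
    let mut out = String::new();
    for c in s.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    Pattern(out)
}

/// Repeat `p` exactly `n` times. The fragment is wrapped in a non-capturing
/// group so the quantifier applies to all of it, not just its last character.
fn exactly(n: u32, p: Pattern) -> Pattern {
    Pattern(format!("(?:{}){{{}}}", p.0, n))
}

/// The second date example from above, expressed with the sketch types.
fn date_pattern() -> Pattern {
    beginning()
        + exactly(4, digit())
        + exactly(2, text("-") + exactly(2, digit()))
        + end()
}
```

Under these assumptions, `date_pattern().0` is the string `^(?:\d){4}(?:-(?:\d){2}){2}$`. The grouping is what lets a composed fragment such as `text("-") + exactly(2, digit())` be repeated as a unit.
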
For a more extensive set of examples, please see The Cookbook.
This crate currently supports the vast majority of syntax available in the core Rust regex library through a human-readable API.
| Implemented? | Expression | Description |
|:-------------------------------------------:|:-------------------:|:--------------------------------------------------------------|
| `any()` | `.` | any character except new line (includes new line with s flag) |
| `digit()` | `\d` | digit (`\p{Nd}`) |
| `non_digit()` | `\D` | not digit |
| `unicode_category(UnicodeCategory)` | `\p{L}` | Unicode non-script category |
| `unicode_script(UnicodeScript)` | `\p{Greek}` | Unicode script category |
| `non_unicode_category(UnicodeCategory)` | `\P{L}` | negated one-letter name Unicode character class |
| `non_unicode_script(UnicodeScript)` | `\P{Greek}` | negated Unicode character class (general category or script) |
| Implemented? | Expression | Description |
|:---------------------------:|:--------------:|:------------------------------------------------------------------------------------|
| `or(&['x', 'y', 'z'])` | `[xyz]` | A character class matching either x, y or z (union). |
| `nor(&['x', 'y', 'z'])` | `[^xyz]` | A character class matching any character except x, y and z. |
| `within('a'..='z')` | `[a-z]` | A character class matching any character in range a-z. |
| `without('a'..='z')` | `[^a-z]` | A character class matching any character outside range a-z. |
| See below | `[[:alpha:]]` | ASCII character class (`[A-Za-z]`) |
| `non_alphanumeric()` | `[[:^alpha:]]` | Negated ASCII character class (`[^A-Za-z]`) |
| `or()` | `[x[^xyz]]` | Nested/grouping character class (matching any character except y and z) |
| `and(&[])`/`&` | `[a-y&&xyz]` | Intersection (a-y AND xyz = xy) |
| `(or[1,2,3,4] & nor(3))` | `[0-9&&[^4]]` | Subtraction using intersection and negation (matching 0-9 except 4) |
| `subtract(&[],&[])` | `[0-9--4]` | Direct subtraction (matching 0-9 except 4). Use .collect:: |
| `xor(&[],&[])` | `[a-g~~b-h]` | Symmetric difference (matching `a` and `h` only). Requires `.collect()` for ranges. |
| `or(&escape_all(&['[',']']))` | `[\[\]]` | Escaping in character classes (matching `[` or `]`) |
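
The union, negation, and escaping rows above can be sketched as plain string building. This is a hedged illustration rather than the crate's code: `any_of`, `none_of`, and `escape_in_class` are hypothetical helper names, and the set of characters escaped here is an assumption about what is special inside a bracketed class.

```rust
/// Escape a character that would otherwise be read as class syntax
/// (brackets, backslash, negation, ranges, and set operators).
fn escape_in_class(c: char) -> String {
    match c {
        '[' | ']' | '\\' | '^' | '-' | '&' | '~' => format!("\\{}", c),
        _ => c.to_string(),
    }
}

/// Build `[...]`: a class matching any one of `chars`.
fn any_of(chars: &[char]) -> String {
    let body: String = chars.iter().map(|&c| escape_in_class(c)).collect();
    format!("[{}]", body)
}

/// Build `[^...]`: a class matching any character not in `chars`.
fn none_of(chars: &[char]) -> String {
    let body: String = chars.iter().map(|&c| escape_in_class(c)).collect();
    format!("[^{}]", body)
}
```

With these helpers, `any_of(&['x', 'y', 'z'])` yields `[xyz]`, `none_of(&['x', 'y', 'z'])` yields `[^xyz]`, and `any_of(&['[', ']'])` yields `[\[\]]`, matching the expression column of the corresponding rows.
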
| Implemented? | Expression | Description |
|:------------------:| :--------: |:---------------------------------------------------------------------------|
| `digit()` | `\d` | digit (`\p{Nd}`) |
| `non_digit()` | `\D` | not digit |
| `whitespace()` | `\s` | whitespace (`\p{White_Space}`) |
| `non_whitespace()` | `\S` | not whitespace |
| `word()` | `\w` | word character (`\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}`) |
| `non_word()` | `\W` | not word character |
| Implemented? | Expression | Description |
|:----------------:|:--------------:|:----------------------------------|
| `alphanumeric()` | `[[:alnum:]]` | alphanumeric (`[0-9A-Za-z]`) |
| `alphabetic()` | `[[:alpha:]]` | alphabetic (`[A-Za-z]`) |
| `ascii()` | `[[:ascii:]]` | ASCII (`[\x00-\x7F]`) |
| `blank()` | `[[:blank:]]` | blank (`[\t ]`) |
| `control()` | `[[:cntrl:]]` | control (`[\x00-\x1F\x7F]`) |
| `digit()` | `[[:digit:]]` | digits (`[0-9]`) |
| `graphical()` | `[[:graph:]]` | graphical (`[!-~]`) |
| `lowercase()` | `[[:lower:]]` | lower case (`[a-z]`) |
| `printable()` | `[[:print:]]` | printable (`[ -~]`) |
| `punctuation()` | `[[:punct:]]` | punctuation (``[!-/:-@\[-`{-~]``) |
| `whitespace()` | `[[:space:]]` | whitespace (`[\t\n\v\f\r ]`) |
| `uppercase()` | `[[:upper:]]` | upper case (`[A-Z]`) |
| `word()` | `[[:word:]]` | word characters (`[0-9A-Za-z_]`) |
| `hexdigit()` | `[[:xdigit:]]` | hex digit (`[0-9A-Fa-f]`) |
| Implemented? | Expression | Description |
|:-------------------------:|:----------:|:---------------------------------------------|
| `zero_or_more(x)` | `x*` | zero or more of x (greedy) |
| `one_or_more(x)` | `x+` | one or more of x (greedy) |
| `zero_or_one(x)` | `x?` | zero or one of x (greedy) |
| `zero_or_more(x).lazy()` | `x*?` | zero or more of x (ungreedy/lazy) |
| `one_or_more(x).lazy()` | `x+?` | one or more of x (ungreedy/lazy) |
| `zero_or_one(x).lazy()` | `x??` | zero or one of x (ungreedy/lazy) |
| `between(n, m, x)` | `x{n,m}` | at least n x and at most m x (greedy) |
| `at_least(n, x)` | `x{n,}` | at least n x (greedy) |
| `exactly(n, x)` | `x{n}` | exactly n x |
| `between(n, m, x).lazy()` | `x{n,m}?` | at least n x and at most m x (ungreedy/lazy) |
| `at_least(n, x).lazy()` | `x{n,}?` | at least n x (ungreedy/lazy) |
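
The greedy and lazy rows differ only by a trailing `?` on the quantifier. The sketch below shows one way a `.lazy()` modifier could be modeled with the standard library alone. The `Quantified` type and its fields are hypothetical, and wrapping `x` in a non-capturing group is an assumption of this sketch, so its output reads `(?:x)*?` rather than the bare `x*?` shown in the table.

```rust
/// Hypothetical quantified fragment: keeps the pattern text and remembers
/// whether the lazy marker has already been applied.
#[derive(Debug, Clone)]
struct Quantified {
    pattern: String,
    lazy: bool,
}

impl Quantified {
    fn greedy(pattern: String) -> Quantified {
        Quantified { pattern, lazy: false }
    }

    /// Switch to lazy matching by appending `?` to the quantifier.
    /// Calling it a second time has no further effect in this sketch.
    fn lazy(mut self) -> Quantified {
        if !self.lazy {
            self.pattern.push('?');
            self.lazy = true;
        }
        self
    }
}

fn zero_or_more(x: &str) -> Quantified {
    Quantified::greedy(format!("(?:{})*", x))
}

fn one_or_more(x: &str) -> Quantified {
    Quantified::greedy(format!("(?:{})+", x))
}

fn zero_or_one(x: &str) -> Quantified {
    Quantified::greedy(format!("(?:{})?", x))
}

fn at_least(n: u32, x: &str) -> Quantified {
    Quantified::greedy(format!("(?:{}){{{},}}", x, n))
}

fn between(n: u32, m: u32, x: &str) -> Quantified {
    Quantified::greedy(format!("(?:{}){{{},{}}}", x, n, m))
}
```

For example, `zero_or_one("x").lazy().pattern` is `(?:x)??` and `between(2, 3, "x").lazy().pattern` is `(?:x){2,3}?`.
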
| Implemented? | Expression | Description |
|:------------:|:----------:|:--------------------------------|
| `+` | `xy` | concatenation (x followed by y) |
| `or()` | `x\|y` | alternation (x or y, prefer x) |
| Implemented? | Expression | Description |
|:---------------------:|:----------:|:--------------------------------------------------------------------|
| `beginning()` | `^` | the beginning of text (or start-of-line with multi-line mode) |
| `end()` | `$` | the end of text (or end-of-line with multi-line mode) |
| `beginning_of_text()` | `\A` | only the beginning of text (even with multi-line mode enabled) |
| `end_of_text()` | `\z` | only the end of text (even with multi-line mode enabled) |
| `word_boundary()` | `\b` | a Unicode word boundary (`\w` on one side and `\W`, `\A`, or `\z` on other) |
| `non_word_boundary()` | `\B` | not a Unicode word boundary |
| Implemented? | Expression | Description |
|:-------------------------------------------------:|:---------------:|:--------------------------------------------------------|
| `capture(exp)` | `(exp)` | numbered capture group (indexed by opening parenthesis) |
| `named_capture(exp, name)` | `(?P<name>exp)` | named (also numbered) capture group |
| Handled implicitly through functional composition | `(?:exp)` | non-capturing group |
| See below | `(?flags)` | set flags within current group |
| See below | `(?flags:exp)` | set flags for exp (non-capturing) |
| Implemented? | Expression | Description |
|:-----------------------------------:|:----------:|:--------------------------------------------------------------|
| `case_insensitive(exp)` | `i` | case-insensitive: letters match both upper and lower case |
| `multi_line_mode(exp)` | `m` | multi-line mode: `^` and `$` match begin/end of line |
| `dot_matches_newline_too(exp)` | `s` | allow `.` to match `\n` |
| will not be implemented<sup>1</sup> | `U` | swap the meaning of `x*` and `x*?` |
| `disable_unicode(exp)` | `u` | Unicode support (enabled by default) |
| will not be implemented<sup>2</sup> | `x` | ignore whitespace and allow line comments (starting with `#`) |

<sup>2</sup> With `human_regex`, comments should be added in source code rather than in the regex string.
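
As a rough illustration of the `(?flags:exp)` form that the flag functions in the table correspond to, the sketch below wraps an expression string in a scoped flag group. The `with_flags` helper is hypothetical, the functions take and return plain strings instead of the crate's own type, and mapping `disable_unicode` to `-u` is an assumption based on Unicode support being enabled by default.

```rust
/// Wrap `exp` in a non-capturing group that sets `flags` for its contents only.
fn with_flags(flags: &str, exp: &str) -> String {
    format!("(?{}:{})", flags, exp)
}

fn case_insensitive(exp: &str) -> String {
    with_flags("i", exp)
}

fn multi_line_mode(exp: &str) -> String {
    with_flags("m", exp)
}

fn dot_matches_newline_too(exp: &str) -> String {
    with_flags("s", exp)
}

/// Unicode is on by default, so turning it off uses the negated flag.
fn disable_unicode(exp: &str) -> String {
    with_flags("-u", exp)
}
```

Because each call returns an ordinary string, flags nest by composition: `multi_line_mode(&case_insensitive("^a$"))` produces `(?m:(?i:^a$))`.
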