| ⚠️ This package is under active development and may include breaking changes. ⚠️ |
| :--------------------------------------------------------------------------------: |

Regex for Humans

The goal of this crate is simple: give everyone the power of regular expressions without having to learn their complicated syntax. It is inspired by ReadableRegex.jl. This crate is a wrapper around the core Rust `regex` library.

Example usage

Matching a date

If you want to match a date in the format `2021-10-30` (YYYY-MM-DD), you could use the following code to generate a regex:

```rust
use human_regex::{begin, digit, end, exactly, text};

fn main() {
    // Anchored pattern: four digits, a hyphen, two digits, a hyphen, two digits.
    let regex_string = begin() + exactly(4, digit()) + text("-") + exactly(2, digit())
        + text("-") + exactly(2, digit()) + end();
    // `to_regex()` compiles the pattern; `is_match` tests it against the input.
    println!("{}", regex_string.to_regex().is_match("2014-01-01"));
}
```
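The real types and functions live in the `human_regex` crate. As a rough, dependency-free illustration of the idea behind the example, the sketch below assumes that each function returns a fragment of regex source and that `+` simply concatenates fragments. The `Pattern` type, the simplified escaping in `text`, and the choice to wrap every repeated item in a non-capturing group are assumptions made for this sketch, not necessarily what the crate does.

```rust
use std::ops::Add;

/// Hypothetical stand-in for the crate's pattern type. It only stores the
/// generated regex source, so this sketch needs no `regex` dependency.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

/// `a + b` concatenates the two sources, like `ab` in regex syntax.
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

pub fn begin() -> Pattern {
    Pattern("^".to_string())
}

pub fn end() -> Pattern {
    Pattern("$".to_string())
}

pub fn digit() -> Pattern {
    Pattern(r"\d".to_string())
}

/// Literal text. Common metacharacters are backslash-escaped so that, for
/// example, "." matches a literal dot (a simplified escape set).
pub fn text(s: &str) -> Pattern {
    let mut out = String::new();
    for c in s.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    Pattern(out)
}

/// `x{n}`; `x` is wrapped in a non-capturing group so that a multi-character
/// fragment is repeated as a whole.
pub fn exactly(n: u32, x: Pattern) -> Pattern {
    Pattern(format!("(?:{}){{{}}}", x.0, n))
}
```

Under these assumptions the date example yields `^(?:\d){4}-(?:\d){2}-(?:\d){2}$`. The grouping is redundant around a single `\d`, but it keeps `exactly` correct for multi-character fragments.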

Roadmap

The eventual goal of this crate is to support all the syntax in the core Rust regex library through a human-readable API. Here is where we currently stand:

Character Classes

Single Character

| Implemented?  | Expression  | Description                                                       |
| :-----------: | :---------: | :---------------------------------------------------------------- |
|    `any()`    |     `.`     | any character except new line (includes new line with `s` flag)  |
|   `digit()`   |    `\d`     | digit (`\p{Nd}`)                                                  |
| `non_digit()` |    `\D`     | not digit                                                         |
|               |    `\pN`    | one-letter name Unicode character class                           |
|               | `\p{Greek}` | Unicode character class (general category or script)              |
|               |    `\PN`    | negated one-letter name Unicode character class                   |
|               | `\P{Greek}` | negated Unicode character class (general category or script)      |
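The Unicode class rows above are not implemented yet. One possible shape for them, sketched here with hypothetical `unicode_class` and `non_unicode_class` helpers (not part of the crate), is to emit the short `\pN` form for one-letter names and the braced `\p{Greek}` form otherwise. Names are not validated, so a misspelled class would only be rejected when the regex is compiled.

```rust
/// Hypothetical pattern type holding regex source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

/// Builds `\pX` for one-letter names and `\p{Name}` otherwise; `prefix` is
/// 'p' for the class itself and 'P' for its negation.
fn class_source(prefix: char, name: &str) -> String {
    if name.chars().count() == 1 {
        format!("\\{}{}", prefix, name)
    } else {
        format!("\\{}{{{}}}", prefix, name)
    }
}

/// Hypothetical helper for the `\pN` and `\p{Greek}` rows.
pub fn unicode_class(name: &str) -> Pattern {
    Pattern(class_source('p', name))
}

/// Hypothetical helper for the negated `\PN` and `\P{Greek}` rows.
pub fn non_unicode_class(name: &str) -> Pattern {
    Pattern(class_source('P', name))
}
```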

Perl Character Classes

|   Implemented?    | Expression | Description                                                                     |
| :---------------: | :--------: | :------------------------------------------------------------------------------ |
|     `digit()`     |    `\d`    | digit (`\p{Nd}`)                                                                |
|   `non_digit()`   |    `\D`    | not digit                                                                       |
|  `whitespace()`   |    `\s`    | whitespace (`\p{White_Space}`)                                                  |
| `non_whitespace()`|    `\S`    | not whitespace                                                                  |
|     `word()`      |    `\w`    | word character (`\p{Alphabetic}` + `\p{M}` + `\d` + `\p{Pc}` + `\p{Join_Control}`) |
|   `non_word()`    |    `\W`    | not word character                                                              |

ASCII Character Classes

| Implemented? |   Expression   | Description                            |
| :----------: | :------------: | :------------------------------------- |
|              | `[[:alnum:]]`  | alphanumeric (`[0-9A-Za-z]`)           |
|              | `[[:alpha:]]`  | alphabetic (`[A-Za-z]`)                |
|              | `[[:ascii:]]`  | ASCII (`[\x00-\x7F]`)                  |
|              | `[[:blank:]]`  | blank (`[\t ]`)                        |
|              | `[[:cntrl:]]`  | control (`[\x00-\x1F\x7F]`)            |
|  `digit()`   | `[[:digit:]]`  | digits (`[0-9]`)                       |
|              | `[[:graph:]]`  | graphical (`[!-~]`)                    |
|              | `[[:lower:]]`  | lower case (`[a-z]`)                   |
|              | `[[:print:]]`  | printable (`[ -~]`)                    |
|              | `[[:punct:]]`  | punctuation (`` [!-/:-@\[-`{-~] ``)    |
|              | `[[:space:]]`  | whitespace (`[\t\n\v\f\r ]`)           |
|              | `[[:upper:]]`  | upper case (`[A-Z]`)                   |
|   `word()`   |  `[[:word:]]`  | word characters (`[0-9A-Za-z_]`)       |
|              | `[[:xdigit:]]` | hex digit (`[0-9A-Fa-f]`)              |
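Most ASCII classes are still missing. A hypothetical way to cover them all at once is an enum rendered into `[[:name:]]` syntax; `AsciiClass`, `ascii_class`, and `negated_ascii_class` are illustrative names, not the crate's API. The negated `[[:^name:]]` form is standard `regex` crate syntax but is not listed in the table.

```rust
/// The ASCII classes listed in the table, as a hypothetical enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AsciiClass {
    Alnum,
    Alpha,
    Ascii,
    Blank,
    Cntrl,
    Digit,
    Graph,
    Lower,
    Print,
    Punct,
    Space,
    Upper,
    Word,
    Xdigit,
}

impl AsciiClass {
    /// The name used inside `[[:name:]]`.
    fn name(self) -> &'static str {
        match self {
            AsciiClass::Alnum => "alnum",
            AsciiClass::Alpha => "alpha",
            AsciiClass::Ascii => "ascii",
            AsciiClass::Blank => "blank",
            AsciiClass::Cntrl => "cntrl",
            AsciiClass::Digit => "digit",
            AsciiClass::Graph => "graph",
            AsciiClass::Lower => "lower",
            AsciiClass::Print => "print",
            AsciiClass::Punct => "punct",
            AsciiClass::Space => "space",
            AsciiClass::Upper => "upper",
            AsciiClass::Word => "word",
            AsciiClass::Xdigit => "xdigit",
        }
    }
}

/// Regex source for the class, e.g. `[[:alpha:]]`.
pub fn ascii_class(class: AsciiClass) -> String {
    format!("[[:{}:]]", class.name())
}

/// Negated form, e.g. `[[:^alpha:]]`.
pub fn negated_ascii_class(class: AsciiClass) -> String {
    format!("[[:^{}:]]", class.name())
}
```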

Repetitions

|       Implemented?        | Expression | Description                                    |
| :-----------------------: | :--------: | :--------------------------------------------- |
|     `zero_or_more(x)`     |    `x*`    | zero or more of x (greedy)                     |
|     `one_or_more(x)`      |    `x+`    | one or more of x (greedy)                      |
|     `zero_or_one(x)`      |    `x?`    | zero or one of x (greedy)                      |
| `zero_or_more(x).lazy()`  |   `x*?`    | zero or more of x (ungreedy/lazy)              |
|  `one_or_more(x).lazy()`  |   `x+?`    | one or more of x (ungreedy/lazy)               |
|  `zero_or_one(x).lazy()`  |   `x??`    | zero or one of x (ungreedy/lazy)               |
|    `between(n, m, x)`     |  `x{n,m}`  | at least n x and at most m x (greedy)          |
|     `at_least(n, x)`      |  `x{n,}`   | at least n x (greedy)                          |
|      `exactly(n, x)`      |   `x{n}`   | exactly n x                                    |
| `between(n, m, x).lazy()` | `x{n,m}?`  | at least n x and at most m x (ungreedy/lazy)   |
|  `at_least(n, x).lazy()`  |  `x{n,}?`  | at least n x (ungreedy/lazy)                   |
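The repetition functions map mechanically onto quantifiers. In the hypothetical sketch below, each function wraps `x` in a non-capturing group (so `zero_or_more` applied to `ab` repeats `ab`, not just `b`), and `.lazy()` appends the `?` that turns a greedy quantifier into a lazy one. The `Pattern` type and the grouping strategy are assumptions; the crate's own output may differ.

```rust
/// Hypothetical pattern type holding regex source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

impl Pattern {
    /// Appends `?`, turning `x*` into `x*?`, `x{n,m}` into `x{n,m}?`, etc.
    /// Only meaningful on a quantified pattern: on a bare `x` it would
    /// produce `x?` (optional), which this sketch does not guard against.
    pub fn lazy(self) -> Pattern {
        Pattern(self.0 + "?")
    }
}

/// Wraps `x` in a non-capturing group so the quantifier applies to all of it.
fn group(x: &Pattern) -> String {
    format!("(?:{})", x.0)
}

pub fn zero_or_more(x: Pattern) -> Pattern {
    Pattern(group(&x) + "*")
}

pub fn one_or_more(x: Pattern) -> Pattern {
    Pattern(group(&x) + "+")
}

pub fn zero_or_one(x: Pattern) -> Pattern {
    Pattern(group(&x) + "?")
}

pub fn between(n: u32, m: u32, x: Pattern) -> Pattern {
    Pattern(format!("{}{{{},{}}}", group(&x), n, m))
}

pub fn at_least(n: u32, x: Pattern) -> Pattern {
    Pattern(format!("{}{{{},}}", group(&x), n))
}

pub fn exactly(n: u32, x: Pattern) -> Pattern {
    Pattern(format!("{}{{{}}}", group(&x), n))
}
```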

Composites

| Implemented? | Expression | Description                     |
| :----------: | :--------: | :------------------------------ |
|     `+`      |    `xy`    | concatenation (x followed by y) |
|    `or()`    |  `x\|y`    | alternation (x or y, prefer x)  |
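Alternation binds more loosely than concatenation, so `gr` followed by `a|e` followed by `y` must be grouped; written flat, `gra|ey` would match either `gra` or `ey`. The sketch below shows one way an `or()` combinator could handle this by always emitting a non-capturing group. Taking a slice of options is an assumption about the signature, and `Pattern` and `text` are hypothetical stand-ins.

```rust
use std::ops::Add;

/// Hypothetical pattern type holding regex source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

/// Concatenation: `x + y` becomes `xy`.
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

/// Literal text; assumed here to contain no regex metacharacters.
pub fn text(s: &str) -> Pattern {
    Pattern(s.to_string())
}

/// Alternation over any number of options, wrapped in a non-capturing group
/// because `|` binds more loosely than concatenation.
pub fn or(options: &[Pattern]) -> Pattern {
    let sources: Vec<&str> = options.iter().map(|p| p.0.as_str()).collect();
    Pattern(format!("(?:{})", sources.join("|")))
}
```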

Empty matches

|     Implemented?      | Expression | Description                                                                  |
| :-------------------: | :--------: | :--------------------------------------------------------------------------- |
|       `begin()`       |    `^`     | the beginning of text (or start-of-line with multi-line mode)                |
|        `end()`        |    `$`     | the end of text (or end-of-line with multi-line mode)                        |
|                       |    `\A`    | only the beginning of text (even with multi-line mode enabled)               |
|                       |    `\z`    | only the end of text (even with multi-line mode enabled)                     |
|   `word_boundary()`   |    `\b`    | a Unicode word boundary (`\w` on one side and `\W`, `\A`, or `\z` on other)  |
| `non_word_boundary()` |    `\B`    | not a Unicode word boundary                                                  |
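These assertions are zero-width, so composing them is plain concatenation. This sketch adds hypothetical `start_of_text` and `end_of_text` functions for the unimplemented `\A` and `\z` rows; those names, like the `Pattern` type, are assumptions rather than the crate's API.

```rust
use std::ops::Add;

/// Hypothetical pattern type holding regex source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

/// Concatenation: `x + y` becomes `xy`.
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

fn lit(s: &str) -> Pattern {
    Pattern(s.to_string())
}

pub fn begin() -> Pattern {
    lit("^")
}

pub fn end() -> Pattern {
    lit("$")
}

/// Hypothetical names for the not-yet-implemented `\A` and `\z` rows.
pub fn start_of_text() -> Pattern {
    lit(r"\A")
}

pub fn end_of_text() -> Pattern {
    lit(r"\z")
}

pub fn word_boundary() -> Pattern {
    lit(r"\b")
}

pub fn non_word_boundary() -> Pattern {
    lit(r"\B")
}

/// Literal text; assumed here to contain no regex metacharacters.
pub fn text(s: &str) -> Pattern {
    lit(s)
}
```

For example, `word_boundary() + text("cat") + word_boundary()` yields `\bcat\b`, which matches `cat` as a whole word but not inside `concatenate`.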

Groupings and Flags

|                   Implemented?                    |   Expression    | Description                                             |
| :-----------------------------------------------: | :-------------: | :------------------------------------------------------ |
|                                                   |     `(exp)`     | numbered capture group (indexed by opening parenthesis) |
|                                                   | `(?P<name>exp)` | named (also numbered) capture group                     |
| Handled implicitly through functional composition |    `(?:exp)`    | non-capturing group                                     |
|                                                   |   `(?flags)`    | set flags within current group                          |
|                                                   | `(?flags:exp)`  | set flags for exp (non-capturing)                       |
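Capture groups are not implemented yet. A minimal sketch of what hypothetical `capture` and `named_capture` helpers might emit is below. The name check is deliberately conservative (ASCII letters, digits, and `_`, not starting with a digit); it accepts a subset of the names the `regex` crate allows, which keeps obviously invalid names from reaching the regex compiler.

```rust
/// Hypothetical pattern type holding regex source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

/// Hypothetical helper for `(exp)`: a numbered capture group.
pub fn capture(x: Pattern) -> Pattern {
    Pattern(format!("({})", x.0))
}

/// Hypothetical helper for `(?P<name>exp)`. Returns `None` unless `name` is
/// an ASCII identifier: a letter or `_`, then letters, digits, or `_`.
pub fn named_capture(name: &str, x: Pattern) -> Option<Pattern> {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    if first_ok && rest_ok {
        Some(Pattern(format!("(?P<{}>{})", name, x.0)))
    } else {
        None
    }
}
```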

| Implemented? | Expression | Description                                                    |
| :----------: | :--------: | :------------------------------------------------------------- |
|              |    `i`     | case-insensitive: letters match both upper and lower case      |
|              |    `m`     | multi-line mode: `^` and `$` match begin/end of line           |
|              |    `s`     | allow `.` to match `\n`                                        |
|              |    `U`     | swap the meaning of `x*` and `x*?`                             |
|              |    `u`     | Unicode support (enabled by default)                           |
|              |    `x`     | ignore whitespace and allow line comments (starting with `#`)  |
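Flags are likewise unimplemented. One hypothetical design scopes them to a sub-pattern with the `(?flags:exp)` form from the grouping table, so they cannot leak into the rest of the expression. The `Flag` enum and `with_flags` function are illustrative names only, not the crate's API.

```rust
/// Hypothetical pattern type holding regex source text.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

/// The flags from the table, one variant per letter.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Flag {
    CaseInsensitive,
    MultiLine,
    DotMatchesNewline,
    SwapGreed,
    Unicode,
    IgnoreWhitespace,
}

impl Flag {
    fn letter(self) -> char {
        match self {
            Flag::CaseInsensitive => 'i',
            Flag::MultiLine => 'm',
            Flag::DotMatchesNewline => 's',
            Flag::SwapGreed => 'U',
            Flag::Unicode => 'u',
            Flag::IgnoreWhitespace => 'x',
        }
    }
}

/// Hypothetical helper emitting `(?flags:exp)`: the flags apply to `x` only.
pub fn with_flags(flags: &[Flag], x: Pattern) -> Pattern {
    let letters: String = flags.iter().map(|f| f.letter()).collect();
    Pattern(format!("(?{}:{})", letters, x.0))
}
```

Passing an empty slice produces `(?:exp)`, a plain non-capturing group, which is still valid syntax.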