# A Rust crate with a sscanf (inverse of format!()) Macro based on Regex

`sscanf` is originally a C function that takes a string, a format string with placeholders and
several variables (in the Rust version replaced with types). It then parses the input string,
writing the values behind the placeholders into the variables (Rust: returns a tuple). `sscanf`
can be thought of as reversing a call to `format!()`:
```rust
// format: takes format string and values, returns String
let msg = format!("Hello {}{}!", "World", 5);
assert_eq!(msg, "Hello World5!");

// sscanf: takes string, format string and types, returns tuple
let parsed = sscanf::sscanf!(msg, "Hello {}{}!", str, usize);

// parsed is Result<(&str, usize), ...>
assert_eq!(parsed.unwrap(), ("World", 5));

// alternative syntax:
let parsed2 = sscanf::sscanf!(msg, "Hello {str}{usize}!");
assert_eq!(parsed2.unwrap(), ("World", 5));
```

`sscanf!()` takes a format string like `format!()`, but instead of writing values into the
placeholders (`{}`), it extracts the values found at those `{}` into the returned tuple.

If matching the format string fails, an `Error` is returned:
```rust
let msg = "Text that doesn't match the format string";
let parsed = sscanf::sscanf!(msg, "Hello {str}{usize}!");
assert!(matches!(parsed, Err(sscanf::Error::MatchFailed)));
```
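
To see what the macro automates, here is a hand-written counterpart of the `"Hello {str}{usize}!"` pattern that uses only the standard library. The function name `parse_hello` is hypothetical and not part of sscanf, and it returns an `Option` rather than sscanf's `Result`. Treat it as a rough sketch of the matching behaviour, not as a description of how the crate works internally.

```rust
/// Hypothetical hand-written counterpart to
/// `sscanf!(input, "Hello {str}{usize}!")`. Not part of the sscanf crate.
fn parse_hello(input: &str) -> Option<(&str, usize)> {
    // The literal parts of the format string must match exactly.
    let body = input.strip_prefix("Hello ")?.strip_suffix('!')?;

    // The text part takes as few characters as possible (but at least one),
    // so try every split point from left to right and keep the first one
    // where the remainder parses as a usize.
    body.char_indices().skip(1).find_map(|(i, _)| {
        let (text, number) = body.split_at(i);
        number.parse::<usize>().ok().map(|n| (text, n))
    })
}

fn main() {
    let msg = format!("Hello {}{}!", "World", 5);
    assert_eq!(parse_hello(&msg), Some(("World", 5)));

    // Input that does not fit the pattern yields no result.
    assert_eq!(parse_hello("Text that doesn't match the format string"), None);
}
```

Every new pattern would need its own function like this one, which is the repetitive work a single `sscanf!()` call replaces.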

**Types in Placeholders:**

The types can either be given as a separate parameter after the format string, or directly
inside of the `{}` placeholder.

The first allows for autocomplete while typing, syntax highlighting and better compiler errors
generated by sscanf in case the wrong types are given.

The second imitates the Rust `format!()` behavior since 1.58.
This option gives worse compiler errors when using stable Rust,
but is otherwise identical to the first option.
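
For reference, the `format!()` feature being imitated is inline (captured) arguments, stabilized in Rust 1.58. The sketch below uses only `std`; the helper names `hello_positional` and `hello_inline` are made up for this illustration. Note the difference in meaning: in `format!()` the braces name a variable in scope, whereas in `sscanf!()` they name a type.

```rust
// Hypothetical helpers for illustration only; they use std's format!(), not sscanf.

/// Positional style: values are passed after the format string.
fn hello_positional(name: &str, count: usize) -> String {
    format!("Hello {}{}!", name, count)
}

/// Inline style (Rust 1.58+): variables in scope are named inside the braces.
fn hello_inline(name: &str, count: usize) -> String {
    format!("Hello {name}{count}!")
}

fn main() {
    // Both styles produce the same string.
    assert_eq!(hello_positional("World", 5), hello_inline("World", 5));
    assert_eq!(hello_inline("World", 5), "Hello World5!");
}
```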

More examples of the capabilities of `sscanf`:
```rust
use sscanf::sscanf;
use std::num::NonZeroUsize;

let input = "Move to N36E21";
let parsed = sscanf!(input, "Move to {char}{usize}{char}{usize}");
assert_eq!(parsed.unwrap(), ('N', 36, 'E', 21));

let input = "Escape literal { } as {{ and }}";
let parsed = sscanf!(input, "Escape literal {{ }} as {{{{ and }}}}");
assert_eq!(parsed.unwrap(), ());

let input = "Indexing types: N36E21";
let parsed = sscanf!(input, "Indexing types: {1}{0}{1}{0}", NonZeroUsize, char);
// output is in the order of the placeholders
assert_eq!(parsed.unwrap(), ('N', NonZeroUsize::new(36).unwrap(),
                             'E', NonZeroUsize::new(21).unwrap()));

let input = "A Sentence with Spaces. Another Sentence.";
// str and String do the same, but String clones from the input string
// to take ownership instead of borrowing.
let (a, b) = sscanf!(input, "{String}. {str}.").unwrap();
assert_eq!(a, "A Sentence with Spaces");
assert_eq!(b, "Another Sentence");

// Number format options
let input = "ab01 127 101010 1Z";
let parsed = sscanf!(input, "{usize:x} {i32:o} {u8:b} {u32:r36}");
let (a, b, c, d) = parsed.unwrap();
assert_eq!(a, 0xab01);     // Hexadecimal
assert_eq!(b, 0o127);      // Octal
assert_eq!(c, 0b101010);   // Binary

assert_eq!(d, 71);         // any radix (r36 = Radix 36)
assert_eq!(d, u32::from_str_radix("1Z", 36).unwrap());

let input = "color: #D4AF37";
// Number types take their size into account, and hexadecimal u8 can
// have at most 2 digits => only possible match is 2 digits each.
let (r, g, b) = sscanf!(input, "color: #{u8:x}{u8:x}{u8:x}").unwrap();
assert_eq!((r, g, b), (0xD4, 0xAF, 0x37));
```
The input in this case is a `&'static str`, but it can be `String`, `&str`, `&String`, ...
Basically anything with `Deref<Target=str>`.

The parsing part of this macro has very few limitations, since it replaces the `{}` with a
Regular Expression (regex) that corresponds to that type.
For example:
- `char` is just one character (regex `"."`)
- `str` is any sequence of characters (regex `".+?"`)
- Numbers are any sequence of digits (regex `"[-+]?\d+"`)

And so on. The actual implementation for numbers tries to take the size of the type into
account and some other details, but that is the gist of the parsing.

This means that any sequence of replacements is possible as long as the Regex finds a
combination that works. In the `char, usize, char, usize` example above it manages to assign
the `N` and `E` to the `char`s because they cannot be matched by the `usize`s.

# Format Options

All options go inside the `{` `}` and after a `:`, so either as `{<type>:<option>}` or
as `{:<option>}`. Note: The type might still have a path that contains `::`. Any double
colons are ignored and only single colons are used to separate the options.

Procedural macros don't have any reliable type info and can only compare types by name. This means
that the number options below only work with a literal type like "`i32`", **NO** Paths (~~`std::i32`~~)
or Wrappers (~~`struct Wrapper(i32);`~~) or Aliases (~~`type Alias = i32;`~~). **ONLY** `i32`,
`usize`, `u16`, ...

| config | description | possible types |
| --------------------------- | -------------------------- | -------------- |
| `{:/<regex>/}` | custom regex | any |
| `{:x}` | hexadecimal numbers | integers |
| `{:o}` | octal numbers | integers |
| `{:b}` | binary numbers | integers |
| `{:r2}` - `{:r36}` | radix 2 - radix 36 numbers | integers |
| `#` | "alternate" form | various types |

**Custom Regex:**

- `{:/.../}`: Match according to the Regex between the `/` `/`

For example:
```rust
let input = "random Text";
let parsed = sscanf::sscanf!(input, "{str:/[^m]+/}{str}");

// regex [^m]+ matches anything that isn't an 'm'
// => stops at the 'm' in 'random'
assert_eq!(parsed.unwrap(), ("rando", "m Text"));
```

The regex uses the same escaping logic as JavaScript's `/.../` syntax,
meaning that the normal regex escaping with `\d` for digits etc. is in effect, with the addition
that any `/` needs to be escaped as `\/` since it is used to end the regex.

**NOTE:** You should use raw strings for a format string containing a regex, since otherwise you
need to escape any `\` as `\\`:
```rust
use sscanf::sscanf;
let input = "1234";
let parsed = sscanf!(input, r"{u8:/\d{2}/}{u8}"); // regex \d{2} matches 2 digits
let _ = sscanf!(input, "{u8:/\\d{2}/}{u8}"); // the same with a non-raw string
assert_eq!(parsed.unwrap(), (12, 34));
```

Note: If you use any unescaped `(` `)` in your regex, you have to prevent them from forming a capture
group by adding a `?:` at the beginning: `{:/..(..)../}` becomes `{:/..(?:..)../}`. This won't
change their functionality in any way, but is necessary for `sscanf`'s parsing process to work.

This also means that custom regexes cannot be used on custom types that `derive FromScanf`,
since those rely on having an exact number of capture groups inside of their regex.

**Radix Options:**

Only work on primitive integer types (`u8`, ..., `u128`, `i8`, ..., `i128`, `usize`, `isize`).
- `x`: hexadecimal Number (Digits 0-9 and a-f or A-F), optional prefix `0x` or `0X`
- `o`: octal Number (Digits 0-7), optional prefix `0o` or `0O`
- `b`: binary Number (Digits 0-1), optional prefix `0b` or `0B`
- `r2` - `r36`: any radix Number (Digits 0-9 and a-z or A-Z for higher radices)

**Alternate form:**

If used alongside a radix option: makes the number require a prefix (0x, 0o, 0b).

A note on prefixes: `r2`, `r8` and `r16` match the same numbers as `b`, `o` and `x` respectively,
but without a prefix. Thus:
- `{:x}` may have a prefix, matching numbers like `0xab` or `ab`
- `{:r16}` has no prefix and would only match `ab`
- `{:#x}` must have a prefix, matching only `0xab`
- `{:#r16}` gives a compile error

More uses for `#` may be added in the future. Let me know if you have a suggestion for this.

# Custom Types

`sscanf` works with most primitive Types from `std` as well as `String` by default. The
full list can be seen here: Implementations of `RegexRepresentation`.

To add more types there are three options:
- Derive `FromScanf` for your type (recommended)
- Implement both `RegexRepresentation` and `std::str::FromStr` for your type
- Implement `RegexRepresentation` and manually implement `FromScanf` for your type (highly discouraged)

The simplest option is to use `derive`:
```rust
#[derive(sscanf::FromScanf)]
#[sscanf(format = "#{r:x}{g:x}{b:x}")] // matches '#' followed by 3 hexadecimal u8s
struct Color {
    r: u8,
    g: u8,
    b: u8,
}

let input = "color: #ff00cc";
let parsed = sscanf::sscanf!(input, "color: {Color}").unwrap();
assert!(matches!(parsed, Color { r: 0xff, g: 0x00, b: 0xcc }));
```

Also works for enums:
```rust
#[derive(sscanf::FromScanf)]
enum HasChanged {
    #[sscanf(format = "received {added} additions and {deleted} deletions")]
    Yes {
        added: usize,
        deleted: usize,
    },
    #[sscanf("has not changed")] // the `format =` part can be omitted
    No
}

let input = "Your file has not changed since your last visit!";
let parsed = sscanf::sscanf!(input, "Your file {HasChanged} since your last visit!").unwrap();
assert!(matches!(parsed, HasChanged::No));

let input = "Your file received 325 additions and 15 deletions since your last visit!";
let parsed = sscanf::sscanf!(input, "Your file {HasChanged} since your last visit!").unwrap();
assert!(matches!(parsed, HasChanged::Yes { added: 325, deleted: 15 }));
```

More details can be found in the `FromScanf` documentation and the `derive` documentation.

# License

Licensed under either of Apache License, Version 2.0 or
MIT license at your option.