= Indentation tokenizer
A small and simple indentation tokenizer.
== TODO
== Indentation format
Tabs are not valid for indentation grouping.
Let's look at an example.
..... ... .... .... .... .... .... .... ....
An indentation group can use any number of spaces.
..... level0 .... level1 <-- .... level2 .... level1 <-- .... level1 <-- .... level0 .... level0
It's not a good idea to use different numbers of spaces for the same level, but it's allowed when you are creating a new level.
In this example, the last level1 is indented with more spaces than the previous ones.
..... ... .... .... .... <-- incorrect indentation .... <-- correct previous ident level .... .... ....
To go back to a level, the indentation has to match the previous indentation of that level.
As the previous example shows, indentation is free when increasing the level.
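The level rules above can be sketched as a stack of the indentation widths that are currently open. This is a minimal illustration, not the crate's implementation: `indent_levels` is a hypothetical helper that receives the indentation width (in spaces) of each non-empty line and returns its level, or an error when a line goes back to a width that no open level uses.

```rust
/// Hypothetical helper (not the crate's API): maps indentation widths to levels.
/// - A wider indentation than the current one opens a new level (any width).
/// - A narrower indentation must exactly match an already open level.
fn indent_levels(widths: &[usize]) -> Result<Vec<usize>, String> {
    // Widths of the currently open levels; level 0 always has width 0.
    let mut stack: Vec<usize> = vec![0];
    let mut levels = Vec::with_capacity(widths.len());
    for (i, &w) in widths.iter().enumerate() {
        let current = *stack.last().unwrap();
        if w > current {
            // Increasing the level: any number of extra spaces is accepted.
            stack.push(w);
        } else if w < current {
            // Going back: close levels until we reach one that is not wider.
            while *stack.last().unwrap() > w {
                stack.pop();
            }
            // The remaining level must have exactly the same width.
            if *stack.last().unwrap() != w {
                return Err(format!("line {}: incorrect indentation ({} spaces)", i + 1, w));
            }
        }
        levels.push(stack.len() - 1);
    }
    Ok(levels)
}
```

Blank lines are assumed to be filtered out before calling it, since they do not affect the indentation.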
|..... |..... |......
You can start lines with `|`, but it's optional.
..... |..... ...... ......
Note that `|` sits one position before the indentation level.
This is useful when a line needs to start with spaces.
..... | ..... <- This line starts with an space | ...... <- Starting with 2 spaces |..... <- starts with no spaces ..... <- starting with no spaces
|
.....
||..... This line starts with a |
.
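One possible reading of the leading `|` mark, sketched with a hypothetical `split_indent` helper (not part of the crate's API): leading spaces are counted, and if the next character is `|` it takes one more indentation column, so everything after it, including spaces or a second `|`, is content. This is an assumption drawn from the examples in this README; in particular the sketch does not special-case a `|` in column 0.

```rust
/// Hypothetical helper: splits a raw line into (indentation width, content),
/// treating an optional `|` as the last indentation column.
fn split_indent(line: &str) -> (usize, &str) {
    // Only spaces count as indentation; a tab would end up in the content.
    let spaces = line.len() - line.trim_start_matches(' ').len();
    let rest = &line[spaces..];
    match rest.strip_prefix('|') {
        // The `|` occupies one indentation column; what follows is content.
        Some(content) => (spaces + 1, content),
        // No mark: content starts right after the spaces.
        None => (spaces, rest),
    }
}
```

With this reading, `"   | 021a"` and `"    021b"` both have width 4, and their contents are `" 021a"` and `"021b"`.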
A line is empty when it has no content or contains only spaces.
..... ..... ..... ..... ..... next line is empty
..... next line is empty
..... ..... next line is empty
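The empty-line rule fits in a single hypothetical helper, `is_empty_line`. The sketch assumes that a line holding only a `|` mark is not empty, since that mark is how this README represents empty lines inside a token.

```rust
/// Hypothetical helper: a line is empty when it has no content
/// or contains only spaces. A line with a `|` mark is not empty.
fn is_empty_line(line: &str) -> bool {
    line.chars().all(|c| c == ' ')
}
```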
What if I want to represent empty lines?
..... ..... ..... ..... ..... I want new line after this line |
..... and three new lines, please
| | |
What if I want to represent spaces at end of line?
Spaces at the end of a line are not removed, so you don't need to do anything about them.
But it can be useful to mark them, because some editors trim trailing spaces, or simply to make them visible.
..... ..... ..... ..... This line keeps 2 spaces and end | and you know it
Next line is properly indented and only has spaces
| |
In fact, you can write `|` at the end of any line; it will be removed.
The following strings are equivalent:
|
it's optional at end of line.....| .....| .....| .....|
..... ..... ..... .....
But I might need a real pipe `|` at the end of a line:
..... ..... ..... ..... This line ends with a pipe||
|..... ..... <- Invalid, remember, the indentation mark | goes one position before the real indentation
|..... ..... <- This is OK, but not elegant
| .... <- I want to start with an space |..... <- This is redundant, but more clear
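The end-of-line rule can be sketched as removing at most one trailing `|` from a line's content, after the indentation has been split off. `strip_end_mark` is a hypothetical name, not the crate's API. Because only one mark is removed, spaces before it are kept and a doubled `||` leaves a single literal pipe.

```rust
/// Hypothetical helper: removes one optional `|` at the end of the content.
/// Trailing spaces before the mark are preserved.
fn strip_end_mark(content: &str) -> &str {
    content.strip_suffix('|').unwrap_or(content)
}
```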
== Tokens
This is the first token This is another token, because it's on a different level And another token This is also a different token
A token can contain multiple lines This is another token with three lines
Empty lines can be used to separate tokens This is a token, that continues here. Next empty line define a token division
And this is a different one
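Grouping can be sketched once every line has been reduced to its level, or marked as empty. In this hypothetical `group_tokens`, consecutive lines at the same level form one token, an empty line or a change of level starts a new token, and a deeper line becomes a child of the most recent token one level up. The `Token` struct here only assumes a `lines`/`tokens` shape; it is not the crate's code. The sketch also assumes levels grow by at most one step at a time, and panics otherwise.

```rust
#[derive(Debug, PartialEq)]
pub struct Token {
    pub lines: Vec<String>,
    pub tokens: Vec<Token>,
}

/// Follows the most recent token `depth` times and returns the list
/// where a token at that depth belongs.
fn siblings_at(tokens: &mut Vec<Token>, depth: usize) -> &mut Vec<Token> {
    if depth == 0 {
        return tokens;
    }
    let parent = tokens.last_mut().expect("a deeper line needs a parent token");
    siblings_at(&mut parent.tokens, depth - 1)
}

/// Each entry is `Some((level, content))` for a non-empty line,
/// or `None` for an empty line.
fn group_tokens(lines: &[Option<(usize, &str)>]) -> Vec<Token> {
    let mut root = Vec::new();
    // Level of the previous non-empty line; an empty line resets it.
    let mut prev_level: Option<usize> = None;
    for line in lines {
        match *line {
            None => prev_level = None,
            Some((level, content)) => {
                let siblings = siblings_at(&mut root, level);
                // Same level as the previous line with no empty line between:
                // the line continues the current token.
                let continues = prev_level == Some(level) && !siblings.is_empty();
                if continues {
                    siblings.last_mut().unwrap().lines.push(content.to_owned());
                } else {
                    // Otherwise it starts a new token at this level.
                    siblings.push(Token { lines: vec![content.to_owned()], tokens: Vec::new() });
                }
                prev_level = Some(level);
            }
        }
    }
    root
}
```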
== Indentation tokenizer API
Function to call::
Token type::
pub struct Token {
    pub lines: Vec<String>,
    pub tokens: Vec<Token>,
}
Error type::
pub struct Error {
    pub line: u32,
    pub desc: String,
}
That's all.
Look into lib.rs for details.
== Examples
You can look into lib.rs; it contains several tests.
.Complex example
let tokens = tokenize("
0 || 01a 01b 01c
02a
02b
|020a
||020b
| 021a
|021b
1a 1b 11a ||11b 11c
12a ||
|12b || 2a 21a 21b | |
The result will be
vec![Token {
    lines: vec!["0".to_owned()],
    tokens: vec![Token {
        lines: vec!["| 01a".to_owned(),
                    "01b".to_owned(),
                    "01c".to_owned()],
        tokens: vec![],
    },
    Token {
        lines: vec!["02a".to_owned(), "02b".to_owned()],
        tokens: vec![Token {
            lines: vec!["020a".to_owned(),
                        "|020b".to_owned()],
            tokens: vec![],
        },
        Token {
            lines: vec![" 021a".to_owned(),
                        "021b".to_owned()],
            tokens: vec![],
        }],
    }],
},
Token {
    lines: vec!["1a".to_owned(), "1b".to_owned()],
    tokens: vec![Token {
        lines: vec!["11a".to_owned(),
                    "|11b".to_owned(),
                    "11c".to_owned()],
        tokens: vec![],
    },
    Token {
        lines: vec!["12a |".to_owned(), "12b |".to_owned()],
        tokens: vec![],
    }],
},
Token {
    lines: vec!["2a".to_owned()],
    tokens: vec![Token {
        lines: vec!["21a".to_owned(),
                    "21b".to_owned(),
                    "".to_owned(),
                    "".to_owned()],
        tokens: vec![],
    }],