- Demo: demo
- Rust document: docs.rs
- Python document: python/README.md
- Blog post: How to calculate the alignment between BERT and spaCy tokens effectively and robustly
Get an alignment map for two different and noisy tokenizations:
```python
import tokenizations

tokens_a = ["げん", "ご"]
tokens_b = ["けんこ"]  # all accents are dropped (が -> か, ご -> こ)
a2b, b2a = tokenizations.get_alignments(tokens_a, tokens_b)
print(a2b)
# [[0], [0]]
print(b2a)
# [[0, 1]]
```
`a2b[i]` is a list holding the indices in `tokens_b` that align with `tokens_a[i]`; together these lists represent the alignment from `tokens_a` to `tokens_b` (and `b2a` the reverse direction).
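As a sketch of how such an alignment map can be consumed, the snippet below projects token indices from one tokenization onto the other. The alignment lists are hard-coded to the values shown above so the example is self-contained, and `project_indices` is a hypothetical helper, not part of the library's API:

```python
# Alignment lists as returned by get_alignments for the example above,
# hard-coded here so this sketch runs without the library installed.
a2b = [[0], [0]]
b2a = [[0, 1]]


def project_indices(indices, mapping):
    """Map token indices through an alignment list, deduplicating results
    while preserving order."""
    out = []
    for i in indices:
        for j in mapping[i]:
            if j not in out:
                out.append(j)
    return out


# Both tokens of tokens_a align to the single token of tokens_b.
print(project_indices([0, 1], a2b))  # -> [0]
# The single token of tokens_b covers both tokens of tokens_a.
print(project_indices([0], b2a))  # -> [0, 1]
```

The same pattern can be used to carry token-level annotations (e.g. entity labels) from one tokenization to the other.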