token-trekker-rs is a command-line tool that counts the total number of tokens across all files in a directory, or across all files matching a glob pattern, using the tokenizer you select.
To install token-trekker-rs from crates.io, run:
```sh
cargo install token-trekker-rs
```
To build token-trekker-rs from the source code, first clone the repository:
```sh
git clone https://github.com/1rgs/token-trekker-rs.git
cd token-trekker-rs
```
Then build the project using cargo:
```sh
cargo build --release
```
The compiled binary will be available at ./target/release/token-trekker.
To count tokens in a directory or for files matching a glob pattern, run the following command:
```sh
token-trekker --path <path_or_glob_pattern> --tokenizer <tokenizer>
```
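Replace `<path_or_glob_pattern>` with a directory path or a glob pattern, and `<tokenizer>` with the name of the tokenizer to use.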
For example:
```sh
token-trekker --path "path/to/files/*.txt" --tokenizer p50k-base
```
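
To make the idea of a directory-wide token total concrete, here is a minimal sketch using only the Rust standard library. It is not the tool's actual implementation: the function names `whitespace_token_count` and `count_tokens_in_dir` are hypothetical, and the whitespace splitter is only a stand-in for a real tokenizer such as p50k-base, which would produce different counts.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in tokenizer (an assumption for illustration): counts
/// whitespace-separated words. A real tokenizer such as p50k-base
/// would give different counts.
fn whitespace_token_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Recursively sums the token counts of every readable text file under `dir`.
fn count_tokens_in_dir(dir: &Path, count: &dyn Fn(&str) -> usize) -> io::Result<usize> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into subdirectories and add their totals.
            total += count_tokens_in_dir(&path, count)?;
        } else if let Ok(text) = fs::read_to_string(&path) {
            // Files that cannot be read as UTF-8 text are skipped in this sketch.
            total += count(&text);
        }
    }
    Ok(total)
}
```

Calling `count_tokens_in_dir(Path::new("path/to/files"), &whitespace_token_count)` returns the summed count for that directory. Passing the tokenizer as a closure keeps the directory walk independent of how tokens are counted.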