Token samplers for large language models, written in Rust!
Extremely early in development, poorly tested. You can look at src/tests.rs
for some examples of use.
Also a fairly simple example of using Mirostat with my RWKV project here: https://github.com/KerfuffleV2/smolrsrwkv/blob/60b8e8bfe64f157f1800445128af3b4adbbc64c1/smolrwkv-cli/src/main.rs#L139-L164
Using the term "sampler" here loosely, perhaps it should be renamed in the future. Right now a "sampler" could be something that manipulates the list of logits (for example, a top-k sampler might prune the list to the top K entries), it might actually pick a token or both!
Real descriptions may (or may not happen) eventually. For now, you can check out the llama.cpp main
example README for a brief overview of some of the types of sampler: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md#generation-flags
You probably won't usually want to use individual Sampler
s. The most typical
use case is going to be chaining a number of samplers together.
A simple example of constructing a [SamplerChain]:
```rust use anyhow::Result;
use llm_samplers::prelude::*;
pub fn test_chain1() -> Result<()> {
let mut logits = Logits::try_from_iter([0.1f32, 0.2, 0.3, 0.4].into_iter())?;
// Demonstrating the different ways you can build a SamplerChain.
// These are all equivalent.
let mut sc = SamplerChain::new()
+ SampleFlatBias::new(&[(3, f32::NEG_INFINITY)]);
sc += SampleTemperature::new(0.8);
sc.push_sampler(SampleGreedy::new());
assert_eq!(
sc.sample_token(
// These samplers don't actually need any resources.
&mut NilSamplerResources::default(),
&mut logits)?,
Some(1)
);
// () also implements HasSamplerResources<TokenId = u32>
// so you could use &mut () here.
assert_eq!(sc.sample_token(&mut (), &mut logits)?, Some(1));
Ok(())
} ```
The previous example is simple but not very realistic: the greedy sampler doesn't even care about temperature. Now let's look at something a bit more complicated:
```rust use anyhow::Result; use rand::{SeedableRng, rngs::StdRng};
use llm_samplers::prelude::*;
fn testchain2() -> Result<()> { let examplelogits = vec![0.1f32, 0.2, 0.3, 0.4]; let mut res = SimpleSamplerResources::new( // Optionally include an RNG resource. Some(Box::new(StdRng::seedfromu64(123))), // Optionally include a last tokens resource. Some(vec![]), ); let mut logits = Logits::tryfromiter(examplelogits.intoiter())?; let mut logits2 = logits.clone();
// SamplerChain with u32 token id type and f32 logit type.
let mut sc = SamplerChain::<u32, f32>::new()
// Bias logits (this example sets bias for token id 3 to -inf)
+ SampleFlatBias::new(&[(3, f32::NEG_INFINITY)])
// Apply a repetition penalty.
+ SampleRepetition::new(1.1, 64)
// Apply frequency and presence penalties.
+ SampleFreqPresence::new(0.05, 0.1, 64)
// Apply temperature to logits.
+ SampleTemperature::new(0.8)
// Sample a token using Mirostat1
+ SampleMirostat1::new(4, 5.0, 0.1);
// Put a value into `last_tokens`, this simulates us having already picked
// that token (3) previously.
res.with_last_tokens_mut(&mut |tokens| tokens.push(3u32))?;
assert_eq!(sc.sample_token(&mut res, &mut logits)?, Some(2));
// Now add the last selected token to the list.
res.with_last_tokens_mut(&mut |tokens| tokens.push(2u32))?;
// And pick the next one. *Important*: Note that we don't reuse `logits`.
// This is because `logits` already has all the filtering/sorting/permutation
// from the previous sample call applied to it.
assert_eq!(sc.sample_token(&mut res, &mut logits2)?, Some(1));
Ok(())
} ```
Note: Crate/docs version likely won't match this repo.
Initial version closely referenced from the samplers in the llama.cpp project (although not a line-by-line port). Thanks!