ollama/tokenizer
Daniel Hiltgen ec9b4e9e47
tokenizer: fix multi-regex BPE offset handling (#15844)
Use the current fragment offset when emitting unmatched spans during multi-regex BPE splitting. This avoids duplicating earlier prompt text and inflating token counts for multi-stage BPE tokenizers.
2026-04-27 14:14:27 -07:00
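
The commit message describes the fix but does not show the code. As a rough illustration only, here is a minimal sketch of multi-stage regex splitting in which unmatched spans are sliced using an offset relative to the current fragment. It assumes Go's standard `regexp` package rather than whatever engine the package actually uses, and the names `splitStage`, `splitAll`, and `splitPatterns` are hypothetical, not taken from `bytepairencoding.go`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitStage (hypothetical helper) applies one regex to one fragment and
// returns the matches and the unmatched spans between them, in order.
func splitStage(re *regexp.Regexp, fragment string) []string {
	var out []string
	offset := 0 // position inside fragment, NOT inside the full prompt
	for _, loc := range re.FindAllStringIndex(fragment, -1) {
		if loc[0] > offset {
			// Unmatched span: slice the current fragment at its own offset.
			out = append(out, fragment[offset:loc[0]])
		}
		if loc[1] > loc[0] {
			out = append(out, fragment[loc[0]:loc[1]])
		}
		offset = loc[1]
	}
	if offset < len(fragment) {
		out = append(out, fragment[offset:]) // trailing unmatched span
	}
	return out
}

// splitAll (hypothetical helper) runs each regex over the fragments
// produced by the previous stage.
func splitAll(stages []*regexp.Regexp, text string) []string {
	fragments := []string{text}
	for _, re := range stages {
		var next []string
		for _, f := range fragments {
			next = append(next, splitStage(re, f)...)
		}
		fragments = next
	}
	return fragments
}

// splitPatterns (hypothetical helper) compiles the patterns and splits text.
func splitPatterns(patterns []string, text string) []string {
	stages := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		stages = append(stages, regexp.MustCompile(p))
	}
	return splitAll(stages, text)
}

func main() {
	text := "abc123 def45"
	parts := splitPatterns([]string{`\d+`, `[A-Za-z]+`}, text)
	fmt.Printf("%q\n", parts) // ["abc" "123" " " "def" "45"]
	// Every byte of the input appears exactly once across the fragments.
	fmt.Println(strings.Join(parts, "") == text) // true
}
```

The property worth checking in a splitter like this is that concatenating the fragments reproduces the input exactly. If an unmatched span were sliced with an offset belonging to a different string, text could be repeated, which is consistent with the duplicated prompt text and inflated token counts the commit message mentions.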
testdata move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
bytepairencoding.go tokenizer: fix multi-regex BPE offset handling (#15844) 2026-04-27 14:14:27 -07:00
bytepairencoding_test.go tokenizer: fix multi-regex BPE offset handling (#15844) 2026-04-27 14:14:27 -07:00
sentencepiece.go move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
sentencepiece_test.go move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
tokenizer.go move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
vocabulary.go move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
vocabulary_test.go move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
wordpiece.go move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
wordpiece_test.go move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00