ollama/tokenizer
Daniel Hiltgen ec9b4e9e47
tokenizer: fix multi-regex BPE offset handling (#15844)
Use the current fragment offset when emitting unmatched spans during multi-regex BPE splitting. This avoids duplicating earlier prompt text and inflating token counts for multi-stage BPE tokenizers.
2026-04-27 14:14:27 -07:00
testdata
bytepairencoding.go tokenizer: fix multi-regex BPE offset handling (#15844) 2026-04-27 14:14:27 -07:00
bytepairencoding_test.go tokenizer: fix multi-regex BPE offset handling (#15844) 2026-04-27 14:14:27 -07:00
sentencepiece.go
sentencepiece_test.go
tokenizer.go
vocabulary.go
vocabulary_test.go
wordpiece.go
wordpiece_test.go