ollama/x/create
Daniel Hiltgen 03aee88186
mlx: Support NVIDIA TensorRT Model Optimizer import (#15566)
* mlx: Support NVIDIA TensorRT Model Optimizer import

* x/create: support FP8 safetensors import

Decode HF F8_E4M3 safetensors with block scale companions into MLX-importable tensor blobs, including compressed-tensors weight_scale metadata, packed NVFP4 layouts, and mixed-precision tensor headers.

Use that source-precision metadata during create quantization:
- default FP8-sourced imports to mxfp8
- allow source FP8 to target MLX low-bit formats
- preserve source-quantized NVFP4 layouts
- selectively keep or promote tensors based on their source precision
- detect quantized dtype from mixed-precision safetensors manifests
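The precedence described above can be sketched as a small decision function. This is an illustration of the stated defaults only; the function and format names are hypothetical, not the actual API, and the non-FP8 fallback is an assumption.

```go
package main

import "fmt"

// chooseTarget picks a quantization target from the source dtype and an
// optional user-requested format, following the rules in the commit message:
//   - source-quantized NVFP4 layouts are preserved as-is
//   - FP8 sources default to mxfp8 when no target is requested
//   - FP8 sources may otherwise target MLX low-bit formats
// Names here are illustrative; the real logic lives in x/create.
func chooseTarget(sourceDtype, requested string) string {
	switch sourceDtype {
	case "NVFP4":
		// Keep the source-quantized 4-bit layout regardless of the request.
		return "nvfp4"
	case "F8_E4M3":
		if requested == "" {
			return "mxfp8" // FP8-sourced default
		}
		return requested // explicit MLX low-bit target wins
	default:
		if requested == "" {
			return "bf16" // assumed fallback for unquantized sources
		}
		return requested
	}
}

func main() {
	fmt.Println(chooseTarget("F8_E4M3", "")) // mxfp8
	fmt.Println(chooseTarget("NVFP4", "q4")) // nvfp4
}
```

Keeping the decision keyed on the source dtype is what lets mixed-precision manifests route each tensor independently: a tensor kept in higher precision at export time can stay there, while block-scaled FP8 tensors take the low-bit path.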

* review comments
2026-04-27 18:28:10 -07:00
client mlx: Support NVIDIA TensorRT Model Optimizer import (#15566) 2026-04-27 18:28:10 -07:00
create.go mlx: Support NVIDIA TensorRT Model Optimizer import (#15566) 2026-04-27 18:28:10 -07:00
create_test.go mlx: Support NVIDIA TensorRT Model Optimizer import (#15566) 2026-04-27 18:28:10 -07:00
dtype.go mlx: Support NVIDIA TensorRT Model Optimizer import (#15566) 2026-04-27 18:28:10 -07:00
gemma4.go Keep Gemma4 router projection in source precision (#15613) 2026-04-15 15:04:23 -07:00
gemma4_test.go Keep Gemma4 router projection in source precision (#15613) 2026-04-15 15:04:23 -07:00
imagegen.go create: Clean up experimental paths, fix create from existing safetensor model (#14679) 2026-04-07 08:12:57 -07:00
qwen35.go create: Clean up experimental paths, fix create from existing safetensor model (#14679) 2026-04-07 08:12:57 -07:00