ollama/x/create
Daniel Hiltgen 03aee88186
mlx: Support NVIDIA TensorRT Model Optimizer import (#15566)
* mlx: Support NVIDIA TensorRT Model Optimizer import

* x/create: support FP8 safetensors import

Decode HF F8_E4M3 safetensors with block scale companions into MLX-importable tensor blobs, including compressed-tensors weight_scale metadata, packed NVFP4 layouts, and mixed-precision tensor headers.

Use that source-precision metadata during create quantization:
- default FP8-sourced imports to mxfp8
- allow source FP8 to target MLX low-bit formats
- preserve source-quantized NVFP4 layouts
- selectively keep or promote tensors based on their source precision
- detect quantized dtype from mixed-precision safetensors manifests
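The precedence described above can be sketched as a small decision function. This is an illustration of the stated defaults only; the function and format names are hypothetical, not the actual API, and the non-FP8 fallback is an assumption.

```go
package main

import "fmt"

// chooseTarget picks a quantization target from the source dtype and an
// optional user-requested format, following the rules in the commit message:
//   - source-quantized NVFP4 layouts are preserved as-is
//   - FP8 sources default to mxfp8 when no target is requested
//   - FP8 sources may otherwise target MLX low-bit formats
// Names here are illustrative; the real logic lives in x/create.
func chooseTarget(sourceDtype, requested string) string {
	switch sourceDtype {
	case "NVFP4":
		// Keep the source-quantized 4-bit layout regardless of the request.
		return "nvfp4"
	case "F8_E4M3":
		if requested == "" {
			return "mxfp8" // FP8-sourced default
		}
		return requested // explicit MLX low-bit target wins
	default:
		if requested == "" {
			return "bf16" // assumed fallback for unquantized sources
		}
		return requested
	}
}

func main() {
	fmt.Println(chooseTarget("F8_E4M3", "")) // mxfp8
	fmt.Println(chooseTarget("NVFP4", "q4")) // nvfp4
}
```

Keeping the decision keyed on the source dtype is what lets mixed-precision manifests route each tensor independently: a tensor kept in higher precision at export time can stay there, while block-scaled FP8 tensors take the low-bit path.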

* review comments
2026-04-27 18:28:10 -07:00
client mlx: Support NVIDIA TensorRT Model Optimizer import (#15566) 2026-04-27 18:28:10 -07:00
create.go mlx: Support NVIDIA TensorRT Model Optimizer import (#15566) 2026-04-27 18:28:10 -07:00
create_test.go mlx: Support NVIDIA TensorRT Model Optimizer import (#15566) 2026-04-27 18:28:10 -07:00
dtype.go mlx: Support NVIDIA TensorRT Model Optimizer import (#15566) 2026-04-27 18:28:10 -07:00
gemma4.go Keep Gemma4 router projection in source precision (#15613) 2026-04-15 15:04:23 -07:00
gemma4_test.go Keep Gemma4 router projection in source precision (#15613) 2026-04-15 15:04:23 -07:00
imagegen.go create: Clean up experimental paths, fix create from existing safetensor model (#14679) 2026-04-07 08:12:57 -07:00
qwen35.go create: Clean up experimental paths, fix create from existing safetensor model (#14679) 2026-04-07 08:12:57 -07:00