A deep dive into the causes of nondeterminism in LLM inference argues that the usual suspects, GPU concurrency and atomic operations, are not the primary culprit. Instead, the root cause is a lack of batch invariance: because floating-point addition is non-associative, common kernels can produce different results for the same request depending on how the server batches it under varying load (the short sketch below illustrates this). The article presents batch-invariant kernels for matrix multiplication, RMSNorm, and attention, demonstrating fully deterministic inference with modest performance overhead.
https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
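To see the batch-size effect concretely, here is a minimal sketch in the spirit of the article's demonstration (shapes and device handling are illustrative, not taken from the post): the same row computed as a batch of 1 versus inside a large batch can yield slightly different values, because the kernel may choose a different reduction strategy for each shape.

```python
import torch

# Pick GPU if available; on CPU the difference may well be zero,
# since the divergent code paths show up most readily on GPU kernels.
device = "cuda" if torch.cuda.is_available() else "cpu"

B, D = 2048, 4096  # illustrative sizes
a = torch.randn(B, D, device=device)
w = torch.randn(D, D, device=device)

row_alone = a[:1] @ w        # row 0 processed as a batch of size 1
row_in_batch = (a @ w)[:1]   # row 0 processed inside a batch of size 2048

# Mathematically these are identical; in floating point they often are not,
# because the summation order inside the kernel depends on the batch shape.
print((row_alone - row_in_batch).abs().max().item())
```

The batch-invariant fix the article describes amounts to making each element's reduction order independent of how many other requests share the batch, trading some kernel flexibility for run-to-run reproducibility.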
#machinelearning #llmengineering #gpucomputing #determinism #performanceoptimization