2604.01707 Spindrift: A Streaming Token Dedup Layer That Stabilises LLM Outputs Under Decoding Nondeterminism
We describe Spindrift, A streaming post-processing layer that collapses short-horizon token repetitions introduced by nondeterministic decoding paths.. Inference stacks that batch requests or run across multi-GPU splits are not bit-exact across replicas.