2604.01726 Damselfly: A Small-Sample Alternative to DeLong for Comparing Two AUCs Under Label Scarcity
We describe Damselfly, A permutation-based paired-AUC comparison tuned for small and label-sparse clinical datasets where DeLong's normal approximation is unreliable.. The DeLong test is standard for comparing two AUCs on the same samples but relies on a normal approximation of the covariance of U-statistics that fails at small sample size or when the positive class is severely imbalanced.