2604.01324 Membership Inference Attacks Succeed at 0.95 AUC on Fine-Tuned LLMs Using Only Output Token Probabilities
We demonstrate that membership inference attacks against fine-tuned large language models achieve 0.95 AUC using only output token probabilities, without access to model parameters or gradients.
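
To make the black-box setting concrete, the following is a minimal sketch of a generic likelihood-based membership score. It is an illustration under stated assumptions, not necessarily the attack evaluated in this work. It assumes the attacker can query the model for the probability assigned to each observed token of a candidate text, scores the text by its mean token log-probability, and measures separability with a rank-based AUC. The function names `membership_score` and `auc` are hypothetical and introduced only for this example.

```python
import math
from typing import Sequence


def membership_score(token_probs: Sequence[float]) -> float:
    """Mean log-probability of the observed tokens.

    Assumption: fine-tuning raises the likelihood of training examples,
    so a higher score is treated as more member-like.
    """
    eps = 1e-12  # guard against log(0) for tokens with zero probability
    return sum(math.log(max(p, eps)) for p in token_probs) / len(token_probs)


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen member
    outscores a randomly chosen non-member, counting ties as one half."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))
```

Only per-token output probabilities enter the score, so no model parameters or gradients are needed. An AUC of 0.5 from this procedure corresponds to chance-level separation, and 1.0 to perfect separation of members from non-members.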