Counterfactual Credit Attribution for Autoregressive Generative Models

Aloni Cohen and Chenhao Zhang

Abstract

We study the ex post problem of proper credit attribution to training datapoints for generative models with an autoregressive architecture, i.e., models that are autoregressive compositions of next-token predictors. We consider the notion of counterfactual credit attribution (CCA) recently proposed by Livni et al. [2024]. We show that CCA cannot be achieved for autoregressive models through the natural approach of imposing the CCA requirement on the next-token predictor. On the other hand, for a given model, we characterize the credit scheme that assigns the minimum credit while satisfying perfect CCA. Finally, we show that approximating this minimum-credit scheme is hard given only black-box access to the next-token predictor.
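The abstract treats an autoregressive model as a composition of next-token predictors. The following is a minimal Python sketch of that composition only. It assumes a hypothetical predictor interface that maps a token prefix to a probability distribution over the next token, plus an assumed end-of-sequence token `<eos>`. Under these assumptions, the probability of a complete sequence is the product of the next-token conditionals. The names `sequence_probability`, `toy_predictor`, and `EOS` are illustrative, and the sketch does not implement counterfactual credit attribution as defined by Livni et al. [2024].

```python
from typing import Callable, Dict, Sequence, Tuple

Token = str
# Hypothetical interface: a next-token predictor maps a prefix to a
# distribution over the next token (a dict from token to probability).
NextTokenPredictor = Callable[[Tuple[Token, ...]], Dict[Token, float]]

EOS = "<eos>"  # assumed end-of-sequence token


def sequence_probability(predictor: NextTokenPredictor, seq: Sequence[Token]) -> float:
    """Probability that the autoregressive composition of `predictor`
    assigns to generating exactly `seq` and then stopping (emitting EOS)."""
    prob = 1.0
    prefix: Tuple[Token, ...] = ()
    for tok in list(seq) + [EOS]:
        dist = predictor(prefix)        # one call to the next-token predictor
        prob *= dist.get(tok, 0.0)      # multiply in the conditional p(tok | prefix)
        if prob == 0.0:
            return 0.0
        prefix = prefix + (tok,)
    return prob


def toy_predictor(prefix: Tuple[Token, ...]) -> Dict[Token, float]:
    """Hypothetical toy predictor: stop after two tokens; otherwise emit
    'a' w.p. 0.5, 'b' w.p. 0.3, or stop w.p. 0.2."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    return {"a": 0.5, "b": 0.3, EOS: 0.2}
```

For example, `sequence_probability(toy_predictor, ["a", "b"])` multiplies 0.5, 0.3, and 1.0 to give approximately 0.15. Because each conditional is a proper distribution, the probabilities of all sequences the toy model can produce sum to 1. Any attribution scheme for the composed model therefore has to reason about many predictor calls per generated sequence rather than a single one.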

Draft Coming Soon!