Thursday, February 15, 2024

On "Neural Networks Learn Statistics of Increasing Complexity"

This seems to be an interesting paper! Caveat: I have not had time to read it yet!

"... Researchers ... a new paper in which they present new theoretical and empirical evidence for distributional simplicity bias (DSB). DSB posits that neural networks first learn low-order moments (mean and variance) of a data distribution, before moving on to higher-order correlations (skewness or kurtosis). The researchers demonstrate this by training models on real datasets and evaluating them (throughout training) on synthetic data designed to probe the models’ reliance on statistics of different orders.  They demonstrate this behavior across a variety of image and language architectures. ..."

From the abstract:
"The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. We also extend the DSB to discrete domains by proving an equivalence between token n-gram frequencies and the moments of embedding vectors, and by finding empirical evidence for the bias in LLMs. Finally we use optimal transport methods to surgically edit the low-order statistics of one class to match those of another, and show that early-training networks treat the edited samples as if they were drawn from the target class. Code is available at this https URL."


[2402.04362] Neural Networks Learn Statistics of Increasing Complexity (open access)

