Common Sense: The size of the reward matters too for learning speed

Monday, May 25, 2026

The size of the reward matters too for learning speed

Amazing stuff!

"Scientists have long believed that training an animal, even to perform simple tasks, is a painstaking process requiring hundreds of repetitions. Under standard protocols, animals receive only a small reward after each attempt, maximizing the number of reinforcements per training session. Accumulated experience, at least according to conventional wisdom, is more important than the size of the incentive. New experiments in mice, however, may upend this long-held assumption.

“I mean this quite literally, no one ever checked,” neuroscientist Josh Dudman said in a statement. For the new study, his lab trained mice to complete a range of navigation, motor skill, and decision-making tasks. Thirsty mice that received a few large gulps of water as a reward, the team reported, became experts much faster than animals that got many tiny sips. ..."

"KEY TAKEAWAYS

The Dudman Lab examined what happens when animals are given bigger-than-normal rewards as they learned to perform a task. They found that larger rewards speed up learning and reduce individual differences, even though animals had much less experience performing the task.
The larger payout causes a sustained increase in dopamine — a chemical messenger in the brain that helps regulate learning and motivation. This allows the brain to gain more from each experience and be more engaged in the task at hand, both of which contribute to faster learning.
The findings could change how scientists think about how brains learn, the role of dopamine in learning, and how they study learning.

..."

From the editor's summary and abstract:

"Editor’s summary

Training an animal on a complex task is often a painstaking, incremental process. This is because conventional behavioral learning protocols focus on minimizing reward to maximize trials. Gong et al. tested how reward size shapes learning in mice. Very large rewards markedly accelerated learning across tasks and led to increased dopamine release in the striatum.

Animals learned faster, became more efficient in reward collection, remained engaged in the tasks, and showed across-session improvements compared with smaller reward magnitudes.

Striatal dopamine responses scaled with reward size and extending dopamine activity using optogenetics reproduced many of the learning benefits. These results provide an important contribution to our understanding of the role of large rewards in learning and motivation. ...

Structured Abstract

INTRODUCTION

Across different disciplines that share an interest in learning, from artificial intelligence (AI) to experimental psychology, it has long been assumed that there is a free parameter, the learning rate, that determines individual variance in learning efficiency and is relatively independent of the magnitude of reward.

This suggests that learning depends primarily on the amount of experience (number of rewards). However, recent theoretical work mapping dopamine (DA) function onto reinforcement learning algorithms, combined with classic results on DA encoding of reward, suggested that learning rates might in fact depend upon reward magnitude. This also raises the possibility that, as a field, we may have settled on suboptimal reward magnitude distributions that slow training in complex laboratory tasks and also underestimated the efficiency of animal learning.

RATIONALE

An influential set of observations led to the hypothesis that DA neuron activity implements the reward prediction error component of reinforcement learning algorithms. However, recent work has proposed that DA activity may map onto the learning rate during acquisition.

The learning rate parameter, as the name implies, determines how fast learning converges to its asymptote. Classic experimental results demonstrated that DA activity is correlated with reward magnitude. Together, these two points imply an unexpected hypothesis:

Reward magnitude could determine the efficiency of reinforcement learning. There are few data on what magnitude of reward is optimal for learning in any laboratory animal. This is especially true for the range of navigation, motor skill, and decision-making tasks typical of modern systems neuroscience experiments in mice.

Nonetheless, essentially the entire field uses reward magnitudes from within a very small range. Those chosen reward magnitudes are quite small relative to the daily needs of a mouse (<1%). Thus, we set out to determine whether, and if so why, increases in reward magnitude could increase the efficiency of animal learning.

RESULTS

Increasing reward magnitude by one to two orders relative to the standard reward sizes used in the field substantially increased the efficiency of learning across a range of tasks. We found that mice could learn from at least an order of magnitude fewer trials in a hidden target navigation task, an effort-based reach-to-pull motor skill task, and a sensorimotor decision-making task.

In general, across all three tasks, the efficiency of learning was increased without a notable change in the quality of the final, trained performance. At the upper limit, these effects could be substantial. For example, some mice learned a hidden target navigation task in only a few experiences of reinforcement, something that requires hundreds or thousands of reinforcements using standard reward magnitudes. We further showed that these effects could be well explained once one appreciates that the efficiency of learning is determined by three critical components:

(i) the learning rate,

(ii) the ability to capture learned improvements from prior sessions, and

(iii) the extent of sustained engagement in a task.

In our study, large rewards improved all three aspects. Large rewards produced longer, more sustained activity of DA neurons during reward consumption. We tested whether augmenting normal responses to reward with optogenetic-mediated sustained activation of DA were sufficient to enhance learning efficiency with standard reward magnitudes. Sustained optogenetic “boosting” of DA reward responses was able to increase learning efficiency in both hidden target navigation and the effort-based motor skill task.

DA stimulation increased learning efficiency by increasing the learning rate and reducing disengagement, but failed to enhance capture of prior learning. Finally, we showed that increasing reward magnitude, while always improving learning as measured in DA activity, does not always lead to obvious improvements in behavioral measures of learning. For example, the presence of large rewards appears to interfere with anticipatory behavior in classical conditioning paradigms.

CONCLUSION

We found that larger reward magnitudes than used in the field could indeed enhance the learning efficiency of mice across a range of complex tasks, including navigation, motor skill, and decision-making. One of the largest sources of variance across individual mice was the ability to stay engaged in task performance. Unexpectedly, variance in learning rate across individuals appeared to be much smaller. As a result, large rewards could substantially attenuate variance across individuals in learning efficiency.

Finally, mesolimbic DA neuron activity could produce multiple effects on learning depending upon the magnitude and time course of DA activation."

ScienceAdviser

The Bigger the Reward, the Faster We Learn (original news release) "Researchers in the Dudman Lab at HHMI’s Janelia Research Campus found that learning happens faster when there’s a bigger payoff for success, potentially changing how neuroscientists think about learning and how they study it"

Reward magnitude determines reinforcement learning efficiency (no public access)

New research from the Dudman Lab finds animals learn faster when they are given larger-than-normal rewards as they learn to perform a task. Artificially extending the dopamine signals associated with small rewards also caused learning to happen faster.
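
To make the hypothesis in the abstract concrete, here is a toy sketch of my own, not the model used in the paper. It uses a textbook delta-rule update, where a value estimate moves toward the reward by a fraction (the learning rate) of the prediction error. With a fixed learning rate, the number of rewarded trials needed to reach 90% of the asymptote is the same no matter how big the reward is. If the learning rate is instead allowed to grow with reward magnitude, bigger rewards need far fewer trials. The linear scaling rule, the `gain` value, the 90% criterion, and all function names are assumptions chosen purely for illustration.

```python
def trials_to_criterion(reward, learning_rate, criterion=0.9, max_trials=100_000):
    """Count rewarded trials until the value estimate reaches criterion * reward.

    Standard delta rule: value += learning_rate * (reward - value).
    """
    value = 0.0
    for trial in range(1, max_trials + 1):
        prediction_error = reward - value
        value += learning_rate * prediction_error
        if value >= criterion * reward:
            return trial
    return max_trials


def magnitude_scaled_rate(reward, gain=0.01, ceiling=1.0):
    """Hypothetical rule: learning rate grows linearly with reward size, capped at 1.

    This is an illustrative assumption, not a rule taken from the study.
    """
    return min(ceiling, gain * reward)


if __name__ == "__main__":
    # Reward sizes in arbitrary units, spanning two orders of magnitude.
    for reward in (1.0, 10.0, 100.0):
        fixed = trials_to_criterion(reward, learning_rate=0.01)
        scaled = trials_to_criterion(reward, learning_rate=magnitude_scaled_rate(reward))
        print(f"reward={reward:6.1f}  fixed rate: {fixed} trials  scaled rate: {scaled} trials")
```

In the fixed-rate column the trial count does not change with reward size, which is the "experience matters more than incentive" assumption described in the quotes above. In the scaled-rate column the count drops sharply as the reward grows, which is the qualitative pattern the study reports. The sketch leaves out the other two components the abstract names (carrying improvements over between sessions, and staying engaged in the task).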
