Study uncovers how brain learns which actions in a sequence are rewarded

This understanding of the reward process could advance more efficient learning systems in education and artificial intelligence (AI)

Representative image of a brain scan (photo: DW)

PTI

Published: 14 Dec 2023, 9:33 PM

New research has uncovered how the brain works out exactly which action in a sequence led to a 'reward'.

The brain chemical dopamine is known to play a crucial role in the reward process, which is important for reinforcing behaviours during learning. However, researchers from the Allen Institute and their colleagues said that exactly how the brain links specific actions to dopamine release has remained unclear, a puzzle they called the brain's 'credit assignment problem'.

They found that dopamine not only signalled a reward but, through trial and error, also guided animals to home in on the specific behaviours that lead to these rewards.

The researchers said that the findings highlighted a sophisticated learning strategy where behaviours are not just reinforced, but actively shaped and fine-tuned through experience.

The brain's reward system can thus swiftly and dynamically alter the full range of an animal's movements and behaviours, they said in their study published in the journal Nature.

"When you reinforce behaviour, we often think it's just that action," said senior author Rui Costa, the president and CEO of the Allen Institute.

"But no: you're changing the entire behavioural structure. And what was really surprising was how rapid it was," said Costa.

The insights could also be relevant to fields such as education and artificial intelligence (AI), the researchers said, as they could help develop more advanced and efficient learning systems.

This could contribute to creating AI that is better at adapting to new data and situations, they said.

For the study, the researchers fitted mice with wireless sensors that tracked their movements and fed the data into an algorithm that categorised the actions into distinct groups.

When the mice performed the team's "target actions", the scientists triggered dopamine neurons to 'reward' them, using optogenetics, a technique in which neurons are controlled with light.

They found that the mice, responding to the 'reward', performed the target actions more frequently.

Actions similar to the target ones, along with those occurring just before the dopamine release, were also found to increase in frequency, while those dissimilar to the target decreased. This narrowing of focus onto the rewarded action became more precise over time.

The scientists further found that actions immediately preceding the reward were learnt and refined more quickly than those occurring earlier and further from it. The mice also learnt more slowly when the actions triggering the reward were spaced further apart in the sequence.
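For readers who think in code, the pattern of actions closer to the reward being reinforced more strongly resembles the decaying "eligibility trace" used in reinforcement learning. The short Python sketch below is an illustration only and is not the study's method: the function name, the action labels and the decay value of 0.5 are all hypothetical.

```python
def assign_credit(actions, reward_index, decay=0.5):
    """Toy credit assignment: share credit for a reward among preceding actions.

    The action at reward_index gets full credit (1.0); each step further
    back in time gets `decay` times less. All values are illustrative.
    """
    credit = {}
    for i in range(reward_index + 1):
        steps_back = reward_index - i  # how long before the reward this action occurred
        credit[actions[i]] = credit.get(actions[i], 0.0) + decay ** steps_back
    return credit


# Hypothetical action sequence ending in the rewarded "target" action.
sequence = ["groom", "rear", "turn_left", "target"]
credit = assign_credit(sequence, reward_index=3)

for action in sequence:
    print(action, credit[action])
```

In this toy model the rewarded action receives the most credit and earlier actions receive progressively less, which mirrors the article's description in spirit only.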

The team thus learnt that the mice employed a key process, similar to rewinding time, to understand what exactly led to a reward.
