How do humans make decisions when the outcomes are uncertain? One possible way would be to calculate the expected value of each option by multiplying each possible outcome amount by its probability and then choosing the option with the highest expected value. While this strategy would maximize the payoff in expectation, this is not what people tend to do. In particular, people seem to be irrationally influenced by past outcomes of their decisions when making subsequent choices.
Researchers from the University of Tsukuba have developed and validated a model (“dynamic prospect theory”) that integrates the most popular model in behavioral economics to describe decision-making under uncertainty — prospect theory, and a well-established model of learning from neuroscience — reinforcement learning theory. This model more accurately described the decisions that people and monkeys made while facing risk than prospect theory or reinforcement learning theory alone.
Specifically, the researchers asked 70 people to repeatedly choose between two lotteries in which they could gain some reward with some probability. The lotteries varied in the size of the reward, the probability of receiving it, and the amount of risk involved. The results showed that immediately after experiencing an outcome that was bigger than the expected value of the selected option, participants behaved as if the probability of winning in the next lottery increased. Senior author of the study Assistant Professor Hiroshi Yamada says “This behavior is surprising because winning probabilities were clearly described to the participants (participants did not have to learn them from experience) and these probabilities were also completely independent of previous outcomes.” Using their dynamic prospect theory model, the researchers were able to determine that the change in behavior is driven by a change in the perception of probabilities rather than by a change in valuation of rewards.
Yamada also says: “Such learning from unexpected events underlies reinforcement learning theory and is a well-known algorithm that occurs when people need to learn the rewards from experience. It is interesting that it occurs even if learning is not necessary.”
In similar experiments with macaque monkeys, whose brains closely resemble those of humans, essentially the same results were observed. Researchers commented that the similarity in human and monkey behavior was remarkable in this study.
Based on the results of this research, it is expected that the investigation of the monkey brain will lead to an understanding of the brain mechanisms involved in the perception of rewards and probability that all of us use when making risky decisions, as well as the joy we feel when we succeed.
This study was supported by JSPS KAKENHI Grant Numbers JP:15H05374 and 21H02797, Takeda Science Foundation, Council for Addiction Behavior Studies, Narishige Neuroscience Research Foundation, Moonshot R&D JPMJMS2294 (H.Y.), and ARC DP190100489 (A.T.).