Presentation Information

[O5-03] Rationalizing temporal decision making and the neural representation of time

*Marshall G Hussain Shuler1,2 (1. Johns Hopkins (United States of America), 2. Kavli Neuroscience Discovery Institute (United States of America))

Keywords:

Temporal Difference Reinforcement Learning, reward-rate maximization, dilating time state-space, temporal decision-making

By what neural means do we represent the passage and structuring of time and decide how to spend it? How do these representations of value and time relate to the evolutionary pressure to maximize reward accumulation? To address these questions, we evaluate whether the temporal difference reinforcement learning (TDRL) algorithm can rationalize temporal decision-making. First, we derive the optimal solution for reward accumulation and demonstrate that TDRL's value estimates (infinite sums of exponentially discounted future rewards) systematically deviate from this optimum. We then show that TDRL, operating over a time state-space of regular intervals, fails to learn values that rationalize the curious pattern of decision-making errors exhibited by humans and animals. Our insight is that this failure is best mitigated by representing time using a time-dilating state-space, wherein the duration of each subsequent state increases by a precise proportion. TDRL applied to such a time-dilating state-space then learns values that rationalize the diverse suboptimalities observed over decades of investigating how animals and humans decide to spend time. Specifically, it affords optimal forgo behavior, minimizes a suboptimal bias toward sooner-smaller rewards in mutually exclusive choices, and produces a suboptimal unwillingness to abandon engaged pursuits (the sunk-cost effect). In proposing PARSUIT theory (Pursuit-based Atomized Reinforcement of State-value Using Increasing Timesteps), we provide 1) a general, mechanistically descriptive explanation of temporal decision-making, 2) a normative rationalization for why time takes the neural form that it does, and 3) grounds for advancing TDRL as the learning algorithm underlying temporal decision-making.
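The two ingredients named in the abstract, TDRL value learning and a time-dilating state-space, can be illustrated with a minimal sketch. All names and parameters below (`time_states`, `td0_values`, the dilation factor, learning rate, and discount) are illustrative assumptions for exposition, not the authors' implementation of PARSUIT: states tile a fixed delay to reward either in regular intervals (dilation = 1) or with each state lasting a fixed proportion longer than the last (dilation > 1), and TD(0) learns the exponentially discounted value of each state.

```python
import numpy as np

def time_states(total_time, n_states, dilation):
    """Durations of n_states states tiling [0, total_time].

    dilation == 1 gives regular intervals; dilation > 1 makes each
    successive state last `dilation` times longer than the previous one
    (a hypothetical parameterization of a time-dilating state-space).
    """
    raw = dilation ** np.arange(n_states)
    return total_time * raw / raw.sum()

def td0_values(durations, reward, gamma, alpha=0.1, n_episodes=2000):
    """Learn state values with TD(0); one reward arrives at the delay's end.

    Discounting is applied per unit of real time (gamma ** duration), so a
    state spanning a longer interval discounts more across its transition.
    """
    n = len(durations)
    V = np.zeros(n + 1)  # V[n] is the terminal (post-reward) state
    for _ in range(n_episodes):
        for s in range(n):
            r = reward if s == n - 1 else 0.0  # reward on final transition
            target = r + gamma ** durations[s] * V[s + 1]
            V[s] += alpha * (target - V[s])    # TD(0) update
    return V[:n]

# Compare the two representations for a 10 s delay to a unit reward.
regular = td0_values(time_states(10.0, 5, dilation=1.0), 1.0, gamma=0.9)
dilated = td0_values(time_states(10.0, 5, dilation=1.5), 1.0, gamma=0.9)
```

In both representations the learned values rise toward the reward as it nears; the dilated state-space spends its states densely near the present and coarsely in the distant future, which is the representational choice the abstract argues rationalizes observed temporal decision-making.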