Assistant Professor
Peking University HSBC Business School
Mailing Address: Room 747, Peking University HSBC Business School, University Town, Nanshan District, Shenzhen, Guangdong Province, China, 518055
Email: yusun[at]phbs.pku.edu.cn; yusun017[at]gmail.com
We study optimal stopping for diffusion processes with unknown model primitives within the continuous-time reinforcement learning (RL) framework developed by Wang et al. (2020), and present applications to option pricing and portfolio choice. By penalizing the corresponding variational inequality formulation, we transform the stopping problem into a stochastic optimal control problem with two actions. We then randomize controls into Bernoulli distributions and add an entropy regularizer to encourage exploration. We derive a semi-analytical optimal Bernoulli distribution, based on which we devise RL algorithms using the martingale approach established in Jia and Zhou (2022a). We establish a policy improvement theorem and prove the fast convergence of the resulting policy iterations. We demonstrate the effectiveness of the algorithms in pricing finite-horizon American put options, solving Merton's problem with transaction costs, and scaling to high-dimensional optimal stopping problems. In particular, we show that both the offline and online algorithms achieve high accuracy in learning the value functions and characterizing the associated free boundaries.
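The entropy-regularized Bernoulli randomization mentioned above can be illustrated with a generic one-step sketch. This is not the paper's semi-analytical formula: it only assumes a hypothetical scalar `gain` (the advantage of stopping over continuing at some state) and a hypothetical exploration weight `temperature`. Under those assumptions, maximizing `p * gain + temperature * entropy(p)` over the stopping probability `p` yields a logistic function of `gain / temperature`.

```python
import math


def bernoulli_entropy(p: float) -> float:
    """Shannon entropy (in nats) of a Bernoulli(p) distribution."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def regularized_objective(p: float, gain: float, temperature: float) -> float:
    """Expected gain from stopping with probability p, plus an entropy bonus.

    `gain` and `temperature` are illustrative names, not taken from the paper.
    """
    return p * gain + temperature * bernoulli_entropy(p)


def optimal_stop_probability(gain: float, temperature: float) -> float:
    """Maximizer of regularized_objective over p in [0, 1].

    Setting the derivative to zero gives log(p / (1 - p)) = gain / temperature,
    i.e. p = sigmoid(gain / temperature). The branches avoid overflow in exp.
    """
    x = gain / temperature
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

As `temperature` shrinks, the probability approaches 0 or 1 and the randomized rule approaches a deterministic stop/continue decision; a larger `temperature` keeps the policy closer to a fair coin, which is the sense in which the entropy term encourages exploration.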
We study a dynamic portfolio selection problem in which an agent trades a stock and a risk-free asset with the objective of maximizing the rank-dependent utility of her wealth at the terminal time of the investment horizon. Because this problem is time-inconsistent, we consider three types of agents (pre-committed, sophisticated, and naive), who differ in whether they are aware of the time inconsistency and whether they have self-control. Assuming a neo-additive probability weighting function, we solve for the strategies of these agents. We find that the pre-committed agent takes a loss-exit strategy, leading to a positively skewed terminal wealth, and that the sophisticated agent is less willing to participate in the stock market than the pre-committed and naive agents. We also study equilibrium asset pricing and find that, with a pre-committed representative agent, the stock return exhibits a reversal effect and the initial stock price is lower than with either a naive or a sophisticated representative agent.