01833nas a2200217 4500008004100000245007400041210006900115260001500184490000800199520114200207653004301349653002701392653002901419653003101448100001801479700002101497700001901518700001901537700002201556856003701578 2021 eng d00aQuantum Algorithms for Reinforcement Learning with a Generative Model0 aQuantum Algorithms for Reinforcement Learning with a Generative c12/15/20210 v1393 a
Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a γ-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy (π∗), the optimal value function (v∗), and the optimal Q-function (q∗), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy (ϵ) and two main parameters of the MDP: the effective time horizon (11−γ) and the size of the action space (A). Moreover, we show that our quantum algorithm for computing q∗ is optimal by proving a matching quantum lower bound.
10aFOS: Computer and information sciences10aFOS: Physical sciences10aMachine Learning (cs.LG)10aQuantum Physics (quant-ph)1 aWang, Daochen1 aSundaram, Aarthi1 aKothari, Robin1 aKapoor, Ashish1 aRoetteler, Martin uhttps://arxiv.org/abs/2112.08451