In this work we propose a novel class of distributed algorithms for iterative multi-agent value function approximation for reinforcement learning in Markov decision processes. The algorithms require no fusion center and are based on consensus-based collaboration between the agents over a time-varying communication network. We allow the agents' local learning strategies to belong to a unified class of existing single-agent algorithms based on stochastic gradient descent minimization of appropriately defined local cost functions, including off-policy learning with eligibility traces. The off-policy local schemes are particularly important, since they allow the agents in the resulting distributed algorithm to follow different behavior policies while evaluating the response to a single target policy. We discuss the convergence properties of the algorithms and show that, by a proper design of the network parameters and/or network topology, the convergence point (if it exists) can be tuned to coincide with the globally optimal point. The properties and effectiveness of the proposed algorithms are illustrated by simulations.
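The overall structure described above, local stochastic-gradient temporal-difference updates interleaved with a consensus step over a communication network, can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the two-state Markov reward process, the three-agent fully connected weight matrix, tabular features, and the step-size schedule are all assumptions chosen for brevity, and the local learner is plain on-policy TD(0) rather than the unified off-policy class with eligibility traces treated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state Markov reward process (not from the paper).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])        # transition probabilities under the target policy
r = np.array([1.0, 0.0])          # expected reward in each state
gamma = 0.9
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)  # exact value function

n_agents = 3
# Doubly stochastic consensus weight matrix (assumed static topology;
# the paper allows time-varying networks).
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Tabular features: one parameter per state, one parameter vector per agent.
theta = np.zeros((n_agents, 2))
states = rng.integers(0, 2, size=n_agents)  # each agent samples its own trajectory

for t in range(50_000):
    alpha = 0.5 / (1.0 + t / 200.0)  # diminishing step size (assumed schedule)
    # Local stochastic-gradient TD(0) update at every agent.
    for i in range(n_agents):
        s = states[i]
        s_next = rng.choice(2, p=P[s])
        delta = r[s] + gamma * theta[i, s_next] - theta[i, s]  # TD error
        theta[i, s] += alpha * delta
        states[i] = s_next
    # Consensus step: each agent mixes its parameters with its neighbors'.
    theta = W @ theta
```

After the run, all three parameter vectors agree (the consensus step drives them together) and their common value approximates `V_true`; the consensus mixing also pools the agents' independent samples, which is the variance-reduction effect the distributed scheme exploits.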
M. Stanković, “Distributed value function approximation for multi-agent reinforcement learning,” in Sinteza 2018 International Scientific Conference on Information Technology and Data Related Research, Belgrade, Serbia: Singidunum University, 2018.