Distributed On-Policy Actor-Critic Reinforcement Learning




Abstract:
In this paper, a novel distributed on-policy Actor-Critic algorithm for multi-agent reinforcement learning is proposed. The algorithm combines a temporal-difference scheme with function approximation at the Critic stage and a policy-gradient algorithm at the Actor stage, both derived from a global objective. At both stages, decentralized agreement among the agents is achieved using a linear dynamic consensus strategy. Compared to existing schemes, the algorithm offers an improved convergence rate, better noise immunity, and the possibility of achieving multi-task global optimization.
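The structure described in the abstract, local TD-based Critic and policy-gradient-style Actor updates at each agent, followed by a linear dynamic consensus step over a communication graph, can be illustrated with a minimal sketch. This is not the paper's algorithm: the features, rewards, consensus weight matrix, and the use of the TD error in place of a full policy-gradient score function are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 4          # number of agents
d = 3          # feature dimension
gamma = 0.95   # discount factor
alpha = 0.1    # critic step size
beta = 0.05    # actor step size

# Hypothetical communication graph: a ring with a doubly stochastic
# consensus weight matrix C (Metropolis-type weights would also work).
C = np.zeros((N, N))
for i in range(N):
    C[i, i] = 0.5
    C[i, (i + 1) % N] = 0.25
    C[i, (i - 1) % N] = 0.25

w = rng.standard_normal((N, d))      # per-agent Critic weights
theta = rng.standard_normal((N, d))  # per-agent Actor parameters

for step in range(200):
    # Each agent observes its own local transition (phi, r, phi_next);
    # synthetic stand-ins for real environment interaction.
    phi = rng.standard_normal((N, d))
    phi_next = rng.standard_normal((N, d))
    r = phi @ np.ones(d) + 0.1 * rng.standard_normal(N)

    for i in range(N):
        # Critic: temporal-difference update with linear approximation.
        delta = r[i] + gamma * phi_next[i] @ w[i] - phi[i] @ w[i]
        w[i] = w[i] + alpha * delta * phi[i]
        # Actor: TD-error-driven update (the policy score function is
        # replaced by phi here purely for illustration).
        theta[i] = theta[i] + beta * delta * phi[i]

    # Consensus: each agent mixes its estimates with its neighbours'.
    w = C @ w
    theta = C @ theta

# After repeated consensus mixing, agents' estimates stay in agreement.
print(np.max(np.abs(w - w.mean(axis=0))))
```

The consensus matrix `C` averages each agent's parameters with those of its graph neighbours after every learning step, which is the mechanism by which decentralized agreement is reached in schemes of this type.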

CITATION:

IEEE format

M. Stanković, M. Beko, M. Pavlović, I. Popadić, S. Stanković, “Distributed On-Policy Actor-Critic Reinforcement Learning,” in Sinteza 2022 - International Scientific Conference on Information Technology and Data Related Research, Singidunum University, Belgrade, Serbia, 2022, pp. 389-393. doi:10.15308/Sinteza-2022-389-393

APA format

Stanković, M., Beko, M., Pavlović, M., Popadić, I., Stanković, S. (2022). Distributed On-Policy Actor-Critic Reinforcement Learning. Paper presented at Sinteza 2022 - International Scientific Conference on Information Technology and Data Related Research. doi:10.15308/Sinteza-2022-389-393
