Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes

  • Chang, Hyeong-Soo (Department of Computer Science and Engineering, Sogang University)
  • Published : 2003.09.01

Abstract

We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple-decision-maker environment under the infinite-horizon average reward criterion, with a general joint reward structure but a factorial joint state transition structure. We introduce the concept of "localization," whereby the global MDP is localized for each agent so that each agent needs to consider only a local MDP defined on its own state and action spaces. Based on this, we present a gradient-ascent-like iterative distributed algorithm that converges to a locally optimal solution of the global MDP. The solution is an autonomous joint policy in that each agent's decision is based only on its local state.
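To make the setting concrete, the following is a minimal sketch, not the paper's actual algorithm: it assumes a toy two-agent problem in which the joint transition factorizes into per-agent kernels while the reward depends on the joint state and action, and it runs a coordinate-ascent loop in which each agent improves its own local policy (a map from its local state to a local action) while the other agent's policy is held fixed, stopping at a joint policy no single agent can unilaterally improve. All names here (P_local, R, average_reward, joint_chain) are illustrative constructs, not notation from the paper.

```python
import itertools
import numpy as np

# Toy problem: 2 agents, each with 2 local states and 2 local actions.
# Local transition kernels P_i[a_i][s_i, s_i'] -- the joint transition
# factorizes as a product of these (the "factorial" transition structure).
n_states, n_actions = 2, 2
rng = np.random.default_rng(0)

def random_kernel():
    P = rng.random((n_actions, n_states, n_states))
    return P / P.sum(axis=2, keepdims=True)

P_local = [random_kernel(), random_kernel()]                 # one kernel per agent
R = rng.random((n_states, n_states, n_actions, n_actions))   # general joint reward r(s1, s2, a1, a2)

def joint_chain(policies):
    """Transition matrix and reward vector over joint states under fixed local policies."""
    joint_states = list(itertools.product(range(n_states), repeat=2))
    P = np.zeros((len(joint_states), len(joint_states)))
    r = np.zeros(len(joint_states))
    for i, (s1, s2) in enumerate(joint_states):
        a1, a2 = policies[0][s1], policies[1][s2]
        r[i] = R[s1, s2, a1, a2]
        for j, (t1, t2) in enumerate(joint_states):
            P[i, j] = P_local[0][a1, s1, t1] * P_local[1][a2, s2, t2]
    return P, r

def average_reward(policies):
    """Long-run average reward: stationary distribution of the induced chain times the reward."""
    P, r = joint_chain(policies)
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = np.abs(pi) / np.abs(pi).sum()
    return float(pi @ r)

# Coordinate ascent over agents: each agent searches its own (small) space of
# deterministic local policies with the other agent's policy held fixed.
policies = [np.zeros(n_states, dtype=int), np.zeros(n_states, dtype=int)]
improved = True
while improved:
    improved = False
    for agent in range(2):
        best = average_reward(policies)
        for candidate in itertools.product(range(n_actions), repeat=n_states):
            trial = [p.copy() for p in policies]
            trial[agent] = np.array(candidate)
            val = average_reward(trial)
            if val > best + 1e-12:
                best, policies, improved = val, trial, True

print("locally optimal joint policy:", [p.tolist() for p in policies],
      "average reward:", round(average_reward(policies), 4))
```

Because each accepted change strictly increases the average reward and there are finitely many deterministic joint policies, the loop terminates at a local optimum; the resulting joint policy is autonomous in the sense described above, since each agent's action depends only on its own local state.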
