Extracting Bottlenecks Using Object Recognition in Reinforcement Learning

Subject areas: Electrical and Computer Engineering

Behzad Ghazanfari ¹, Nasser Mozayani ², Mohammad Reza Jahed Motlagh ³

1 - Iran University of Science and Technology
2 - Iran University of Science and Technology
3 - Iran University of Science and Technology

Received: 1394/09/08 Accepted: 1394/09/08 Published: 1391/04/01

Keywords: reinforcement learning, object clustering, hierarchical reinforcement learning, temporally extended actions

Abstract:

This paper presents a new method that automatically extracts bottlenecks for a reinforcement learning agent. The proposed method is inspired by biological systems and by animal behavior and navigation, and it operates through the agent's interactions with its surrounding environment. Using clustering and hierarchical object recognition, the agent finds landmarks. If these landmarks are close to each other in the action space, bottlenecks are extracted using the states between them. Experimental results show a considerable improvement in the reinforcement learning process compared with other similar methods.

English abstract:

Extracting bottlenecks considerably improves the speed of learning and the ability to transfer knowledge in reinforcement learning. However, extracting bottlenecks is a challenge in reinforcement learning, and it typically requires prior knowledge and the designer's help. This paper proposes a new method that automatically extracts bottlenecks for a reinforcement learning agent. The method is inspired by biological systems and by animal behavior and navigation, and the agent operates on the basis of its interaction with the environment. The agent finds landmarks using clustering and hierarchical object recognition. If these landmarks are close to each other in the action space, bottlenecks are extracted using the states between them. The experimental results show a considerable improvement in the learning process in comparison with some key methods in the literature.

