Configurable mirror descent: towards a unification of decision making
Authors: Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Shuyue Hu, Xiao Huang, Hau Chan, Bo An
ICML'24: Proceedings of the 41st International Conference on Machine Learning
Article No.: 1130, Pages 28146-28203
Published: 03 January 2025
Abstract
Decision-making problems, categorized as single-agent (e.g., Atari), cooperative multi-agent (e.g., Hanabi), competitive multi-agent (e.g., Hold'em poker), and mixed cooperative-competitive (e.g., football), are ubiquitous in the real world. Although various methods have been proposed for specific categories of decision making, these methods typically evolve independently and do not generalize to the other categories. Therefore, a fundamental question for decision making is: Can we develop a single algorithm that tackles ALL categories of decision-making problems? Answering this question raises several main challenges: i) different categories involve different numbers of agents and different relationships between agents, ii) different categories have different solution concepts and evaluation measures, and iii) there is no comprehensive benchmark covering all the categories. This work presents a preliminary attempt to address the question, with three main contributions. i) We propose generalized mirror descent (GMD), a generalization of MD variants that considers multiple historical policies and works with a broader class of Bregman divergences. ii) We propose configurable mirror descent (CMD), in which a meta-controller dynamically adjusts the hyperparameters of GMD conditioned on the evaluation measures. iii) We construct GAMEBENCH, a benchmark of 15 academic-friendly games spanning the different decision-making categories. Extensive experiments demonstrate that CMD achieves empirically competitive or better outcomes than the baselines while providing the capability to explore diverse dimensions of decision making.
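To make the building block concrete, below is a minimal sketch of the kind of update that GMD generalizes: an entropic mirror-descent (multiplicative-weights) policy update on the probability simplex, together with a hypothetical blending of several historical policies into a reference point. This is only an illustrative assumption of how "multiple historical policies" and a Bregman-divergence-based update could fit together; it is not the paper's GMD/CMD algorithm, and the rock-paper-scissors payoff matrix, learning rate, and function names are invented for the example.

import numpy as np

def mirror_descent_step(policy, payoff_grad, lr):
    # Entropic mirror descent: gradient step in log space, then softmax back
    # onto the simplex (equivalent to multiplicative weights).
    logits = np.log(policy) + lr * payoff_grad
    new_policy = np.exp(logits - logits.max())
    return new_policy / new_policy.sum()

def blend_history(history, weights):
    # Hypothetical illustration: combine several historical policies into one
    # reference policy (one way multiple historical policies could be used).
    ref = sum(w * p for w, p in zip(weights, history))
    return ref / ref.sum()

# Toy usage: the row player of rock-paper-scissors adapts to a fixed opponent.
payoff = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])        # row player's payoff matrix
opponent = np.array([0.5, 0.3, 0.2])   # fixed opponent mixed strategy
policy = np.full(3, 1.0 / 3.0)
history = [policy.copy()]
for t in range(200):
    grad = payoff @ opponent           # expected payoff of each pure action
    policy = mirror_descent_step(policy, grad, lr=0.1)
    history.append(policy.copy())

reference = blend_history(history[-3:], weights=[0.2, 0.3, 0.5])
print("final policy:", np.round(policy, 3))
print("blended reference:", np.round(reference, 3))

In this toy run the policy concentrates on the best response to the fixed opponent; the paper's CMD would additionally adjust quantities such as the step size and the divergence configuration with a meta-controller, conditioned on the chosen evaluation measure.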
Index Terms
- Computing methodologies
  - Artificial intelligence
    - Distributed artificial intelligence
      - Multi-agent systems
  - Machine learning
    - Learning paradigms
      - Supervised learning
        - Supervised learning by regression
    - Learning settings
      - Online learning settings
    - Machine learning approaches
- Theory of computation
  - Design and analysis of algorithms
    - Mathematical optimization
      - Continuous optimization
        - Nonconvex optimization
  - Theory and algorithms for application domains
    - Algorithmic game theory and mechanism design
      - Algorithmic game theory
Index terms have been assigned to the content through auto-classification.
Published In
ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages
Editors: Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, Felix Berkenkamp
Copyright © 2024.
Publisher
JMLR.org
Qualifiers
- Research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate 140 of 548 submissions, 26%