Configurable mirror descent: towards a unification of decision making

research-article

Authors: Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Shuyue Hu, Xiao Huang, Hau Chan, Bo An

ICML'24: Proceedings of the 41st International Conference on Machine Learning

Article No.: 1130, Pages 28146 - 28203

Published: 03 January 2025

Abstract

Decision-making problems, categorized as single-agent, e.g., Atari, cooperative multi-agent, e.g., Hanabi, competitive multi-agent, e.g., Hold'em poker, and mixed cooperative and competitive, e.g., football, are ubiquitous in the real world. Although various methods have been proposed to address specific decision-making categories, these methods typically evolve independently and cannot generalize to other categories. Therefore, a fundamental question for decision-making is: Can we develop a single algorithm to tackle ALL categories of decision-making problems? There are several main challenges in addressing this question: i) different categories involve different numbers of agents and different relationships between agents, ii) different categories have different solution concepts and evaluation measures, and iii) there is no comprehensive benchmark covering all the categories. This work presents a preliminary attempt to address the question with three main contributions. i) We propose generalized mirror descent (GMD), a generalization of MD variants, which considers multiple historical policies and works with a broader class of Bregman divergences. ii) We propose configurable mirror descent (CMD), where a meta-controller is introduced to dynamically adjust the hyperparameters in GMD conditional on the evaluation measures. iii) We construct GAMEBENCH with 15 academic-friendly games across different decision-making categories. Extensive experiments demonstrate that CMD achieves empirically competitive or better outcomes compared to baselines while providing the capability to explore diverse dimensions of decision making.
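
For context on the update rule that GMD generalizes, the sketch below shows a single mirror-descent step for a policy on the probability simplex under two standard Bregman divergences: negative entropy, which yields the multiplicative-weights (Hedge) update, and squared Euclidean distance, which yields projected gradient descent. This is a minimal illustration under assumed settings, not the paper's GMD or CMD implementation; the function names, step size, uniform initial policy, and example gradient are hypothetical.

```python
import numpy as np

def md_step_entropy(policy, grad, eta):
    """One mirror-descent step on the probability simplex with the
    negative-entropy mirror map; closed form is the multiplicative-weights
    (Hedge) update: new_policy proportional to policy * exp(-eta * grad)."""
    logits = np.log(policy) - eta * grad
    logits -= logits.max()                  # shift for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

def md_step_euclidean(policy, grad, eta):
    """One mirror-descent step with the squared-Euclidean mirror map,
    i.e. a gradient step followed by Euclidean projection onto the simplex."""
    y = policy - eta * grad
    u = np.sort(y)[::-1]                    # sort-based simplex projection
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(u)) + 1) > 0)[0][-1]
    tau = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(y + tau, 0.0)

if __name__ == "__main__":
    policy = np.full(3, 1.0 / 3.0)          # uniform policy over 3 actions (illustrative)
    grad = np.array([0.2, -0.1, 0.4])       # hypothetical loss gradient
    print(md_step_entropy(policy, grad, eta=0.5))
    print(md_step_euclidean(policy, grad, eta=0.5))
```

GMD extends this basic template by incorporating multiple historical policies and a broader class of Bregman divergences, and CMD's meta-controller adjusts the resulting hyperparameters conditional on the evaluation measures of each decision-making category.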

Index Terms

  • Computing methodologies
    • Artificial intelligence
      • Distributed artificial intelligence
        • Multi-agent systems
    • Machine learning
      • Learning paradigms
        • Supervised learning
          • Supervised learning by regression
      • Learning settings
        • Online learning settings
      • Machine learning approaches
  • Theory of computation
    • Design and analysis of algorithms
      • Mathematical optimization
        • Continuous optimization
          • Nonconvex optimization
    • Theory and algorithms for application domains
      • Algorithmic game theory and mechanism design
        • Algorithmic game theory

Index terms have been assigned to the content through auto-classification.

Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024, 63010 pages

Editors: Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, Felix Berkenkamp

Copyright © 2024.

Publisher: JMLR.org
