UCI AI/ML Seminar Series
Roy Fox
Assistant Professor
Department of Computer Science
University of California, Irvine
Curiously effective ensemble and double-oracle reinforcement-learning methods
Ensemble methods for reinforcement learning have gained attention in recent years, due to their ability to represent model uncertainty and use it to guide exploration and to reduce value estimation bias. We present MeanQ, a very simple ensemble method with improved performance, and show how it reduces estimation variance enough to operate without a stabilizing target network. Curiously, MeanQ is theoretically *almost* equivalent to a non-ensemble state-of-the-art method that it significantly outperforms, raising questions about the interaction between uncertainty estimation, representation, and resampling.
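The abstract does not spell out MeanQ's update rule. As a rough illustration of how an ensemble can reduce target variance, here is a minimal sketch that assumes the bootstrap target averages the ensemble members' next-state value estimates before taking the greedy maximum. The function name `ensemble_mean_target`, its parameters, and the array layout are hypothetical and are not taken from the talk.

```python
import numpy as np

def ensemble_mean_target(q_next, reward, gamma=0.99, done=False):
    """Bootstrap target from an ensemble of Q-value estimates.

    q_next: array of shape (K, A); row k holds ensemble member k's
            estimated Q-values for each of the A actions in the next state.
    This is an assumed form of an ensemble-mean target, not a confirmed
    description of MeanQ.
    """
    q_next = np.asarray(q_next, dtype=float)
    mean_q = q_next.mean(axis=0)            # average over the K members
    bootstrap = 0.0 if done else gamma * mean_q.max()
    return reward + bootstrap

# Two ensemble members, two actions: the mean over members is [2.0, 1.0].
example_q = np.array([[1.0, 2.0],
                      [3.0, 0.0]])
example_target = ensemble_mean_target(example_q, reward=1.0, gamma=0.9)
```

Averaging K independent noisy estimates shrinks the variance of each action's value roughly by a factor of K, which is one intuition for why such a target might remain stable without a separate target network.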
In adversarial environments, where a second agent attempts to minimize the first’s rewards, double-oracle (DO) methods grow a population of policies