Alexander Panin: Variational Information Maximizing Exploration
When it comes to solving practical problems, the performance of reinforcement learning algorithms usually depends heavily on efficient exploration of the environment. However, classical exploration strategies (ε-greedy, Boltzmann) share several drawbacks that slow down training. Informally, if you want to learn to program in Java, having already learned Python, randomly mistyping 10% of characters (ε-greedy) and keeping the versions that compile will likely yield poor results. We’d like to describe a method devi