Department of Computer Science | Institute of Theoretical Computer Science | CADMO
Prof. Emo Welzl and Prof. Bernd Gärtner
Mittagsseminar Talk Information
Date and Time: Tuesday, September 20, 2022, 12:15 pm
Duration: 30 minutes
Location: OAT S15/S16/S17
Speaker: Maxime Larcher
In the 2-Armed Bandit Problem, an agent faces two slot machines. In each round 1, 2, ..., T, the agent pulls the arm of their choice and receives a random reward sampled from the (hidden) distribution of that arm. Naturally, the goal of the agent is to minimise the total regret, i.e. the total expected reward missed over the T rounds compared with always pulling the best arm.
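To make the regret notion concrete, here is a minimal Python sketch of the stationary setting. It assumes Bernoulli rewards (the abstract does not fix the reward distributions) and uses explore-then-commit, a standard textbook baseline, as a stand-in policy; it is not the algorithm presented in the talk. The function names `pseudo_regret` and `explore_then_commit` are hypothetical. `pseudo_regret` sums the gap between the best arm's mean and the pulled arm's mean over the realized pulls; averaging it over many random runs estimates the expected regret.

```python
import random
from typing import List, Sequence


def pseudo_regret(means: Sequence[float], pulls: Sequence[int]) -> float:
    """Expected reward missed by the realized pulls, versus always playing the best arm."""
    best = max(means)
    return sum(best - means[arm] for arm in pulls)


def explore_then_commit(means: Sequence[float], T: int, m: int, seed: int = 0) -> List[int]:
    """Hypothetical baseline (not the talk's algorithm) for a 2-armed bandit.

    Pulls each arm m times, then commits to the arm with the higher
    empirical reward. `means` is only used to sample Bernoulli rewards;
    the policy's decisions depend on the sampled rewards alone.
    """
    rng = random.Random(seed)
    totals = [0.0, 0.0]  # summed rewards observed during exploration
    pulls: List[int] = []
    for t in range(T):
        if t < 2 * m:
            arm = t % 2  # exploration: alternate between the two arms
        else:
            # Both arms were pulled exactly m times, so comparing sums compares means.
            arm = 0 if totals[0] >= totals[1] else 1
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli(means[arm]) sample
        if t < 2 * m:
            totals[arm] += reward
        pulls.append(arm)
    return pulls


means = [0.4, 0.7]  # hidden from the agent
pulls = explore_then_commit(means, T=1000, m=50, seed=1)
print(pseudo_regret(means, pulls))
```

Here exploration alone costs 50 pulls of the worse arm, i.e. a regret of about 50 × 0.3 = 15; committing to the wrong arm would add roughly 0.3 per remaining round.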
When the reward distributions are allowed to change up to L times over the T rounds (without the agent knowing when such changes happen), Auer et al. '02 showed that every algorithm suffers regret $\Omega(\sqrt{LT})$ in the worst case. In 2019, Auer et al. used Azuma's inequality to bound the probability of 'bad events' and presented an algorithm achieving regret $O(\sqrt{LT \log T})$.
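The abstract does not spell out how changes are modeled or which benchmark the regret is measured against. The sketch below assumes the common piecewise-stationary model: the arm means are constant on segments separated by L change points, and regret is measured against the best arm of each round. The names `means_at` and `dynamic_regret` and the `(starts, seg_means)` encoding are hypothetical illustrations.

```python
import bisect
from typing import Sequence, Tuple

Means = Tuple[float, float]


def means_at(starts: Sequence[int], seg_means: Sequence[Means], t: int) -> Means:
    """Arm means in round t (1-based).

    Segment i starts at round starts[i] (sorted, starts[0] == 1), so there
    are L = len(starts) - 1 changes; the agent never observes them.
    """
    i = bisect.bisect_right(starts, t) - 1  # last segment starting at or before t
    return seg_means[i]


def dynamic_regret(starts: Sequence[int], seg_means: Sequence[Means],
                   pulls: Sequence[int]) -> float:
    """Expected reward missed against the best arm of each individual round."""
    total = 0.0
    for t, arm in enumerate(pulls, start=1):
        mu = means_at(starts, seg_means, t)
        total += max(mu) - mu[arm]
    return total


# One change (L = 1) at round 501 of T = 1000: arm 0 is best, then arm 1.
starts = [1, 501]
seg_means = [(0.7, 0.4), (0.3, 0.6)]
stubborn = [0] * 1000  # a policy that never adapts keeps pulling arm 0
print(dynamic_regret(starts, seg_means, stubborn))  # ≈ 150
```

Pulling arm 0 throughout costs nothing before the change and 0.3 per round after it, so a policy that ignores changes can incur regret linear in T, far above the $\sqrt{LT}$ scale of the bounds above.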
We present a new algorithm, based on random walks, that achieves regret $O(\sqrt{LT})$ when L is known and $O(\sqrt{LT \log L})$ when L is unknown. In particular, our algorithm is optimal (up to constant factors) when L is known. We also obtain improved bounds for the general K-armed bandit for a wide range of K.
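Ignoring constant factors, the improvement over the $O(\sqrt{LT \log T})$ bound can be made explicit. Assuming $2 \le L \le T$ (so that $\log L > 0$), the ratios of the bounds are:

$$
\frac{\sqrt{LT\log T}}{\sqrt{LT}} = \sqrt{\log T},
\qquad
\frac{\sqrt{LT\log T}}{\sqrt{LT\log L}} = \sqrt{\frac{\log T}{\log L}} \;\ge\; 1 .
$$

So for known L a full $\sqrt{\log T}$ factor disappears and the bound meets the $\Omega(\sqrt{LT})$ lower bound, while for unknown L the remaining factor depends only on the number of changes, not on the horizon.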