Hey!
Reinforcement learning (RL) is a type of machine learning where an “agent” learns to make decisions by trial and error. The agent needs to balance trying new things (exploring) with repeating what has worked in the past (exploiting). If all the agent does is explore, it never uses what it has learned to succeed. If all the agent does is exploit, it may never discover that a better option exists.
Explore vs exploit is a mental model that we can apply to our daily lives. For example:
- Reading new books vs re-reading a book you enjoyed
- Trying new career paths vs focusing on the one you have
- Trying new foods at a restaurant vs ordering your favorite meals
The common approach to applying this mental model is to begin by exploring a lot and then gradually add exploitation as you learn more. At the start, you don’t know anything, so you need to fix that as fast as possible by exploring. Once you get your bearings, you can start exploiting some of what you’ve learned. As time goes on, you become more confident that you’ve found the best actions to take, and you can shift most of your focus to exploiting.
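If you like seeing ideas as code, here is a minimal sketch of the "explore a lot, then gradually exploit" approach. It uses a decaying epsilon-greedy strategy on a toy multi-armed bandit, which is one common way to do this and not the only one. The names `epsilon_schedule` and `run_bandit`, the decay rate, and the reward values are all assumptions I made up for illustration.

```python
import random

def epsilon_schedule(step, start=1.0, floor=0.0, decay=0.99):
    """Probability of exploring at a given step; shrinks geometrically (assumed schedule)."""
    return max(floor, start * decay ** step)

def run_bandit(true_means, steps=2000, reward_noise=0.1, seed=0):
    """Play a toy bandit with decaying epsilon-greedy; returns (pull counts, value estimates)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for step in range(steps):
        if rng.random() < epsilon_schedule(step):
            arm = rng.randrange(n_arms)  # explore: pick any option at random
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit: pick the best so far
        reward = rng.gauss(true_means[arm], reward_noise)
        counts[arm] += 1
        # Running average of the rewards seen for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

# Hypothetical example: three "restaurants" with different average enjoyment
counts, estimates = run_bandit([0.1, 0.5, 0.9])
print(counts, [round(e, 2) for e in estimates])
```

Early on, the agent tries every option almost at random. As the exploration probability shrinks, it spends nearly all of its time on whichever option has the best estimate.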
WARNING: Beware Changes to the Environment
In reinforcement learning, the agent exists in an environment, but this environment can change over time. If we decide we’ve explored enough and do nothing but exploit, we may end up exploiting outdated techniques. Because of this, I think you should never completely stop exploring. Also, it’s fun!
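Here is a rough sketch of why a little ongoing exploration helps when the environment changes. It is a toy two-option bandit where the second option quietly becomes the better one halfway through. The function name `run_changing_bandit`, the reward values, the switch point, and the 0.1 step size are all assumptions for illustration. The `floor` parameter is the minimum exploration probability the agent keeps forever.

```python
import random

def run_changing_bandit(floor, steps=4000, switch_at=2000, seed=0):
    """Return the fraction of post-change steps spent on the newly better option."""
    rng = random.Random(seed)
    means = [0.5, 0.1]          # option 0 starts out as the better choice
    estimates = [0.0, 0.0]
    late_best_pulls = 0
    for step in range(steps):
        if step == switch_at:
            means = [0.5, 0.9]  # the environment changes: option 1 is now better
        eps = max(floor, 0.99 ** step)  # decaying exploration, never below the floor
        if rng.random() < eps:
            arm = rng.randrange(2)                          # explore
        else:
            arm = 0 if estimates[0] >= estimates[1] else 1  # exploit
        reward = rng.gauss(means[arm], 0.1)
        # Constant step size so recent rewards count more than old ones
        estimates[arm] += 0.1 * (reward - estimates[arm])
        if step >= switch_at and arm == 1:
            late_best_pulls += 1
    return late_best_pulls / (steps - switch_at)

print("never stops exploring:", run_changing_bandit(floor=0.05))
print("stops exploring:      ", run_changing_bandit(floor=0.0))
```

In this toy setup, the agent that stops exploring has no way to notice that the other option improved, while the agent with a small exploration floor finds it and switches over.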
I hope you found this newsletter thought-provoking!
Announcements
- I hit 500 subscribers on my YouTube channel!
- Some new members joined the Discord channel
Michael Hammer
Read all my newsletters here: https://michaelphammer.com/newsletter/