Abstract
We present a representation-driven framework for reinforcement learning. By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation. In particular, embedding a policy network into a linear feature space allows us to reframe the exploration-exploitation problem as a representation-exploitation problem, where good policy representations enable optimal exploration. We demonstrate the effectiveness of this framework by applying it to evolutionary and policy gradient-based approaches, where it leads to significantly improved performance compared to traditional methods. Our framework provides a new perspective on reinforcement learning, highlighting the importance of policy representation in determining optimal exploration-exploitation strategies.
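To make the linear-embedding idea concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each candidate policy already has a fixed feature vector and that its expected return is linear in that vector, and it applies the standard LinUCB selection rule from the linear bandit literature. The candidate features, the weight vector `w_true`, the noise-free rewards, and all function names are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [dot(row, v) for row in M]


def select_ucb(features, A_inv, b, beta):
    """Pick the candidate with the highest optimistic value estimate.

    features: one embedding per candidate policy (hypothetical, fixed here).
    A_inv, b: inverse regularized design matrix and reward-weighted feature sum.
    beta: width of the confidence bonus (an assumed tuning constant).
    """
    w_hat = mat_vec(A_inv, b)  # ridge-regression estimate of the value weights
    best_idx, best_score = 0, -math.inf
    for idx, x in enumerate(features):
        bonus = beta * math.sqrt(max(dot(x, mat_vec(A_inv, x)), 0.0))
        score = dot(w_hat, x) + bonus
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


def update(A_inv, b, x, reward):
    """Rank-one Sherman-Morrison update after observing `reward` for features `x`."""
    Ax = mat_vec(A_inv, x)
    denom = 1.0 + dot(x, Ax)
    d = len(x)
    new_A_inv = [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)]
                 for i in range(d)]
    new_b = [b_i + reward * x_i for b_i, x_i in zip(b, x)]
    return new_A_inv, new_b


def run_demo(rounds=50, beta=1.0):
    """Run the bandit loop on three toy 'policy embeddings' with noise-free returns."""
    features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical embeddings
    w_true = [1.0, 0.2]                               # hypothetical true weights
    d = len(w_true)
    A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for _ in range(rounds):
        idx = select_ucb(features, A_inv, b, beta)
        reward = dot(w_true, features[idx])  # stands in for a policy rollout
        A_inv, b = update(A_inv, b, features[idx], reward)
    w_hat = mat_vec(A_inv, b)
    greedy = max(range(len(features)), key=lambda k: dot(w_hat, features[k]))
    return greedy, w_hat
```

Calling `run_demo()` returns the index of the candidate that the learned linear model ranks highest, together with the estimated weights. In this sketch the quality of the outcome depends entirely on how well the feature vectors capture the policies' values, which is the sense in which exploration becomes a question of representation.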
| Original language | English |
|---|---|
| Pages (from-to) | 25588-25603 |
| Number of pages | 16 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 202 |
| State | Published - 2023 |
| Event | 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States. Duration: 23 Jul 2023 → 29 Jul 2023 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability