Reinforcement Learning Spring 2022
Graduate course, National Kaohsiung University of Science and Technology, 2022
Course Description
Reinforcement learning (RL) is used in many applications, such as dialog systems, robotics, and AlphaGo. RL mimics the way humans learn and gives a system the ability to learn through trial and error. The course introduces both the fundamental and the advanced theories of RL, starting from Markov decision processes and progressing to deep RL. It also covers applications of RL, with a particular focus on dialog systems. After this course, students will have the fundamental concepts needed to do RL research.
Lecture 0: AidIR: An Interactive Dialog System to Aid Disease Information Retrieval
Lecture 1: Introduction to Reinforcement Learning
- Slides
- Reading Materials
- Papers related to examples in lecture
- Learning to Drive in a Day
- Deep Reinforcement Learning for Dialogue Generation
- Don’t Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation
- QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
- Human-level control through deep reinforcement learning
Lecture 2: Markov Decision Processes
- Slides
- Reading Materials
Lecture 3: Planning by Dynamic Programming
- Slides
- Reading Materials
Lecture 4: Model-Free Prediction
- Slides
- Reading Materials
- Introduction to Reinforcement Learning with David Silver, Lecture 4
- Sutton and Barto, Chapter 5 and Chapter 6 (focusing on the prediction parts is sufficient)
Lecture 5: Model-Free Control
- Slides
- Reading Materials
- Introduction to Reinforcement Learning with David Silver, Lecture 5
- Sutton and Barto, Chapter 5 and Chapter 6 (focusing on the control parts is sufficient)
Lecture 6: Neural Network and Backpropagation
- Slides
- Reading Materials
- Stanford University CS224N, Lecture 3 and Lecture 4 (2019 Edition)
- Maxout Networks
- Understanding the difficulty of training deep feedforward neural networks
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- Adam: A Method for Stochastic Optimization
- An overview of gradient descent optimization algorithms
Lecture 7: Value Function Approximation
- Slides
- Reading Materials
- Introduction to Reinforcement Learning with David Silver, Lecture 6
- Sutton and Barto, Chapter 6 (focusing on the Q-learning parts is sufficient)
- Human-level control through deep reinforcement learning
Lecture 8: Policy Gradient
- Slides
- Reading Materials
- UC Berkeley CS 285 Deep Reinforcement Learning Lecture 5 and Lecture 9
- Reinforcement learning of motor skills with policy gradients
- Trust Region Policy Optimization
- Proximal Policy Optimization Algorithms
- Lagrange multipliers and constrained optimization
- Finding Taylor polynomial approximations of functions
Lecture 9: Actor-Critic Algorithm
- Slides
- Reading Materials
- UC Berkeley CS 285 Deep Reinforcement Learning Lecture 6
- Asynchronous Methods for Deep Reinforcement Learning
- Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Deep Reinforcement Learning - Advantage Actor-Critic methods
Lecture 10: Variational Inference and Generative Models
- Slides
- Reading Materials
Textbooks
- Deep Learning
- Goodfellow, Bengio, and Courville, Deep Learning
- Zhang et al., Dive into Deep Learning
- Reinforcement Learning
- Sutton and Barto, Reinforcement Learning: An Introduction
- Vitay, Deep Reinforcement Learning
Online Courses
- Reinforcement Learning