# Reinforcement Learning Spring 2022

Graduate course, *National Kaohsiung University of Science and Technology*, 2022

## Course Description

Reinforcement Learning (RL) is applied in many areas, such as dialog systems, robotics, and game playing (for example, AlphaGo). RL mimics human learning behavior and gives a system the ability to learn through trial and error. The course introduces both the fundamental and the advanced theories of RL, starting from Markov decision processes and progressing to deep RL. We also cover applications of RL, focusing on dialog systems. After this course, students will have the fundamental concepts needed to do RL research.

## Lecture 0: AidIR: An Interactive Dialog System to Aid Disease Information Retrieval

## Lecture 1: Introduction to Reinforcement Learning

- Slides
- Reading Materials
- Papers related to examples in lecture
- Learning to Drive in a Day
- Deep Reinforcement Learning for Dialogue Generation
- Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation
- QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
- Human-level control through deep reinforcement learning

## Lecture 2: Markov Decision Processes

- Slides
- Reading Materials

## Lecture 3: Planning by Dynamic Programming

- Slides
- Reading Materials

## Lecture 4: Model-Free Prediction

- Slides
- Reading Materials
- Introduction to Reinforcement Learning with David Silver, Lecture 4
- Sutton and Barto, Chapters 5 and 6 (focusing on the prediction parts is sufficient)

## Lecture 5: Model-Free Control

- Slides
- Reading Materials
- Introduction to Reinforcement Learning with David Silver, Lecture 5
- Sutton and Barto, Chapters 5 and 6 (focusing on the control parts is sufficient)

## Lecture 6: Neural Network and Backpropagation

- Slides
- Reading Materials
- Stanford University CS224N, Lecture 3 and Lecture 4 (2019 edition)
- Maxout Networks
- Understanding the difficulty of training deep feedforward neural networks
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- Adam: A Method for Stochastic Optimization
- An overview of gradient descent optimization algorithms

## Lecture 7: Value Function Approximation

- Slides
- Reading Materials
- Introduction to Reinforcement Learning with David Silver, Lecture 6
- Sutton and Barto, Chapter 6 (focusing on the Q-learning parts is sufficient)
- Human-level control through deep reinforcement learning

## Lecture 8: Policy Gradient

- Slides
- Reading Materials
- UC Berkeley CS 285 Deep Reinforcement Learning Lecture 5 and Lecture 9
- Reinforcement learning of motor skills with policy gradients
- Trust Region Policy Optimization
- Proximal Policy Optimization Algorithms
- Lagrange multipliers and constrained optimization
- Finding Taylor polynomial approximations of functions

## Lecture 9: Actor-Critic Algorithm

- Slides
- Reading Materials
- UC Berkeley CS 285 Deep Reinforcement Learning Lecture 6
- Asynchronous Methods for Deep Reinforcement Learning
- Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Deep Reinforcement Learning - Advantage Actor-Critic methods

## Lecture 10: Variational Inference and Generative Models

- Slides
- Reading Materials

## Textbooks

- Deep Learning
  - Goodfellow, Bengio, and Courville, Deep Learning
  - Zhang et al., Dive into Deep Learning
- Reinforcement Learning
  - Sutton and Barto, Reinforcement Learning: An Introduction
  - Vitay, Deep Reinforcement Learning

## Online Courses

- Reinforcement Learning