Artificial Intelligence: Reinforcement Learning in Python
Welcome
Introduction (3:14)
Special Offer! Get the VIP version of this course (1:14)
Course Outline and Big Picture (8:53)
Where to get the Code (4:36)
How to succeed in this course (5:51)
Warmup (15:36)
Return of the Multi-Armed Bandit
Section Introduction: The Explore-Exploit Dilemma (10:17)
Applications of the Explore-Exploit Dilemma (8:00)
Epsilon-Greedy Theory (7:04)
Calculating a Sample Mean (pt 1) (5:56)
Epsilon-Greedy Beginner's Exercise Prompt (5:05)
Designing Your Bandit Program (4:09)
Epsilon-Greedy in Code (7:12)
Comparing Different Epsilons (6:02)
Optimistic Initial Values Theory (5:40)
Optimistic Initial Values Beginner's Exercise Prompt (2:26)
Optimistic Initial Values Code (4:18)
UCB1 Theory (14:32)
UCB1 Beginner's Exercise Prompt (2:14)
UCB1 Code (3:28)
Bayesian Bandits / Thompson Sampling Theory (pt 1) (12:43)
Bayesian Bandits / Thompson Sampling Theory (pt 2) (17:35)
Thompson Sampling Beginner's Exercise Prompt (2:50)
Thompson Sampling Code (5:03)
Thompson Sampling With Gaussian Reward Theory (11:24)
Thompson Sampling With Gaussian Reward Code (6:18)
Why don't we just use a library? (5:40)
Nonstationary Bandits (7:11)
Bandit Summary, Real Data, and Online Learning (6:29)
(Optional) Alternative Bandit Designs (10:05)
Suggestion Box (3:03)
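The bandit section above covers epsilon-greedy, optimistic initial values, UCB1, and Thompson sampling. As a taste of the simplest of these, here is a minimal epsilon-greedy sketch; the arm probabilities, epsilon, and trial count are illustrative values, not the course's exact code:

```python
import numpy as np

# Hypothetical true win rates for a 3-armed Bernoulli bandit (illustrative only).
TRUE_MEANS = [0.2, 0.5, 0.75]
EPSILON = 0.1
NUM_TRIALS = 10_000

estimates = np.zeros(len(TRUE_MEANS))  # sample-mean estimate per arm
counts = np.zeros(len(TRUE_MEANS))     # number of pulls per arm

rng = np.random.default_rng(0)
for _ in range(NUM_TRIALS):
    if rng.random() < EPSILON:
        arm = int(rng.integers(len(TRUE_MEANS)))  # explore: random arm
    else:
        arm = int(np.argmax(estimates))           # exploit: best arm so far
    reward = float(rng.random() < TRUE_MEANS[arm])  # Bernoulli reward
    counts[arm] += 1
    # incremental sample-mean update: new_mean = old_mean + (x - old_mean) / n
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("estimated means:", estimates.round(3))
```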
Build an Intelligent Tic-Tac-Toe Agent
Naive Solution to Tic-Tac-Toe (3:50)
Components of a Reinforcement Learning System (8:00)
Notes on Assigning Rewards (2:41)
The Value Function and Your First Reinforcement Learning Algorithm (16:33)
Tic Tac Toe Code: Outline (3:16)
Tic Tac Toe Code: Representing States (2:56)
Tic Tac Toe Code: Enumerating States Recursively (6:14)
Tic Tac Toe Code: The Environment (6:36)
Tic Tac Toe Code: The Agent (5:48)
Tic Tac Toe Code: Main Loop and Demo (6:02)
Tic Tac Toe Summary (5:25)
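The tic-tac-toe agent above learns a table of state values and improves it after each game. A minimal sketch of the backward episode update the section builds toward, V(s) <- V(s) + alpha * (V(s') - V(s)); the state hashes, default value, and learning rate are illustrative:

```python
ALPHA = 0.1  # learning rate (illustrative value)

def update_values(V, state_history, final_reward):
    """Backward pass over one episode:
    V(s) <- V(s) + alpha * (target - V(s)), seeded by the final reward."""
    target = final_reward
    for state in reversed(state_history):
        old = V.get(state, 0.5)            # unseen states start at a neutral 0.5
        V[state] = old + ALPHA * (target - old)
        target = V[state]                  # each state's new value becomes the next target
    return V

# toy usage: three dummy state hashes, agent won (reward 1)
V = update_values({}, [101, 202, 303], final_reward=1.0)
print(V)
```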
Markov Decision Processes
Gridworld (2:13)
The Markov Property (4:36)
Defining and Formalizing the MDP (4:10)
Future Rewards (3:16)
Value Functions (4:38)
Optimal Policy and Optimal Value Function (4:09)
MDP Summary (1:35)
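The MDP section defines the discounted return G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ... A short sketch computing it for a finite episode; the discount factor and reward sequence are illustrative:

```python
GAMMA = 0.9  # discount factor (illustrative)

def discounted_return(rewards, gamma=GAMMA):
    """G = r_1 + gamma * r_2 + gamma^2 * r_3 + ... computed backward."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

print(discounted_return([0, 0, 1]))  # 0 + 0.9*0 + 0.81*1 = 0.81
```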
Dynamic Programming
Intro to Dynamic Programming and Iterative Policy Evaluation (3:06)
Gridworld in Code (5:47)
Iterative Policy Evaluation in Code (6:24)
Policy Improvement (2:51)
Policy Iteration (2:00)
Policy Iteration in Code (3:46)
Policy Iteration in Windy Gridworld (4:57)
Value Iteration (3:58)
Value Iteration in Code (2:14)
Dynamic Programming Summary (5:14)
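The dynamic programming section turns the Bellman equation into the iterative policy evaluation algorithm. A minimal sketch on a made-up deterministic two-state chain; the states, policy, and rewards are illustrative, not the course's gridworld:

```python
# Toy deterministic MDP: transitions[state][action] = (next_state, reward).
# States, actions, and rewards here are illustrative.
transitions = {
    "s0": {"right": ("s1", 0.0)},
    "s1": {"right": ("goal", 1.0)},
    "goal": {},  # terminal state
}
policy = {"s0": "right", "s1": "right"}  # fixed policy to evaluate
GAMMA = 0.9
THRESHOLD = 1e-6

V = {s: 0.0 for s in transitions}
while True:
    delta = 0.0
    for s, a in policy.items():
        s2, r = transitions[s][a]
        new_v = r + GAMMA * V[s2]          # Bellman backup (deterministic case)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < THRESHOLD:                  # stop when values stop changing
        break

print(V)  # V["s1"] -> 1.0, V["s0"] -> 0.9
```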
Monte Carlo
Monte Carlo Intro (3:10)
Monte Carlo Policy Evaluation (5:45)
Monte Carlo Policy Evaluation in Code (3:35)
Policy Evaluation in Windy Gridworld (3:38)
Monte Carlo Control (5:59)
Monte Carlo Control in Code (4:04)
Monte Carlo Control without Exploring Starts (2:58)
Monte Carlo Control without Exploring Starts in Code (2:51)
Monte Carlo Summary (3:42)
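Monte Carlo prediction estimates V(s) by averaging sampled returns. A minimal first-visit sketch, assuming each episode is a list of (state, reward received on leaving that state) pairs; the episode format and toy data are illustrative:

```python
from collections import defaultdict

GAMMA = 0.9  # illustrative discount factor

def first_visit_mc(episodes, gamma=GAMMA):
    """Average the return following the first visit to each state."""
    returns = defaultdict(list)
    for episode in episodes:
        # compute returns backward: G = r + gamma * G
        G = 0.0
        rets = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            rets.append((state, G))
        # walk forward again, keeping only each state's first visit
        visited = set()
        for state, G in reversed(rets):
            if state not in visited:
                visited.add(state)
                returns[state].append(G)
    return {s: sum(g) / len(g) for s, g in returns.items()}

# toy usage: one episode visiting s0 then s1, terminal reward 1
print(first_visit_mc([[("s0", 0.0), ("s1", 1.0)]]))  # {'s0': 0.9, 's1': 1.0}
```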
Temporal Difference Learning
Temporal Difference Intro (1:42)
TD(0) Prediction (3:46)
TD(0) Prediction in Code (2:27)
SARSA (5:15)
SARSA in Code (3:38)
Q Learning (3:05)
Q Learning in Code (2:13)
TD Summary (2:34)
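The TD section ends with Q-learning, the off-policy TD control method. A minimal sketch of its update rule paired with an epsilon-greedy behavior policy; the action set and hyperparameters are illustrative:

```python
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative hyperparameters
ACTIONS = ["up", "down", "left", "right"]

Q = defaultdict(float)  # Q[(state, action)] -> value, defaults to 0

def q_learning_update(s, a, r, s2, done):
    """Off-policy TD control:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r if done else r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def epsilon_greedy(s):
    """Behavior policy: mostly greedy, occasionally random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

# toy usage: one observed transition
q_learning_update("s0", "right", 0.0, "s1", done=False)
```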
Approximation Methods
Approximation Intro (4:11)
Linear Models for Reinforcement Learning (4:16)
Features (4:02)
Monte Carlo Prediction with Approximation (1:54)
Monte Carlo Prediction with Approximation in Code (2:58)
TD(0) Semi-Gradient Prediction (4:22)
Semi-Gradient SARSA (3:08)
Semi-Gradient SARSA in Code (4:08)
Course Summary and Next Steps (8:38)
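The approximation section replaces the value table with a parameterized model. A minimal sketch of semi-gradient TD(0) prediction with a linear model, where the gradient of V_hat(s) = w . x(s) with respect to w is just the feature vector x(s); the feature dimension and hyperparameters are illustrative:

```python
import numpy as np

ALPHA, GAMMA = 0.01, 0.9  # illustrative hyperparameters

class LinearV:
    """Linear value function: V_hat(s) = w . x(s)."""
    def __init__(self, n_features):
        self.w = np.zeros(n_features)

    def predict(self, x):
        return self.w @ x

    def td0_update(self, x, r, x_next, done):
        # semi-gradient TD(0): w += alpha * (r + gamma*V(s') - V(s)) * grad_w V(s),
        # and grad_w V(s) = x for a linear model
        target = r if done else r + GAMMA * self.predict(x_next)
        self.w += ALPHA * (target - self.predict(x)) * x

# toy usage with made-up 2-dimensional features
V = LinearV(2)
V.td0_update(np.array([1.0, 0.0]), 1.0, np.array([0.0, 1.0]), done=False)
print(V.w)  # [0.01 0.  ]
```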
Appendix
What order should I take your courses in? (pt 1) (11:18)
What order should I take your courses in? (pt 2) (16:07)
How to Code by Yourself (part 1) (15:54)
How to Code by Yourself (part 2) (9:23)
How to Succeed in this Course (Long Version) (10:24)
BONUS: Where to get discount coupons and FREE deep learning material (5:31)