How many loops around the cycle should make up an episode? In my Blackjack environment, I considered one round of Blackjack to be one episode. This means there will usually be 1–3 State/Action/Reward tuples per episode, because the agent will likely make only 1–3 stand/hit decisions per round of Blackjack (or more on rare occasions).

Luckily, Blackjack has the notion of a “round” to help define an episode. Other contexts, such as applying Reinforcement Learning to predict the stock market, have no “rounds” to help define episodes. There are no starting and stopping points in the stock market, and you will have to get creative when defining episodes! For these reasons, predicting the stock market with Reinforcement Learning would be considered a continuous task, while cracking Blackjack would be considered an episodic task. In the next article, we will dive into exactly how our Reinforcement Learning algorithm directs our agent to use these State/Action/Reward tuples to optimize its policy.

Why Re-Build Our Blackjack Environment Using OpenAI Gym?

The Blackjack game we set up in Part 1 does not accurately model the Reinforcement Learning cycle.
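To make the rebuild concrete, here is a minimal sketch of a Blackjack environment shaped like the classic OpenAI Gym interface. In that interface, `reset()` returns the first observation and `step(action)` returns `(observation, reward, done, info)`. The sketch does not import or subclass Gym; it only imitates that method shape. The class name `BlackjackEnvSketch` is hypothetical, and so are the simplified rules:

- an infinite deck,
- a dealer who draws to 17,
- rewards of +1, -1, or 0.

One round is one episode, so `done` becomes True once the player stands or busts.

```python
import random


class BlackjackEnvSketch:
    """Hypothetical Gym-style Blackjack environment (imitates classic gym.Env).

    reset() -> observation; step(action) -> (observation, reward, done, info).
    One round is one episode. Assumptions: infinite deck, dealer draws to 17,
    reward +1 for a win, -1 for a loss, 0 for a draw or an ongoing round.
    """

    STAND, HIT = 0, 1
    CARDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]  # ace drawn as 1

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.player, self.dealer = [], []

    def _draw(self):
        return self.rng.choice(self.CARDS)

    @staticmethod
    def _value(cards):
        # Count one ace as 11 when that does not bust the hand.
        total = sum(cards)
        return total + 10 if 1 in cards and total + 10 <= 21 else total

    def _obs(self):
        # State: (player hand value, dealer up-card value)
        return (self._value(self.player), self.dealer[0])

    def reset(self):
        self.player = [self._draw(), self._draw()]
        self.dealer = [self._draw(), self._draw()]
        return self._obs()

    def step(self, action):
        if action == self.HIT:
            self.player.append(self._draw())
            if self._value(self.player) > 21:
                return self._obs(), -1.0, True, {}  # bust ends the round
            return self._obs(), 0.0, False, {}      # round continues
        # Stand: the dealer draws until reaching at least 17.
        while self._value(self.dealer) < 17:
            self.dealer.append(self._draw())
        player, dealer = self._value(self.player), self._value(self.dealer)
        if dealer > 21 or player > dealer:
            reward = 1.0
        elif player < dealer:
            reward = -1.0
        else:
            reward = 0.0
        return self._obs(), reward, True, {}


env = BlackjackEnvSketch(seed=0)
state, done, episode = env.reset(), False, []
while not done:
    action = env.HIT if state[0] < 17 else env.STAND  # naive fixed policy
    next_state, reward, done, _ = env.step(action)
    episode.append((state, action, reward))
    state = next_state
print(episode)
```

Because the round itself ends the episode, each printed episode should usually hold only a few State/Action/Reward tuples. That matches the episodic framing described above.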
The cycle above implies that this loop will go on indefinitely, so where and when does the actual learning happen? A single cycle can be represented as a sequence: State → Action → Reward. As we go around this cycle, we can record these State/Action/Reward tuples. Going through some number *n* of loops around the cycle, recording State/Action/Reward tuples as we go, is called an episode. After the desired number of loops (let’s say 50), the agent goes through the recorded State/Action/Reward tuples and updates its policy accordingly.
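The collect-then-update pattern can be sketched generically. `run_episode` goes around the cycle up to `n_loops` times and records one (state, action, reward) tuple per loop. It stops early if the environment reports that the episode is done.

The update step here is an illustrative choice only, not the algorithm this series uses (that is covered in the next article). It is an every-visit Monte Carlo average: each step's return is the sum of the rewards from that step to the end of the episode. All function names are hypothetical. The toy environment is made up purely for demonstration and is not Blackjack.

```python
from collections import defaultdict


def run_episode(reset, step, policy, n_loops=50):
    """Loop around the State -> Action -> Reward cycle up to n_loops times.

    reset() returns the first state; step(state, action) returns
    (next_state, reward, done). One (state, action, reward) tuple is
    recorded per loop, and the episode ends early if step reports done.
    """
    episode = []
    state = reset()
    for _ in range(n_loops):
        action = policy(state)
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        if done:
            break
        state = next_state
    return episode


def update_from_episode(values, counts, episode):
    """Illustrative update: every-visit Monte Carlo averaging of returns.

    Walks the episode backwards so the return g at each step is the sum of
    rewards from that step to the end (no discounting), then folds g into a
    running average for that (state, action) pair.
    """
    g = 0.0
    for state, action, reward in reversed(episode):
        g += reward
        key = (state, action)
        counts[key] += 1
        values[key] += (g - values[key]) / counts[key]


# Toy environment for demonstration only (not Blackjack): the state counts
# down from 3, every action earns a reward of 1, and the episode ends at 0.
def toy_reset():
    return 3


def toy_step(state, action):
    next_state = state - 1
    return next_state, 1.0, next_state == 0


values, counts = defaultdict(float), defaultdict(int)
episode = run_episode(toy_reset, toy_step, policy=lambda s: "hit")
update_from_episode(values, counts, episode)
print(episode)  # [(3, 'hit', 1.0), (2, 'hit', 1.0), (1, 'hit', 1.0)]
```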
The environment models the start of a round by sending the agent an initial state (the player’s hand value plus the dealer’s up-card value).
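To see how many different initial states the environment could send, the sketch below enumerates every two-card player hand and every dealer up-card. It makes these assumptions:

- an ace is drawn as 1 and counted as 11 when that does not bust the hand,
- face cards count as 10,
- the dealer's up-card is reported as its raw value from 1 to 10.

`hand_value` and `RANKS` are hypothetical names chosen for this illustration.

```python
from itertools import product

# Assumed card values: an ace is 1 (possibly counted as 11), face cards are 10.
RANKS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def hand_value(cards):
    """Sum a hand, counting one ace as 11 if that does not bust it."""
    total = sum(cards)
    return total + 10 if 1 in cards and total + 10 <= 21 else total


# Every state the environment could send at the start of a round:
# (player hand value after two cards, dealer up-card value 1-10).
initial_states = {
    (hand_value([c1, c2]), up)
    for c1, c2, up in product(RANKS, RANKS, RANKS)
}
player_values = sorted({p for p, _ in initial_states})
print(len(initial_states))                  # 180
print(player_values[0], player_values[-1])  # 4 21
```

Under these assumptions, the initial player value ranges from 4 to 21 and the up-card from 1 to 10. That gives 18 × 10 = 180 possible initial states.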
In our version of Blackjack, the available actions are hit and stand.
In our version of Blackjack, a state will consist of the player’s hand value and the dealer’s up-card value.
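The state and action definitions map naturally onto small Python types. In the sketch below, `Action` is an `IntEnum` with the two moves, hit and stand. The 0 = stand and 1 = hit encoding is an assumption that mirrors the 0 = stick, 1 = hit convention of Gym's built-in Blackjack environment. `State` is a `NamedTuple` holding the player's hand value and the dealer's up-card value.

Because a state is a hashable tuple, a (state, action) pair can key a lookup table. `value_table` is a hypothetical placeholder for whatever per-pair numbers a learning algorithm might keep.

```python
from enum import IntEnum
from typing import NamedTuple


class Action(IntEnum):
    # Assumed encoding, matching the 0 = stick, 1 = hit convention of
    # Gym's built-in Blackjack environment.
    STAND = 0
    HIT = 1


class State(NamedTuple):
    player_hand_value: int    # e.g. 16
    dealer_upcard_value: int  # e.g. 10 for a face card showing


# States are hashable tuples, so a (state, action) pair can key a lookup
# table. `value_table` is hypothetical: it just holds a number per pair.
value_table: dict[tuple[State, Action], float] = {}
state = State(player_hand_value=16, dealer_upcard_value=10)
for action in Action:
    value_table[(state, action)] = 0.0

print(len(value_table))  # 2
print(state)             # State(player_hand_value=16, dealer_upcard_value=10)
```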