Introduction to Reinforcement Learning.

We would like to create a model that, given a game state, predicts the best move.

Let's say our game is simple Tic Tac Toe. It is a small game, and we can train the AI for it in a handful of minutes.

Here is our example neural network, with the number of hidden layers reduced to avoid clutter.

In the above network, the inputs are going to be board states. For example,

Let's assume the neural network always predicts from the perspective of player -1, i.e. as if it is always player -1's turn.

If we can build such a neural network, we can just flip the board (swap the two players' marks) and predict for the opposite player, easy peasy.
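To make the flipping idea concrete, here is a minimal sketch. The encoding is my assumption for illustration: a flat list of 9 cells in row-major order, with -1 for the player the network predicts for, 1 for the opponent and 0 for an empty cell. The helper names `flip` and `legal_moves` are hypothetical.

```python
# Assumed encoding (not fixed by the post): 9 cells, row-major order.
# -1 = the player the network predicts for, 1 = the opponent, 0 = empty.
EMPTY = 0

def flip(board):
    """Swap the two players' marks so the other side becomes player -1."""
    return [-cell for cell in board]

def legal_moves(board):
    """Indices of the empty cells, i.e. the moves the model may choose from."""
    return [i for i, cell in enumerate(board) if cell == EMPTY]

board = [1, 0, -1,
         0, 1,  0,
         0, 0, -1]

# To predict for player 1, feed flip(board) to the same network.
flipped = flip(board)
```

With this encoding, flipping is just a sign change, so one network can serve both sides.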

I use the Vanilla Policy Gradient algorithm described in the OpenAI Spinning Up documentation for training: https://spinningup.openai.com/en/latest/algorithms/vpg.html. It is almost the same as the one used for the CartPole reinforcement learning environment from OpenAI Gym.

Also, please read Andrej Karpathy's blog post http://karpathy.github.io/2016/05/31/rl/ for a beginner's perspective.

So the training algorithm looks like this:

1. Run a number of simulations / battles / episodes.
2. For every simulation:
   - Run a play till the end of the game, i.e. until either someone wins or the game ends in a draw.
   - Calculate the reward for the player.
   - Feed it into the neural network model for training.
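The steps above can be sketched roughly as follows. This is not the network from this post: to keep it short and dependency-free, I swap in a linear softmax policy trained with a plain REINFORCE update. The random opponent, the rewards (+1 win, -1 loss, 0 draw), player -1 always moving first, and all function names are assumptions made for illustration.

```python
import math
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return -1 or 1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def move_probabilities(weights, board):
    """Softmax over the legal moves of a linear policy."""
    moves = [i for i in range(9) if board[i] == 0]
    features = board + [1]  # trailing 1 acts as a bias input
    scores = [sum(w * x for w, x in zip(weights[m], features)) for m in moves]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return moves, [e / total for e in exps]

def play_episode(weights, rng):
    """Play one game as player -1 against a random opponent.

    Returns the decisions of player -1 and the final reward for player -1.
    """
    board = [0] * 9
    history = []
    player = -1  # assumption: the learning player always moves first
    while True:
        if player == -1:
            moves, probs = move_probabilities(weights, board)
            move = rng.choices(moves, weights=probs)[0]
            history.append((list(board), move, moves, probs))
        else:
            move = rng.choice([i for i in range(9) if board[i] == 0])
        board[move] = player
        won = winner(board)
        if won != 0 or all(cell != 0 for cell in board):
            reward = {-1: 1.0, 1: -1.0, 0: 0.0}[won]
            return history, reward
        player = -player

def train_step(weights, history, reward, lr=0.1):
    """REINFORCE: push up log-probabilities of the taken moves, scaled by reward."""
    for board, move, moves, probs in history:
        features = board + [1]
        for m, p in zip(moves, probs):
            grad = (1.0 if m == move else 0.0) - p  # d log pi(move) / d score_m
            for i, x in enumerate(features):
                weights[m][i] += lr * reward * grad * x

def train(episodes=2000, seed=0):
    """Run the loop: simulate, compute the reward, feed it to the model."""
    rng = random.Random(seed)
    weights = [[0.0] * 10 for _ in range(9)]  # one row of weights per move
    for _ in range(episodes):
        history, reward = play_episode(weights, rng)
        train_step(weights, history, reward)
    return weights
```

Calling `train()` returns the learned weights. A real setup would replace the linear policy with the neural network and would likely add tricks such as baselines or batching, which this sketch leaves out.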

If we run this enough times, the network gets better at avoiding bad moves and at raising the probability of good moves. And voila, we have a model that is better than the opponent.
