bhikkhu's blog

By bhikkhu, history, 17 months ago, In English

Heyy folks, I'd like to share my coding project with you. I always wanted to make a bicycle game of my own and here I am sharing the game I made from scratch(all the PHYSICS as well). It uses plain Javascript.

https://kingofdelphi.github.io/projects/bicycle/

This is the source code available for the public. https://github.com/kingofdelphi/bicycle

Please look up the source code and tweak as you feel like it.

Full text and comments »

  • Vote: I like it
  • +21
  • Vote: I do not like it

By bhikkhu, history, 2 years ago, In English

We would like to create a model that which when given a game state, it predicts the best move.

Lets say our game is the simple Tic Tac Toe. It is a small game and we can train the AI for it in a handful of minutes.

Here is our example neural network, reduced the number of hidden layer to avoid cluttering.

In the above network, the inputs are going to be board states. For example,

Lets assume the neural networks always predicts from the perspective of that the turn is of player -1.

If we can build a neural network, we can just flip the board and predict for the opposite player, easy peasy.

I use the algorithm mentioned at OPEN AI blog for training. Its almost same for the CartPole Reinforcement Learning environment from OpenAI GYM https://spinningup.openai.com/en/latest/algorithms/vpg.html.

Also please read into the blog http://karpathy.github.io/2016/05/31/rl/ from Andrej Karpathy from a beginner perspective.

So the training algorithm looks like this:

  1. Run a number of simulations / battles / episodes

For every simulation:

--> Run a play till the end of game, i.e. either someone wins, or the game ends in a draw. The learning agent picks a random action based on the probability outputted by neural network for the input game board state. This way the neural network has a stochastic behavior and it helps in both exploration vs exploitation of existing good moves. The higher the probability of a action, it means the action has been voted up many times, the lower means its not favored much but still the learning process can / needs to explore the action unless the probability is 0 thus adding some potential for the already existing network to not get stuck in some kind of minima where a little bit of exploration would have brought excellent returns.

--> Calculate the reward for the player.

--> Feed it into the Neural Network model for training.

If we run this enough times, the Network gets better at avoiding the bad moves and maximizing the probability of good moves. And voila, we have it right here, create a model that is better than the opponent.

Once we do it enough times, the model gets better and better. Although, it doesn't beat the MiniMax perfect playing opponent but at least I got something working using a radical new way that utilizes the latest Neural Network strategies. The algorithm I used is a rudimentary algorithm. Advanced form of algorithm has been used by OpenAI to achieve super human performance in Atari games and Dota 2 with nothing but self play.

Finally here is my code, https://github.com/kingofdelphi/tic_tac_toe_agi. Of course I am still improving the AI and my code as I learn since it hasn't been much since I started learning these stuffs.

I am going to try more advanced algorithms and see how the model improves. Right now, it loses against minmax 5% of the times, I want to get it down to zero.

It may be possible someone of you has already cracked the game using self play while utilizing neural networks. If so, I am very much interested to learn more about your approaches / code. Once again heres my code, https://github.com/kingofdelphi/tic_tac_toe_agi. I am gonna keep tweaking it to my hearts content.

Thank you for reading!

Full text and comments »

  • Vote: I like it
  • +5
  • Vote: I do not like it

By bhikkhu, history, 2 years ago, In English

Given the current updates in the Artificial Intelligence world, I was wondering if I got behind. I had already started out learning the AI stuffs many years ago. But there were always some doubts in my mind that AGI seems impossible. However, my older beliefs are vaporizing as I speak now.

Today I was able to finish up a game that the computer can learn entirely from self play (aka reinforcement learning). Just giving up the game rules and the program was able to beat any agent that can be coded up.

So initially I coded an agent that made random moves. My AI code was able to either win or draw, no losses.

So the AI learns to counter tackle whatever the move the adversary makes. Initially it loses some games, but after some games it becomes unbeatable. You have to see it to believe but this is just mind boggling. MY JAW is on the floor right now. This universe is freaking weird.

1. Code a dummy Player that makes random moves.

2. Make the AI learn to beat the dummy player. No special knowledge required for this part. I'll share soon.

3. Extract out the AI model

4. Make the new AI model play against the older version of itself.

5. Non stop improvement.

Given that I implemented it myself from scratch (of course with the help of some famous libraries but the bare bones are very little).

I am convinced AGI is inevitable.

I am going to share the details soon / source code.

OH MY FAWKING GOD, this is just unbelievable, Singularity is inevitable and I have a first hand PROOF now.

Full text and comments »

  • Vote: I like it
  • -1
  • Vote: I do not like it

By bhikkhu, history, 7 years ago, In English

http://mirror.codeforces.com/contest/982/problem/D

I don't get why we need to add +1 to a[i]. I understand that we need to add if n = 1. But why do we need to do so in other cases?

Full text and comments »

  • Vote: I like it
  • 0
  • Vote: I do not like it

By bhikkhu, 8 years ago, In English
#include <vector>
#include <utility>
#include <cassert>
int main() {
    std::vector<int> vec;
    vec.push_back(0);
    int & r = vec[0];
    if (r == 0) {
        r = 1;
        vec.push_back(0);
    }
    assert(r > 0);
    return 0;
}

Find out why this code is dangerous :D

I became a victim of this problem while solving a problem using logic similar to the code shown.

Full text and comments »

  • Vote: I like it
  • +33
  • Vote: I do not like it

By bhikkhu, history, 8 years ago, In English

While solving problem F of 394 div 2 I needed to solve the problem, fill rectangle (a,b) to (c,d) with value V. There are Q queries of this form.

I need to find for each cell, the total value of it i.e. F[i][j] where F is the final array after all the updates are performed.

I extended the simple 1d method. to update range [l, r] set F[l] = F[l] + 1, F[r + 1] = F[r + 1] — 1, and taking cumulative sum.

In 2d, I took two 2d arrays, start[i][j], finish[i][j]

whenever i see a rectangle (a,b) to (c,d), updates are

start[a][b]++

finish[a][d]++

start[c + 1][b]--

finish[c + 1][d]--

Think of this method as when a segment from column b to d starts i.e. at a and when the segment should be removed i.e. at c + 1

Now for each row, I take cumulative sum as in 1d which is take cumulative sum of start[i][j] and subtract finish[i][j] when leaving the cell, also push start[i][j] and finish[i][j] down

This method works, but I have seen other methods while using just one 2d array.

Can anyone explain the method?

Full text and comments »

  • Vote: I like it
  • +6
  • Vote: I do not like it

By bhikkhu, history, 8 years ago, In English

Hello guys!!!,

I have been observing that without time constraint, I often feel lazy. So, I think I should first list few problems to solve and fix a time limit to solve.

So instead of using different timer, I'd like to use codeforces itself (if the feature is available).

I need the following feature.

  1. Create a list of problems(any two problems can be from different contests)
  2. Set time limit
  3. Start contest

This is similar to virtual contest except that I'd be the only participant of it. Also, if the codeforces judge stalls, it should automatically pause the timer.

Is this feature available?

Full text and comments »

  • Vote: I like it
  • +7
  • Vote: I do not like it

By bhikkhu, history, 8 years ago, In English

http://mirror.codeforces.com/contest/758/problem/F

Guys can anybody explain why the upper bounds for x and y for the ratio d = x/y is n-1 th root of 'r'?

I know it myself but I want to know what other coders think about it.

So basically explain why x <= power(r, 1 / (n — 1)) and y <= power(r, 1 / (n — 1))?

Full text and comments »

  • Vote: I like it
  • -6
  • Vote: I do not like it

By bhikkhu, history, 8 years ago, In English

http://mirror.codeforces.com/contest/439/problem/D

Ternary search can only be used on strictly increasing or decreasing sequence.

But, I think the problem is special in the sense that the search space is strictly decreasing first then some equal values and then strictly increasing.

Thus, ternary search can be used here. But I need to prove that if two f(p) == f(p + 1), then the value must be the answer (if f(p) == f(p + 1), f(p) = our_answer to the problem, in case f(p) = f(p + 1) for any p).

Full text and comments »

  • Vote: I like it
  • +5
  • Vote: I do not like it

By bhikkhu, history, 8 years ago, In English
  • Vote: I like it
  • 0
  • Vote: I do not like it

By bhikkhu, history, 9 years ago, In English

Guys, please answer this, it won't take much time for those who have already solved this.

http://mirror.codeforces.com/contest/552/problem/C

In this problem, if X ≤ wk for where k is the largest possible, then we don't need to use all coins that have higher power than k + 1, i.e. coins wk + 2, wk + 3, ...wn will not be used.

To start with the proof, I will take wk + 2 first.

I need to prove that wk + 2 is not used in all of the valid solutions which means, wk + 2 doesn't occur either on the left or right side of the equation shown at the top. Now, if I could somehow prove that using wk + 2 in left side or right side, I cannot arrive at a solution I would be complete with my proof. I will first put wk + 2 in the left (along with X) and see. I have X + wk + 2 on the left, I also know that X ≤ wk. I can't work any further.

I am not able to prove how.

Ok, guys I got it!!!! (someone helped me). Here's a more easy proof.

Lets think a different way. What are the coin changes that I can make using wk + 2. What is the smallest such change? The smallest change is equal to when we put wk + 2 on the left and all smaller weights on the right i.e.

wk + 2 - wk + 1 - wk - .... - w0

wk + 2 — (wk + 2 - 1) / (w - 1)

But this amount is greater than wk.

Thus, if we use wk + 2, the change must be greater than wk. But, X ≤ wk. So, we can't make any such X. Proof is complete.

Full text and comments »

  • Vote: I like it
  • +3
  • Vote: I do not like it

By bhikkhu, history, 10 years ago, In English

We have two groups of nodes, group A consisting of N items, group B consisting of M items. There is an edge connecting node a of group A and node b of group B, where

a = i mod N

b = i mod M

where i is an integer.

Image description of the problem (http://mirror.codeforces.com/predownloaded/5a/b5/5ab5c142069cd987264c765383e682f622a7f0bf.png)

We need to find for each node if there is a path connecting to another node in the graph.

To create the adjacency matrix, I iterated from i from 0 to LCM(N,M) — 1, since after that, we would get same edge pairs. Now, discarding this approach, how can one find mathematically, if two nodes of the graph are connected by a path or not? This is actually my trimmed version of this problem (http://mirror.codeforces.com/problemset/problem/515/B)

The tutorial doesnot explain the GCD method to get O(N + M) complexity. I tried myself but couldnot go far. Any help regarding how it was derived will be immensely helpful.

Full text and comments »

  • Vote: I like it
  • 0
  • Vote: I do not like it

By bhikkhu, history, 10 years ago, In English

http://mirror.codeforces.com/problemset/problem/518/B

I wrote two naive solutions that should have both resulted in TLE. But one is showing wrong answer.

The only difference in both the solutions is that in Sol A, I mark the character to be removed as '0' whereas in the next one I actually remove the character from the string. But the solution in which I remove

the character is showing wrong answer instead of TLE. Could andybody explain what caused the judge to produce different decisions?

Solution A : TLE http://mirror.codeforces.com/contest/518/submission/11545724

Solution B: Wrong Answer http://mirror.codeforces.com/contest/518/submission/11545764

Full text and comments »

  • Vote: I like it
  • 0
  • Vote: I do not like it

By bhikkhu, history, 10 years ago, In English

http://mirror.codeforces.com/problemset/problem/482/A

if k = n — 1, all of the differences must be different and one of them is d = n — 1, which means 1 and n must come together. till now we have [1 n]

to get d = n — 2, either 1 and n — 1 would have to come together, or n and 2

so, [n — 1, 1, n] or [1, n, 2] are correct.

I know that adding each element greedily to [1, n] at the back i.e. [1, n, 2, ].... does work[or to the front]

[1, n, 2, n — 1, 3, n — 2]

But, how does one think through it during the contest? Is it grabbing pen and paper and trying brute force like

approach to find solution? That would take a hell lot of time (at least for me). But this was an specific case, [k =

n — 1] and using this case, general solution is obtained. So, is looking for some special cases a problem solving

strategy?

I cannot infer how people reach such a simple and sweet solution for such a seemingly complex problem.

Any suggestions would be helpful.

Full text and comments »

  • Vote: I like it
  • -1
  • Vote: I do not like it

By bhikkhu, 10 years ago, In English

(Actually this is a question) So I thought I knew the intuition behind the Manber and Myers algorithm. Here is what I understood.

Suppose the string is "banana"

We first partition the suffixes in terms of similar first character as

a, anana, ana => bucket 1

banana => bucket 2

na, nana => bucket 3

Then to get the partition by the next 2h characters, my algo is:

  1. scan each bucket one by one

  2. take the first bucket

  3. for each suffix in this bucket, find the position of sa + 2h, if we go out of bounds assign position = 0

So picture looks like this:

a = 0, anana = 3, ana = 3 (since a + 1 > n, nana is in 3rd bucket and na is also in third bucket)

  1. Now, sort the assigned indices of the bucket using counting sort.

  2. Scan the new indices one by one and create new partitions, here we get

[a], [anana, ana]

  1. Do this until buckets = n

My problem is in 4th part, where I use counting sort.

First I coded as I had thought that I had understood the algorithm. But then I ran into trouble. As the number of buckets goes on increasing during each iteration, my algorithm approaches O(n^2) (as I assign ranks during counting sort according to the location of s + 2h suffix). So with some modification to the algorithm can I get O(nlogn)? If not what should I do?

Ok. I removed the code. So please answer me now.

Full text and comments »

  • Vote: I like it
  • -14
  • Vote: I do not like it

By bhikkhu, 10 years ago, In English

I thought for many days about how do i solve this problem: http://mirror.codeforces.com/contest/475/problem/D But it seems to me that I cannot get any better than O(N^2). I think there is a knowledge gap(mathematics). I looked through the editorial too. But I cannot understand. I want to understand the solution. I even looked many of the submissions(using sparse tables, segment trees...). But staring the code doesn't seem to be beneficial(or may be I am too dumb for the problem). So, please help me solve this problem. I just want the idea. Thank you coders!

Full text and comments »

  • Vote: I like it
  • 0
  • Vote: I do not like it