adamant's blog

By adamant, history, 9 years ago, translation, In English

Hi everyone!

This summer in Moscow IPT was held Moscow International Workshop ACM ICPC. I gave a lecture on suffix structures there (it was only about suffix automaton and suffix tree actually). In this regard, I would like to present to you the lecture notes. So here are Russian and English versions. Enjoy and Happy Holidays :)

P. S. My very apologies for my bad English :(

UPD: Thanks to Burunduk1 for help in code formatting.

UPD2: Please also refer to Wikipedia article on this topic. I'm the main author of its current (29 Oct 2021) version.

Full text and comments »

  • Vote: I like it
  • +209
  • Vote: I do not like it

By adamant, 9 years ago, translation, In English

Hi everyone!

Today I want to talk about one quite famous and interesting problem.

So, here it is: given string S. Split it in the minimum possible amount of palindromic strings. Rather unpretentious, isn't it? You can find this problem here or here. But whenever you see it, in the best case intended solution would be a quadratic (or even cubic). Here will be described a solution to this problem in online (ie, answer will be received for each prefix).

Full text and comments »

  • Vote: I like it
  • +62
  • Vote: I do not like it

By adamant, 10 years ago, translation, In English

First part, I guess. Even if you think that you are familiar with suffix tree, please, take a look at the code below. It may be interesting to you.

Hi everyone! Finally I learnt this one :)

In this entry I would like to avoid long and complex theory which scared me from suffix tree for a long time. So straight to the point. I will not prove algorithm if you want some proofs, you may check stackoverflow or Dan Gusfield's book... Or somewhere else, I don't know.

Full text and comments »

  • Vote: I like it
  • +104
  • Vote: I do not like it

By adamant, 10 years ago, translation, In English

Hi everyone!

Recently, at the MIPT: The Fall training camp on the contest from Alexander Milanin was a problem from Petr Mitrichev Contest 7. We were given a graph and a set of queries like "suppose we removed from the graph k ≤ 4 edges. Check whether graph is still connected?" I want to talk further about the solution of a more general problem, when the edges are added and removed without additional constraints in offline. The first algorithm with such an assessment was offered by David Eppstein in 1992, reducing it to fully dynamic minimum spanning tree problem, but here we will focus on a simple algorithm, proposed in 2012 by Sergei Burunduk1 Kopeliovich.

Full text and comments »

  • Vote: I like it
  • +141
  • Vote: I do not like it

By adamant, 10 years ago, translation, In English

Hi everyone!

This time I would like to write about the Aho-Corasick algorithm. This structure is very well documented and many of you may already know it. However, I still would try to describe some of the applications that are not so well known.

Full text and comments »

  • Vote: I like it
  • +100
  • Vote: I do not like it

By adamant, 10 years ago, translation, In English

Hi everyone!

As some of you may know, on this summer camp in Petrozavodsk droptable presented a new data structure, palindromic tree. I had the honor to participate in the study of the structure for the six months before that, and I want to tell about it now :)

Full text and comments »

  • Vote: I like it
  • +160
  • Vote: I do not like it

By adamant, 10 years ago, translation, In English

Hi everyone!

Today summer Petrozavodsk summer training camp 2014 has been started. I hope that I don't violate any unofficial taboo by writing about it :)

If you, just like me, are in a group of people who do not have the opportunity to participate in this celebration of life, you can follow the camp, for example, at SnarkNews or acm.math.spbu.ru.

Full text and comments »

  • Vote: I like it
  • +52
  • Vote: I do not like it

By adamant, 10 years ago, translation, In English

Hi everyone!

Some of you may remember my entry about policy based data structures. In addition to a review article on these structures, I also wanted to write about the possibility of using your own Node_Update class. Then I hadn't enough time for it, but now I can and I want to catch up and share with you some new knowledge.

Full text and comments »

  • Vote: I like it
  • +79
  • Vote: I do not like it

By adamant, 10 years ago, translation, In English

Hi everyone!

I recently read about such an interesting data structure as skip-list. It seemed to me, the structure is very interesting and at the same time easy to use. That is the reason I decided to experiment with the structure in various problems and try to implement the various modifications of it.

Full text and comments »

  • Vote: I like it
  • +85
  • Vote: I do not like it

By adamant, 11 years ago, translation, In English

Hi everyone!

It's no secret that in competitive programming we often have to work with graphs[citation needed]. We also often have to draw graphs. And maybe some people already know that there is tool for visualization of graphs called GraphViz, which parses the DOT code into visual image with the corresponding graph.

Example of a graph that can be obtained using graphviz — compressed suffix tree for the string abaabbaaa. Yes, the graph contains a small mistake, but it is not important :)

Now, actually, to the subject. How do you feel about adding DOT language support to the Codeforces editor? I think it would be cool, and you? :)

Full text and comments »

  • Vote: I like it
  • +69
  • Vote: I do not like it

By adamant, 11 years ago, translation, In English

Hi everyone! 2 years ago Logvinov_Leon asked about way to get the suffix array from suffix automata. Oddly enough, he had not received answer. However, as you may have noticed, I'm interested in the topic of transformations of some string structures to others, so I decided to shed some light on how this is done.

Full text and comments »

  • Vote: I like it
  • +68
  • Vote: I do not like it

By adamant, 11 years ago, translation, In English

Hi everyone again! Recently, solving the problem 1937 from Timus (by the way, I recommend it to you too! It is a great opportunity to improve yourself in string algorithms), I was faced with the necessity of finding all subpalindromes in the string. Experienced programmers already know that one of the best algorithms for this is Manacher's algo that allows you to get all subpalindromes in compressed form without any additional structures. The only problem — the algorithm is kinda hard to implement. That is, its idea is simple and straightforward, but the implementation usually are heaps of code to the widespread consideration of where to write +1, and where -1.

Full text and comments »

  • Vote: I like it
  • +48
  • Vote: I do not like it

By adamant, 11 years ago, translation, In English

Hi everyone! The countdown is on. Round 1B begins in under 24 hours. If you preferred to sleep instead of competing in round 1A (just like me!), then this round is for you! The top 1000 contestants from sub-round will advance to Online Round 2 and won't be able to compete in further sub-rounds.

I remind you that round will take place on Saturday, May 3 at 16:00 UTC at the site, that everybody knows.

Full text and comments »

  • Vote: I like it
  • +35
  • Vote: I do not like it

By adamant, 11 years ago, translation, In English

Hi everyone! I was always wondered how cleverly interwoven so-called string algorithms. Six months ago I wrote here article about the possibility of a rapid transition from Z-function to prefix-function and backward. Some experienced users already know that such transitions are possible between the more complex string structures — suffix tree and suffix automaton. This transition is described at e -maxx.ru. Now I would like to tell you about such data structure as a suffix tree, and share the simple enough (theoretically) way of its fast building — obtain a suffix tree from suffix array .

Full text and comments »

  • Vote: I like it
  • +82
  • Vote: I do not like it

By adamant, 11 years ago, translation, In English

Hi everyone! I think many of you know about such data type as set. It must support such operations like:

  • Insert an element into set;
  • Check if the set contain some element;
  • Delete element from set.

If we talk about ordered set, then elements (what a surprise!) must be kept in some order. As I think, there is also very important next operations:

  • Find the k-th element in a set;
  • Find the order of element in a set.

Most of all, for the implementation of such functional some may use binary balanced trees (AVL- tree , red-black tree , cartesian tree , etc). However, in this article I would like to talk about some of the features in the implementation of an ordered set with other structures. In particular, I'll show the implementation with the segment tree and with the Binary Indexed Tree (Fenwick tree). But I just want to note that it only allows you to build the set of natural numbers. For all other types of elements such methods will not work :(

The basic idea:

Let the index of cell in the array be the value of element in the set and let the value in the cell be the amount of element in cell. Then we can easily do all required operations in the logarithmic time. Really, the insertion of element x in the set is the same as increment of arr[x]. Deleting of element is the same as decrement of arr[x]. To check if the set contains element we should check if arr[x]>0, i.e. sum(x, x) > 0. We should also mention the operation of finding K-th element. To do this we will have to use the technique of binary expansion. In the segment tree we should store the sizes of the subtrees, like in BST, and in binary indexed tree we can watch to the following image showing the distribution of elements in tree and derive an algorithm:


int get(int x) { int sum=0; int ret=0; // Number of element in array, for which the sum is equal to sum for(int i=1<<MPOW;i && ret+(i-1)<N;i>>=1) // Loop through the powers of two, starting with the highest possible { if(sum+arr[ret+(i-1)]<x) // Trying to expand the current prefix sum+=arr[ret+(i-1)], ret+=i; } return ret; }

Easy to see that while using this approach ret always has a value such that the next time a number from the interval [0; ret] will be met not earlier than at ret + i, however, due to decreasing of i, we will never reach this point, and therefore, none of the elements in the prefix will be counted twice. This gives us the right to believe that the algorithm is correct. And the check on some problems confirms it :)

The basic idea is presented, for now let's talk about the advantages and disadvantages of this approach.

Pros:

  • We can write it really fast;
  • Intuintive (especially in the case of the segment tree);  
  • Works fast (in practice often overtakes BST);

Contras:  

  • Limited functionality (supports only integers in the set);  
  • In its simplest form takes O(C) memory, where C — the maximum possible number in the set;

The second drawback is VERY essential. Fortunately, it can be eliminated. You can do this in two ways.

  1. Compression of coordinates. If we can solve the problem in an offline, we read all queries and learn all possible numbers, which we will have to handle. Then we sort them, and associate smallest element with one, the second smallest — with two, etc. This reduces memory complexity to O(M), where M is the number of queries. However, unfortunately, it does not always work.
  2. Dynamic growth of the tree. The method is not to store int the tree the vertices, which we do not need. There are at least three ways to do this. Unfortunately, for binary indexed tree there is only one of them is suitable. The first option — to use a hash map. Bad, very bad option. More precisely, with the asymptotic point of view it is good, but the hash map have very large constant, so I do not recommend this method. Unfortunately, it is that only way that we can use in BIT :) . The second option — extension with pointers. When we need a new tree vertex, we simply create it. Cheap and cheerful. However , it is still not the fastest option. The third option — statically allocate a block of MlogC vertices, and then for each vertex store the indexes of its children (as in trie). In practice, it is the fastest way. Memory complexity is the same in all three methods and is reduced to O(MlogC), which are generally acceptable.

I hope that I was interesting/useful to someone. Good luck and have fun! :)

P.S. Sorry for my poor English. Again ;)

Full text and comments »

  • Vote: I like it
  • +73
  • Vote: I do not like it

By adamant, 11 years ago, translation, In English

Hi everyone! Many saw entry of user Zlobober which stated that the Thue-Morse string with terrifying force knocks hash modulo 2^64 . Single hashes were too unsafe because of the birthday paradox. The only remaining salvation from infernal suffix structures was double hashing. However, soon it will come to an end. Thoroughly studying Thue-Morse string and its possible modifications, I came to the conclusion that the test that breaks even double hash solutions is possible.

Here I described in detail the construction of the test, its use against real solutions from various contests and proof of its work.

Now I propose to users of codeforces to discuss what to do in this situation. Is there really no more ways to solve the problems with polynomial hash? Who has any thoughts on this?

Full text and comments »

  • Vote: I like it
  • -49
  • Vote: I do not like it

By adamant, 11 years ago, translation, In English

Hi everyone! After a relatively long lull, I decided that my contribution growing too slowly the hour has come to please you with another article in the blog :)

2 months ago user Perlik wrote an article, in which he described a very interesting STL implemented data structure that allows you to quickly perform various operations with substrings. Some time after I tested it on various tasks and, unfortunately, tend to get a negative result — rope was too slow, especially when it came to working with individual elements.

For some time, I forgot about that article. Increasingly, however, I was faced with problems in which it was necessary to implement set with the ability to know ordinal number of item and also to get item by its ordinal number (ie, order statistic in the set). And then I remembered that in the comments to that article, someone mentioned about the mysterious data structure order statistics tree, which supports these two operations and which is implemented in STL (unfortunately only for the GNU C++). And here begins my fascinating acquaintance with policy based data structures, and I want to tell you about them :)

Full text and comments »

  • Vote: I like it
  • +120
  • Vote: I do not like it