Изменения рейтингов за последние раунды временно удалены. Скоро они будут возвращены. ×

Блог пользователя unalive

Автор unalive, 4 часа назад, По-английски

Link to original blog here

Hello, in this blog I'll share a funny way to construct a suffix tree in $$$O(n \log^2{n})$$$ time, for a given string $$$S$$$ of length $$$n$$$. I am going to call the underlying idea the "Leader Split Trick". It can probably be used to solve other problems too.

Problem

A suffix tree of a string $$$S$$$ is a radix tree constructed from all the suffixes of $$$S$$$. It's easy to see that it has $$$O(n)$$$ nodes. It can be constructed in $$$O(n)$$$ using this.

I am going to share a trivial and practically useless way of building it in a worse time complexity, $$$O(n\log^2{n})$$$.

Algorithm

Notation

Initially, we start with an empty tree (with a virtual root node), and a set $$$G$$$ of all suffixes from $$$1$$$ to $$$n$$$, these suffixes will be stored in the form of their starting index.

It's easy to see that the paths from the root node to $$$l_u \forall (u \in G)$$$ will share some common prefix till an internal node $$$s_G$$$, after which these paths will split apart along some downward edges of the internal node. Let's define $$$d_G$$$ to be the longest common prefix across the paths $$$(\text{root}, l_u) \forall u \in G$$$.

Our algorithm will essentially do the following:

  1. Find $$$d_G$$$.
  2. Split apart $$$G$$$ into disjoint subsets $$$G'$$$ (each subset $$$G'$$$ will have suffixes whose leaves lie in the subtree of a unique child node of $$$s_G$$$).
  3. Solve the problem recursively for each subset, and add an edge in the suffix tree from $$$s_G$$$ to $$$s_{G'}$$$ for every $$$G'$$$.

Now, we define a recursive function $$$f(G, L, \text{dep}, \text{dis})$$$.

Definitions

In each call, $$$f(G, L, \text{dep}, \text{dis})$$$, we do the following:

  1. If the "Leader" element $$$L$$$ is undefined:
  2. Set $$$L$$$ to a random element of $$$G$$$.
  3. For every suffix $$$i \in G$$$, find $$$\text{dis[i]}$$$, the longest common prefix of the suffixes $$$i$$$ and $$$L$$$. This can be done in $$$O(\vert G \vert \cdot \log{n})$$$ using binary search + hashing. We store $$$\text{dis}$$$ in a sorted manner.
  4. Let $$$m$$$ be the minimum value in $$$\text{dis[]}$$$. It's easy to see that the internal node created from splitting $$$G$$$ will occur at depth $$$\text{dep} + m$$$. We create $$$s_G$$$, and add an edge corresponding to the substring $$$S[L + dep + 1, L + \text{dep} + m]$$$ from $$$s_{G_p}$$$ to $$$s_G$$$.
  5. Now, we delete all suffixes $$$i \in G : \text{dis[i]} = m$$$, from $$$G$$$ (and their corresponding elements from $$$\text{dis}$$$), and group them into disjoint subsets based on the character $$$S_{i + \text{dep} + m + 1}$$$ for suffix $$$i$$$ (basically the next character after the internal node).
  6. We call $$$f(G', \text{null}, \text{dep} + m, \text{null})$$$ for every newly created subset $$$G'$$$, and also call $$$f(G, L, \text{dep + m}, \text{dis})$$$ for the modified subset $$$G$$$.

Note: There might be some off-by-one errors.

Complexity Analysis

Consider the following problem:

We have $$$n$$$ singleton sets, and are given some set merge operations. When merging sets $$$A$$$ and $$$B$$$, we merge $$$B$$$ to $$$A$$$ with probability $$$\frac{\vert A \vert}{\vert A \vert + \vert B \vert}$$$ and $$$A$$$ to $$$B$$$ with the probability $$$\frac{\vert B \vert}{\vert A \vert + \vert B \vert}$$$.

The above problem is computationally equivalent to Randomized Quicksort, which has an expected time complexity of $$$O(n \log{n})$$$.

It's not difficult to see that our split operations are simply the operations that will occur in the above problem in a reversed manner (Formally, we can define a bijective relationship between the two sets of operations, such that related sets of operations will occur with the same probability) . Therefore, the time taken by all the split operations is $$$O(n \log{n})$$$.

However, every time we perform a split operation (merge in reverse), we also compute $$$\text{dis}$$$ for the child set $$$C$$$ (which gets merged into the parent set), and that takes $$$O(\vert C \vert \log{n})$$$ time. Thus, our entire algorithm has an expected time complexity of $$$O(n \log^2{n})$$$.

Generalization

I don't have enough math experience to write a fancy generalization for the kind of problems that can be solved with this, so I'll just describe the key requirements. It can be used to solve a problem with the following setup:

  1. We have a set $$$G$$$ of $$$n$$$ variables, and some function $$$f(x, j)$$$.
  2. We can find $$$\max{i} : f(x, j) = f(y, j) \quad \forall j \in [0, i]$$$ in $$$O(x)$$$ time.
  3. We want to create a suffix-tree like structure which will contain a (not necessarily unique) internal node corresponding to every pair $$$(x, y) \in G$$$ at depth $$$d_{x, y} = \max(i : f(x, i) = f(y, i))$$$.

The algorithm will construct the tree structure in $$$O(n \cdot \log(n) \cdot x)$$$.

  • Проголосовать: нравится
  • +29
  • Проголосовать: не нравится

»
3 часа назад, # |
Rev. 2   Проголосовать: нравится -30 Проголосовать: не нравится

No ones cares, suffix tree sucks, no one uses itnowadays. When was the last time you see a suffix tree problem in an official contests except for ICPC-style ones

»
3 часа назад, # |
  Проголосовать: нравится -6 Проголосовать: не нравится

candidate master in 6 contests is crazy tho

  • »
    »
    3 часа назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Are you stupid or something ? This is obviously his alternative account created for the sole purpose of farming rating

  • »
    »
    3 часа назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    are the ratings getting rolled back or what is happening i am new to CF can you please guide me hope to get a upvote

    • »
      »
      »
      8 минут назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      bcz of cheaters, when they get removed from the contests, the rating gets recalculated