skloj's blog

By skloj, history, 4 years ago, In English

I was curious to know if anyone uses Suffix trees instead of Suffix arrays. I just settled down using Suffix arrays because I was not able to implement a fast Suffix tree, but not sure if people here had a different experience. For instance, is there a problem that can be solved only with a Suffix trees but not with Suffix arrays?

Thank you,

Full text and comments »

  • Vote: I like it
  • -1
  • Vote: I do not like it

By skloj, history, 4 years ago, In English

F# feels as high-level as Python but it is faster than Pypy thanks to static typing (even faster than Java due to the native support for Value Types). It provides additional data structures like BitArray/Bitset and BST in the standard library. F# is open source, runs on Linux and MacOS and the IDE support is what you expect from a Microsoft programming language.

I have been using the language on other sites (Kattis, Hackerrank, AtCoder, Exercism...) and I can say it is a wonderful tool for competitive programming.

I see .NET core is already supported in Codeforces for C#, so the infrastructure is already in place for F#. It would be great to be allowed to use the language in competitions, thanks!

Full text and comments »

  • Vote: I like it
  • +8
  • Vote: I do not like it

By skloj, history, 5 years ago, In English

After trying many implementations of Suffix Tree and Suffix Array in Python, the fastest I managed to get was based on adamant Ukkonen version: https://mirror.codeforces.com/blog/entry/16780

Here is my Python version enhanced with memoization (I tested it with UVA 10679 — I Love Strings!!), happy to receive feedback on how to improve the algorithm:

from sys import stdin, stdout, stderr, setrecursionlimit
from functools import lru_cache
setrecursionlimit(100000)

def read():
    return stdin.readline().rstrip()

def readint():
    return int(read())

def make_node(_pos, _len):
    global s, n, sz, to, link, fpos, slen, pos, node
    fpos[sz] = _pos
    slen[sz] = _len
    sz += 1
    return sz-1

def go_edge():
    global s, n, sz, to, link, fpos, slen, pos, node
    while (pos > slen[to[node].get(s[n - pos], 0)]):
        node = to[node].get(s[n - pos], 0)
        pos -= slen[node]

def add_letter(c):
    global s, n, sz, to, link, fpos, slen, pos, node
    s[n] = c
    n += 1
    pos += 1
    last = 0
    while(pos > 0):
        go_edge()
        edge = s[n - pos]
        v = to[node].get(edge, 0)
        t = s[fpos[v] + pos - 1]
        if (v == 0):
            to[node][edge] = make_node(n - pos, inf)
            link[last] = node
            last = 0
        elif (t == c):
            link[last] = node
            return
        else:
            u = make_node(fpos[v], pos - 1)
            to[u][c] = make_node(n - 1, inf)
            to[u][t] = v
            fpos[v] += pos - 1
            slen[v] -= pos - 1
            to[node][edge] = u
            link[last] = u
            last = u
        if(node == 0):
            pos -= 1
        else:
            node = link[node]

def init_tree(st):
    global slen, ans, inf, maxn, s, to, fpos, slen, link, node, pos, sz, n
    inf = int(1e9)
    maxn = len(st)*2+1 #int(1e6+1)
    s = [0]*maxn
    to = [{} for i in range(maxn)]
    fpos, slen, link = [0]*maxn, [0]*maxn, [0]*maxn
    node, pos = 0, 0
    sz = 1
    n = 0
    slen[0] = inf
    ans = 0
    for c in st:
        add_letter(ord(c))

def traverse_edge(st, idx, start, end):
    global len_text, len_st
    k = start
    while k <= end and k < len_text and idx < len_st:
        if text[k] != st[idx]:
            return -1
        k += 1
        idx += 1
    if idx == len_st:
        return idx
    return 0

def edgelen(v, init, e):
    if(v == 0):
        return 0
    return e-init+1

@lru_cache(maxsize=10000001)
def traverse(v, st, idx):
    global len_st
    r = -1
    init = fpos[v]
    end = fpos[v]+slen[v]
    e = end-1
    if v != 0:
        r = traverse_edge(st, idx, init, e)
        if r != 0:
            if r == -1:
                return []
            return [r]
    idx = idx + edgelen(v, init, e)
    if idx > len_st:
        return []
    k = ord(st[idx])
    children = to[v]
    if k in children:
        vv = children.get(k, 0)
        return traverse(vv, st, idx)
    return []

@lru_cache(maxsize=1001*10)
def solve(T, query):
    traverse.cache_clear()
    return "y\n" if traverse(0, query, 0) else "n\n"

def main():
    global text, len_st, len_text
    k = readint()
    for ki in range(k):
        text = read()+"$"
        len_text = len(text)
        init_tree(text)
        q = readint()
        for qi in range(q):
            query = read()
            len_st = len(query)
            stdout.write(solve(text, query))

main()

Full text and comments »

  • Vote: I like it
  • +5
  • Vote: I do not like it

By skloj, history, 5 years ago, In English

Does anyone know problems where online-construction of suffix structures is required? I am looking for some practice using the Ukkonen algorithm. Thanks!

Full text and comments »

  • Vote: I like it
  • 0
  • Vote: I do not like it

By skloj, history, 8 years ago, In English

I was just reading about a new algorithm to build a Suffix Tree: Simplified Weiner

The algorithm is very short and it makes a ton of sense to me (more than Ukkonen). But the algorithm has a drawback: "It is online from right to left but not from left to right. This is not an issue if the problem you solve is offline. Moreover, online problems often can be solved using reversed strings".

To me, the decision between Simplified Weiner vs Ukkonen is related to how often is the need of online-construction of suffix structures in contest. As far as I understand, Suffix Array doesn't support online building and every one seems to use it anyway.

¿What is your opinion about it? Thanks,

Full text and comments »

  • Vote: I like it
  • +17
  • Vote: I do not like it

By skloj, history, 9 years ago, In English

I am using the BinaryHeap offered by D language:

import std.stdio, std.container;
void main() {
  auto q = heapify(Array!int([44, 22, 100, -1, 0, 5, 6, 7]));
  while (!q.empty) {
    write(q.front, " ");
    q.removeFront;
  }
}

In my machine (or even on ideone) I got the elements sorted. But in codeforces I got this strange ordering:

44 100 22 7 6 5 0 -1

===== Used: 0 ms, 2856 KB

The codeforces version of D (v2.069.2) is even newer than ideone (v2.067.1), but it is older than my installation (v2.071.0).

Do you know why is the reason? Thank you very much.

Full text and comments »

  • Vote: I like it
  • +6
  • Vote: I do not like it