Записи в блоге

№	Пользователь	Рейтинг
1	tourist	3993
2	jiangly	3743
3	orzdevinwang	3707
4	Radewoosh	3627
5	jqdai0815	3620
6	Benq	3564
7	Kevin114514	3443
8	ksun48	3434
9	Rewinding	3397
10	Um_nik	3396

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	156
8	TheScrasse	154
9	Dominater069	153
10	nor	152

Paul Masurel offers a great article about the Levenstein automata : link

He offers an algorithm to compute if the levenstein distance between two strings is less than or equal to D.

The naive algorithm is fairly straightforward :

def levenshtein(s1, s2, D=2):
    """
    Returns True iff the two string
    s1 and s2 is lesser or equal to D
    """
    if D == -1:
        return False
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    if len(s2) == 0:
        return len(s1) <= D
    return (levenshtein(s1[1:], s2[1:], D-1)   # substitution\
        or levenshtein(s1, s2[1:], D-1)       # insertion\
        or levenshtein(s1[1:], s2, D-1)       # deletion\
        or (
            # character match
            (s1[0] == s2[0]) and \
            levenshtein(s1[1:], s2[1:], D)
        ))

This basically tries all possibilities which is with exponential complexity if we don't save any of the calls.

He offers a better algorithm but for the last week I have been trying to understand how it works, here it is :


def levenshtein(s1, s2, D=2):
    """
    Returns True iff the edit distance between
    the two strings s1 and s2 is lesser or
    equal to D
    """
    if len(s1) == 0:
        return len(s2) <= D
    if len(s2) == 0:
        return len(s1) <= D
    # assuming s1[0] is NOT used to build s2,
    if D > 0:
        if levenshtein(s1[1:], s2, D - 1):
            # deletion
            return True
        if levenshtein(s1[1:], s2[1:], D - 1):
            # substitution
            return True
    # assuming s1[0] is used to build s2
    for d in range(min(D+1, len(s2))):
        # d is the position where s1[0]
        # might be used.
        # it is also the number of character
        # that are required to be inserted before
        # using s1[d].
        if s1[0] == s2[d]:
            if levenshtein(s1[1:], s2[d+1:], D - d):
                return True
    return False

It would be great if anyone can shed some light on how this works! Thanks!

Полный текст и комментарии »

Блог пользователя intptr