Minimization of DFA — different approach ideas?

№	Пользователь	Рейтинг
1	tourist	3993
2	jiangly	3743
3	orzdevinwang	3707
4	Radewoosh	3627
5	jqdai0815	3620
6	Benq	3564
7	Kevin114514	3443
8	ksun48	3434
9	Rewinding	3397
10	Um_nik	3396

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

I got this problem from a friend: return the minimum number of states needed to produce a DFA that recognizes a provided language L (link; http://poj.org/problem?id=3576). So I see online there are some algorithms for it like Hopcroft's, but I was trying to think about it a different way that I haven't gotten to work.

So I was thinking about it by starting with an automaton with a start and a sink node and (|w|-1)-long chains for each word w. And so in the first pass we repeatedly merge nodes with the same first character until we can't anymore. And then we repeat the same process from the suffix end of the words. I believe this should be correct because at every step we've done everything we can by a greedy argument. I've been trying to implement this for a while but it doesn't seem to work. This is what I've written so far:

t = int(input())

def remove_prefixes(group, completed):
    compressed = 0
    group.sort()
    i = 0
    while i < len(group):
        j = i - 1
        subgroup = []
        while j + 1 < len(group) and group[j + 1][0] == group[i][0]:
            if len(group[j + 1]) >= 2:
                subgroup.append(group[j + 1][1:])  # remove 1st char
            j += 1
        if j - i == 0:
            completed.append(group[i][1:])
        else:
            compressed += len(subgroup) - 1  # When merging n nodes, we decrease by n-1
            compressed += remove_prefixes(subgroup, completed)  # Continue recursively removing from this group 
        i = j + 1
    return compressed

for test in range(t):
    n = int(input())
    language = [input() for i in range(n)]
    nodes = 2 + sum(len(word) - 1 for word in language)
    completed = []
    nodes -= remove_prefixes(language, completed)
    # Remove the prefixes from the reversed strings
    remaining = [word[::-1] for word in completed if len(word) > 0]  # Exclude empty strings
    nodes -= remove_prefixes(remaining, [])
    print(nodes)

But a cleaner solution my friend showed me was the following:

    n = int(input())
    language = [input() for i in range(n)]
    suffix_sets = set()
    prefix_suffixes = {}
    for word in language:
        for i in range(0, len(word) + 1):
            prefix = word[:i]
            suffix = word[i:]
            if prefix in prefix_suffixes:
                prefix_suffixes[prefix].add(suffix)
            else:
                prefix_suffixes[prefix] = {suffix}
    suffix_sets = set(str(val) for val in prefix_suffixes.values())
    print(len(suffix_sets))

Which I see the idea for, but I'm wondering if my approach could be correct as well. Can anyone suggest their own approaches to the problem or explain if there's a flaw in my reasoning?

Блог пользователя johnkim3104