greedyaj's blog

By greedyaj, history, 3 months ago, In English

UPD : RESULTS ARE OUT!!!

Full text and comments »

  • Vote: I like it
  • -15
  • Vote: I do not like it

By greedyaj, history, 3 months ago, In English

Space Complexity of a Trie

Most blogs state that the space complexity of a trie is

$$$ O(N \cdot L \cdot R) $$$

where:

  • $$$N$$$ = number of strings
  • $$$L$$$ = maximum length of a string
  • $$$R$$$ = alphabet size (typically $$$26$$$ for lowercase letters)

This is usually justified by:

  • The trie can have at most $$$L$$$ levels
  • Each string contributes at most one node per level
  • Each node stores an array of size $$$R$$$

Hence:

$$$ \text{Space} \le N \cdot L \cdot R $$$

However, this is a very loose bound. Let us analyze this carefully.


Key Observation

At depth $$$d$$$, the number of distinct nodes is at most

$$$ \min(N, R^d) $$$

because:

  • Each node represents a unique prefix of length $$$d$$$
  • There are at most $$$R^d$$$ possible prefixes at depth $$$d$$$
  • And we cannot have more nodes than strings ($$$N$$$)

Example Calculation

Assume:

$$$ N = 10^6, \quad R = 26 $$$

Compute nodes per level:

$$$ \begin{aligned} \text{Level } 0 &: 26^0 = 1 \\ \text{Level } 1 &: 26^1 = 26 \\ \text{Level } 2 &: 26^2 = 676 \\ \text{Level } 3 &: 26^3 = 17576 \\ \text{Level } 4 &: 26^4 = 456976 \\ \text{Level } 5 &: 26^5 = 11881376 \gt N \end{aligned} $$$

From level 5 onward, the number of nodes is capped by $$$N$$$.


Total Number of Nodes

Let us count nodes:

  • Levels 0 to 4: $$$26^0 + 26^1 + 26^2 + 26^3 + 26^4$$$
  • Levels 5 to $$$L-1$$$: each can have at most $$$N$$$ nodes

Hence, total nodes:

$$$ 26^0 + 26^1 + 26^2 + 26^3 + 26^4 + (L - 4) \cdot N $$$

(We use $$$L-4$$$ because the first 4 levels are counted separately: 1,2,3,4 → 4 levels, so remaining $$$L-4$$$ levels are “full” levels capped by $$$N$$$.)


Total Space Usage

Each node stores an array of size $$$R = 26$$$, so total memory:

$$$ 26 \cdot \Big( 26^0 + 26^1 + 26^2 + 26^3 + 26^4 + (L-4) \cdot N \Big) $$$

Comparison With Naive Bound

Naive bound:

$$$ 26 \cdot N \cdot L $$$

We compare the prefix contribution:

$$$ 26^0 + 26^1 + 26^2 + 26^3 + 26^4 \lt 4 \cdot N $$$

Compute sum:

$$$ 26^0 + 26^1 + 26^2 + 26^3 + 26^4 = 1 + 26 + 676 + 17576 + 456976 = 475255 $$$

For $$$N = 10^6$$$:

$$$ 475255 \lt 4 \cdot 10^6 $$$

✅ True.

Hence, the naive bound overestimates memory by about 8×, because it assumes $$$N$$$ nodes at each level, while in reality early levels have far fewer nodes.


Tight Space Complexity

In general, the tight upper bound on the number of nodes is:

$$$ \sum_{d=0}^{L-1} \min(N, R^d) $$$

So total space:

$$$ O\Big(R \cdot \sum_{d=0}^{L-1} \min(N, R^d)\Big) $$$

For fixed alphabet size $$$R$$$:

$$$ \boxed{ \text{Space} = O(N \cdot L) \text{ (with a much smaller constant than } O(N \cdot L \cdot R)\text{)} } $$$

Takeaway

  • $$$O(N \cdot L \cdot R)$$$ is a very loose bound
  • Early levels of the trie have far fewer nodes than $$$N$$$
  • Real space usage is often ~8× smaller than the naive calculation
  • Correct intuition: $$$\min(N, R^d)$$$ nodes at depth $$$d$$$

Full text and comments »

  • Vote: I like it
  • -17
  • Vote: I do not like it