Hello everyone! In this blog I want to share a simple but powerful technique: counting subsequences of a string that belong to a given regular language, by adding a DFA state dimension to the DP.
The idea itself is not new, but I haven't seen it written down explicitly as a general technique. Once you see it, you'll start recognizing it in many problems.
Maybe this is well known in China, but I'm happy because I discovered it independently.
Prerequisites
You should be comfortable with:
Dynamic programming.
Basic concepts from the theory of formal languages: alphabets, regular languages, and deterministic finite automata (DFA).
A fun motivating problem
Let's start with something concrete. You have a string $$$s$$$ of length $$$n$$$ over the alphabet $$$\Sigma = {\texttt{a}, \texttt{b}, \ldots, \texttt{z}, \texttt{@}, \texttt{.}}$$$. You want to count how many subsequences of $$$s$$$ form a valid email address.
We define a "valid email" as any string in the language:
For example, a@b.com is valid. How would you solve this?
Step 1: build the DFA
The language $$$L$$$ is regular, so we can build a DFA $$$\mathcal{A} = (Q, \Sigma, \delta, q_0, F)$$$ that recognizes it. Here's the diagram 
Step 2: the DP
Now comes the key idea. We define:
The base case is $$$dp[0][q_0] = 1$$$ (the empty subsequence sits at the start state) and $$$dp[0][q] = 0$$$ for all other states.
When we go from row $$$i$$$ to row $$$i+1$$$, we consider character $$$s[i]$$$ and for each state $$$q$$$:
Skip $$$s[i]$$$: the subsequence doesn't change, so $$$dp[i+1][q]$$$ gets $$$dp[i][q]$$$.
Take $$$s[i]$$$: the subsequence extends by one character, and $$$\mathcal{A}$$$ transitions from $$$q$$$ to $$$\delta(q, s[i])$$$, so $$$dp[i+1][\delta(q, s[i])]$$$ gets $$$dp[i][q]$$$.
More precisely:
The answer is $$$\sum_{q \in F} dp[n][q]$$$.
// dp[i][q] = number of subsequences of s[0..i-1] that leave the DFA in state q
vector<vector<long long>> dp(n + 1, vector<long long>(8, 0));
dp[0][0] = 1;
for (int i = 0; i < n; i++) {
for (int q = 0; q < 8; q++) {
// skip s[i]: carry over
dp[i + 1][q] = (dp[i + 1][q] + dp[i][q]) % MOD;
// take s[i]: transition
int nq = delta[q][s[i]];
if (nq != DEAD) {
dp[i + 1][nq] = (dp[i + 1][nq] + dp[i][q]) % MOD;
}
}
}
cout << dp[n][7] << "\n"; // accepting state
Done. $$$O(n \cdot 8)$$$ time, and the table $$$dp[i][q]$$$ makes the structure completely explicit: row $$$i$$$ summarizes everything about the first $$$i$$$ characters.
The general technique
There's nothing special about emails. The same idea works for any regular language $$$L$$$ recognized by a DFA $$$\mathcal{A} = (Q, \Sigma, \delta, q_0, F)$$$.
We define:
Base case: $$$dp[0][q_0] = 1$$$.
Transition: for each position $$$i$$$ and each state $$$q$$$, we can skip $$$s[i]$$$ (carry $$$dp[i][q]$$$ to $$$dp[i+1][q]$$$) or take $$$s[i]$$$ (add $$$dp[i][q]$$$ to $$$dp[i+1][\delta(q, s[i])]$$$).
Answer: $$$\sum_{q \in F} dp[n][q]$$$.
Complexity: $$$O(n \cdot |Q|)$$$ time, $$$O(n \cdot |Q|)$$$ space (or $$$O(|Q|)$$$ if you only keep two consecutive rows).
That's the whole technique. The rest of this blog is about recognizing where to apply it.
Why does this work?
Each subsequence of $$$s$$$ is itself a string over $$$\Sigma$$$. We want to count how many of those strings belong to $$$L$$$. The DP simulates $$$\mathcal{A}$$$ on all $$$2^n$$$ possible subsequences simultaneously, grouping them by their current DFA state. This is the product construction: the intersection of the "language of all subsequences of $$$s$$$" with $$$L$$$, done implicitly through the DP.
Composing constraints
What if your problem has two constraints on the subsequence?
If both are regular: build the product automaton (states are pairs), or equivalently, add both state dimensions to the DP.
If one is regular and one is not (e.g., context-free): track the regular part with a DFA state, and the non-regular part with whatever other DP dimension it needs.
The dimensions just multiply. This is where the technique really shines.
Application: counting C comments
Here's another example. Given a string $$$s$$$ of length $$$n$$$ over the alphabet $$$\Sigma = {\texttt{/}, \texttt{*}, \texttt{a}\text{-}\texttt{z}}$$$, count the number of subsequences that form a valid C-style comment, that is, strings in the language:
For example, /* hello */ is valid. The subsequence must start with /*, end with */, and can have anything in between.
The DFA has 5 states. Here is the diagram: 
We plug this directly into our technique:
And the answer is $$$dp[n][4]$$$. Five states, one pass, $$$O(5n)$$$. Done.
Now, what if we wanted to count subsequences that are both a balanced bracket sequence and contain a C comment? Suppose $$$\Sigma = {\texttt{(}, \texttt{)}, \texttt{/}, \texttt{*}}$$$. We just compose: track balance $$$b$$$ for the bracket structure and DFA state $$$q$$$ for the comment pattern:
The answer is $$$dp[n][0][4]$$$. That's it: composing constraints costs nothing beyond multiplying dimensions.
Application: Sub-RBS (hard version)
Let's see this in action on a harder problem. Consider B2. Sub-RBS (Hard Version):
Given a bracket sequence $$$s$$$ of length $$$n \leq 100$$$. For each subsequence $$$t$$$ of $$$s$$$, define its score as the length of the longest regular bracket subsequence of $$$t$$$ that is "better" than $$$t$$$ (under a custom ordering where
($$$ \lt $$$)), or $$$0$$$ if no such subsequence exists or $$$t$$$ is not a regular bracket sequence. Find the sum of scores over all non-empty subsequences, modulo $$$998244353$$$.
The key insight: a regular bracket sequence $$$t$$$ admits a "better" regular subsequence if and only if $$$t$$$ contains the subsequence $$$\texttt{)((}$$$, a ) followed (not necessarily immediately) by two (s. In formal terms, $$$t$$$ belongs to the language:
where $$$\Sigma = {\texttt{(}, \texttt{)}}$$$. This is a regular language! A 5-state DFA captures it. Here's the diagram: 
Now we need subsequences that satisfy two conditions simultaneously:
- Regular bracket sequence: this is context-free (not regular), but we can track it with a "balance" counter $$$b$$$ bounded by $$$n$$$.
- Contains the $$$\texttt{)((}$$$ pattern: regular, tracked by our 5-state DFA.
Our DP becomes:
where $$$dp[i][b][q]$$$ counts the number of subsequences of $$$s[0..i-1]$$$ with bracket balance $$$b$$$ and DFA state $$$q$$$. We process each character choosing to take or skip, updating both dimensions simultaneously. At the end, we look at $$$dp[n][0][3]$$$: balance $$$= 0$$$ (valid bracket sequence) and DFA state $$$= 3$$$ (pattern confirmed).
But the problem doesn't just ask us to count these subsequences, it asks for the sum of their scores, where the score of a qualifying subsequence of length $$$\ell$$$ is $$$\ell - 2$$$. So we need to track not only how many subsequences reach each state, but also the sum of their lengths.
We add one more dimension: for each $$$(i, b, q)$$$ triplet, we store two values: - $$$dp[i][b][q][0]$$$ = the count of subsequences of $$$s[0..i-1]$$$ with balance $$$b$$$ and DFA state $$$q$$$. - $$$dp[i][b][q][1]$$$ = the sum of lengths of those subsequences.
When we extend a subsequence by taking a character, its length increases by 1. So if we have $$$\text{cnt}$$$ subsequences with total length sum $$$\text{sum}$$$, after extending all of them by one character, the new sum becomes $$$\text{sum} + \text{cnt}$$$ (each of the $$$\text{cnt}$$$ subsequences grew by 1).
At the end, the answer is:
The complexity is $$$O(n \cdot n \cdot 5) = O(n^2)$$$.
Solution code for Sub-RBS (hard version)
Final thoughts
Thanks for reading! This is my first blog, so I hope it was useful! I hope this technique becomes a useful tool in your problem-solving toolbox. The recipe is always the same: spot the regular constraint, draw the DFA, add the dimension.
If you found any mistakes, have suggestions to improve the blog, or know other problems where this technique applies, please let me know in the comments. I'd also be curious to hear if this idea has been written up before somewhere else, I'd love to link to it.



