### oToToT's blog

By oToToT, history, 16 months ago,

TL; DR: #include <ext/random> and use __gnu_cxx::sfmt19937 just as std::mt19937.

Hi all,

I'm here to share a thing that seems not that well-known in the whole community.

There is a SIMD-oriented Fast Mersenne Twister (SFMT) implementation in GCC's library.
As its name suggested, it's a faster MT-like PRNG implementation based on SIMD.
SFMT is not exactly MT, but it shares some similarities and is much faster than the traditional MT.
If you are interested in more details about it, you could check this paper.

I test the two codes below in the Codeforces' custom test using GNU G++20 11.2.0 (64 bit, winlibs).
The result looks good.

sfmt19937 (998 ms)
mt19937 (3416 ms)

p.s. there's also __gnu_cxx::sfmt19937_64 if you want a 64-bit PRNG.

• +84

By oToToT, 5 years ago,

# 1. Use the trick of Tree SQRT-Decomposition

I didn't find any Codeforces post describing this method, so I'm here to share this method. Also, I hope people could help me learn more about tree sqrt-decomposition.

## Main Idea

The main idea of Mo's algorithm is to decompose sequence with size $N$ into $O(K)$ contiguous blocks, and if we re-order our queries $[L_i,R_i]$ with pair $(\text{BlockID}(L_i), R_i)$ and maintain two pointer of $L$ and $R$, we could get a $O(Q \frac{N}{K} + KN)$ complexity to solve some sequence problem.

It is because that every queries with the same $\text{BlockID}(L_i)$ shares the cost of moving the pointer of $R$ from leftmost to rightmost, so the pointer of $R$ will move at most $O(NK)$ elements. Also, the pointer of $L$ won't move more than $O(\frac{N}{K})$ elements when changing from one query range to another query range.

So, if we have a way to preserve these two properties ( i.e. we could share the cost of moving a pointer $R$ when those queries are in the same block, and we could correctly decompose our tree into some blocks such that moving a pointer $L$ between nodes in a single blocks won't cost to large ), we could apply a mo's algorithm on trees.

## Tree SQRT-Decomposition

Here comes a way to decompose tree into blocks such that the size of blocks will be in the range $[K, 3K]$, and also the distance between nodes inside a single block won't exceed $O(3K)$.

#### The Algorithm

We could construct every blocks recursively.

Here is a pseudo implementation of this algorithm in C++.

int BlockID[N], BlockCount = 0;

// this function will return nodes that haven't been group into blocks
vector<int> DFS(int u) {
vector<int> vertices_without_group;
for (int v : child_of[u]) {
// recursively process u's subtree
auto sub = DFS(v);
// merge nodes that haven't been group when proccesing u's subtree
for (int s : sub) vertices_without_group.push_back(s);
// when the size of block is big enough, we then make it a block
if (vertex_without_group.size() >= K) {
// mark those nodes
for (int vertex_without_group : vertices_without_group)
BlockID[vertex_without_group] = BlockCount;
BlockCount++;
vertices_without_group.clear();
}
}
// u should be grouped by it's parent
vertices_without_group.push_back(u);
return vertices_without_group;
}
void group_tree_vertices(int root) {
auto vertices_without_group = DFS(root);
// notice that we may not need to add a new block for the rest nodes
for (int vertex_without_group : vertices_without_group)
BlockID[vertex_without_group] = BlockCount;
}


#### Analysis

So, why this algorithm work ( i.e. the size of blocks will be in the range $[K, 3K]$, and the distance between nodes inside a single block won't exceed $O(3K)$ )?

Obviously, we call vertices_without_group a block if and only if it's size is greater than $K$, so the size of a single block won't exceed $K$ and the size of the return value of DFS will be less than $K$. Also, when merging two sets with size both less than $K$, the resulting set will have a size less than $2K$. This proves the size of blocks constructed in DFS will be in the range $[K, 2K]$. However, the merging step in group_tree_vertices may make the size of the last block becomes greater than $2K$, but it won't make it over $3K$. Now, we proved that the size of every block will be in the range $[K, 3K]$.

Additionally, a single block is almost-connected ( i.e. if it is constructed when dfsing $u$, then after putting $u$ into that block, the block becomes a connected component ), so the distance between nodes inside a single block won't exceed its size. Thus, the distance won't exceed $O(3K)$.

## Mo's Algorithm on Tree with Tree SQRT-Decomposition

So, how to apply mo's algorithm on trees to handle tree path problem?

We could find that if we maintain an edge set $T_u$ from root to $u$ and an edge set $T_v$ from root to $v$, then $T_u \operatorname{\triangle} T_v$ ( where $\operatorname{\triangle}$ means Symmetric difference ) will be the edge set between $u$ to $v$. Also, when we want to change our query from $(u, v)$ to $(p, q)$, we just need to transform $T_u$ to $T_u \operatorname{\triangle} ( T_u \operatorname{\triangle} T_p )$ and $T_v$ to $T_v \operatorname{\triangle} (T_v \operatorname{\triangle} T_q)$ ( because $(T_u \operatorname{\triangle} ( T_u \operatorname{\triangle} T_p )) \operatorname{\triangle} (T_v \operatorname{\triangle} (T_v \operatorname{\triangle} T_q)) = T_p \operatorname{\triangle} T_q$ ).

As implementing this algorithm, when we want to change our query from $(u,v)$ to $(p,q)$, we just need to traverse the path from $u$ to $p$ and $v$ to $q$, adding those edges into our set if they haven't been in our set, or remove them if they have been in our set.

A simple implementation in C++

But how to re-order our query?

We could sort our queries $(u_i, v_i)$ with $(\text{BlockID}(u_i), \text{dfn}(v_i))$ ( where $\text{dfn}(u)$ means the time we enter our tree in a dfs procedure ).

This preserves the properties described above because moving a pointer between a single block cost $O(K)$ and traverse the tree with $\text{dfn}$ equals to traversing the tree with dfs, and thus, it cost $O(N)$.

As a result of this algorithm, we could first sqrt-decompose our tree and sort our queries base on it, and apply a mo's algorithm on it.

A template of Mo's algorithm on tree in C++

# 2. Use The Property of Euler Tour

It has been described in Mo's Algorithm on Trees [Tutorial], so I'm not going to introduce this method thoroughly in this post.

## Main Idea

This method use the property of euler tour, so we could construct a sequence and do a modified mo's algorithm on that sequence.

• +132