Hi guys. So, Centroid Decomposition: I never really paid attention to its runtime. That is, until I was doing a problem (in an OI contest) that goes roughly like this:

**Statement:** Given a tree of $$$N$$$ nodes ($$$N \leq 4 \cdot 10^5$$$), initially uncolored, answer two types of queries:

- Color a node.

- Print the largest distance between two colored nodes.
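A brute-force reference for the statement above is handy for stress-testing a faster solution. This is a minimal sketch, assuming 0-indexed nodes and that the answer is 0 while fewer than two nodes are colored (the statement doesn't say); `BruteForce`, `color`, and `query` are hypothetical names. Each coloring runs one BFS from the new node, so it costs $$$O(N)$$$ per query and is far too slow for the real limits.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// Brute-force reference: O(N) per "color" query, for stress-testing only.
struct BruteForce {
    int n;
    std::vector<std::vector<int>> adj;
    std::vector<char> colored;
    int best = 0;  // assumed answer while fewer than two nodes are colored

    explicit BruteForce(int n) : n(n), adj(n), colored(n, 0) {}

    void addEdge(int u, int v) { adj[u].push_back(v); adj[v].push_back(u); }

    // Only pairs involving the new node v can beat the old answer,
    // so one BFS from v over the whole tree is enough.
    void color(int v) {
        assert(0 <= v && v < n);
        if (colored[v]) return;
        colored[v] = 1;
        std::vector<int> dist(n, -1);
        std::queue<int> q;
        dist[v] = 0;
        q.push(v);
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            if (colored[u]) best = std::max(best, dist[u]);
            for (int w : adj[u])
                if (dist[w] < 0) { dist[w] = dist[u] + 1; q.push(w); }
        }
    }

    int query() const { return best; }
};
```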

**Time limit:** 1 second.

This problem looks like the classic and beloved E. Xenia and Tree, so I just used Centroid Decomposition without giving it much thought. To my surprise, I got FST (failed system tests) because my Centroid Decomposition code was too slow, and I was really confused. You know, an $$$O(N \log N)$$$ algorithm getting TLE with $$$N = 10^6$$$ already sounds unlikely, let alone with $$$N = 4 \cdot 10^5$$$. Not only that, it took twice as long as I would have liked.

So... why is that? Why is centroid decomposition so slow? I mean, it's literally: find the centroid of the tree, remove that node, do the same for each of the remaining subtrees, ooga booga, and you're done. So I really have no clue what is holding it back. How do you optimize Centroid Decomposition, i.e. what is the fastest way to implement it without touching black magic such as SIMD and pragmas? Here is my Centroid Decomposition implementation: 237313601
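For comparison, here is a minimal sketch of the implementation choices that are often suggested for a fast centroid decomposition: adjacency in flat CSR arrays instead of `vector<vector<int>>`, BFS loops instead of recursion, and each node's centroid ancestors (centroid, distance, branch) precomputed once into flat arrays so a query is a short linear scan with no LCA. Whether these are what actually slows down 237313601 is an assumption, not a diagnosis. `FastCD`, `color`, `query`, and the `LOG = 20` depth cap are hypothetical names and choices, and the input is assumed to be a 0-indexed tree with $$$N \geq 1$$$.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sketch: centroid decomposition with CSR adjacency, no recursion, and
// every node's centroid ancestors stored in flat arrays at build time.
struct FastCD {
    static constexpr int LOG = 20;           // CD depth <= floor(log2 n) + 1, so n < 2^20
    int n;
    std::vector<int> head, to;               // neighbors of v: to[head[v] .. head[v+1])
    std::vector<int> ancC, ancD, ancB, cnt;  // k-th ancestor of v: centroid, distance, branch
    std::vector<int> best1, best2, b1;       // per centroid: best distance (branch b1), best from another branch
    std::vector<char> colored;
    int ans = 0;                             // assumed 0 while fewer than two nodes are colored

    FastCD(int n, const std::vector<std::pair<int, int>>& edges)
        : n(n), head(n + 1, 0), to(2 * edges.size()), ancC(n * LOG), ancD(n * LOG),
          ancB(n * LOG), cnt(n, 0), best1(n, -1), best2(n, -1), b1(n, -2), colored(n, 0) {
        assert(n >= 1 && n < (1 << LOG));
        for (auto [u, v] : edges) { ++head[u + 1]; ++head[v + 1]; }
        for (int i = 0; i < n; ++i) head[i + 1] += head[i];
        std::vector<int> pos(head.begin(), head.end() - 1);
        for (auto [u, v] : edges) { to[pos[u]++] = v; to[pos[v]++] = u; }
        build();
    }

    void build() {
        std::vector<char> removed(n, 0);
        std::vector<int> par(n), sz(n), dist(n), br(n), order, stk = {0};
        order.reserve(n);
        while (!stk.empty()) {
            int r = stk.back(); stk.pop_back();
            // 1) BFS over r's component, then subtree sizes in reverse BFS order.
            order.clear(); order.push_back(r); par[r] = -1;
            for (size_t i = 0; i < order.size(); ++i) {
                int u = order[i]; sz[u] = 1;
                for (int e = head[u]; e < head[u + 1]; ++e)
                    if (to[e] != par[u] && !removed[to[e]]) { par[to[e]] = u; order.push_back(to[e]); }
            }
            for (int i = (int)order.size() - 1; i > 0; --i) sz[par[order[i]]] += sz[order[i]];
            // 2) Walk from r toward any child holding more than half the component.
            int total = (int)order.size(), c = r;
            for (bool moved = true; moved;) {
                moved = false;
                for (int e = head[c]; e < head[c + 1]; ++e) {
                    int w = to[e];
                    if (w != par[c] && !removed[w] && 2 * sz[w] > total) { c = w; moved = true; break; }
                }
            }
            // 3) BFS from the centroid: record (centroid, distance, branch) for every node.
            order.clear(); order.push_back(c); par[c] = -1; dist[c] = 0; br[c] = -1;
            for (size_t i = 0; i < order.size(); ++i) {
                int u = order[i], k = cnt[u]++;
                assert(k < LOG);
                ancC[u * LOG + k] = c; ancD[u * LOG + k] = dist[u]; ancB[u * LOG + k] = br[u];
                for (int e = head[u]; e < head[u + 1]; ++e) {
                    int w = to[e];
                    if (w == par[u] || removed[w]) continue;
                    par[w] = u; dist[w] = dist[u] + 1; br[w] = (u == c) ? w : br[u];
                    order.push_back(w);
                }
            }
            // 4) Remove the centroid; each remaining neighbor roots a new component.
            removed[c] = 1;
            for (int e = head[c]; e < head[c + 1]; ++e)
                if (!removed[to[e]]) stk.push_back(to[e]);
        }
    }

    void color(int v) {
        if (colored[v]) return;
        colored[v] = 1;
        for (int k = 0, base = v * LOG; k < cnt[v]; ++k) {
            int c = ancC[base + k], d = ancD[base + k], b = ancB[base + k];
            // Query: farthest colored node at c lying in a different branch than v.
            int other = (b1[c] != b) ? best1[c] : best2[c];
            if (other >= 0) ans = std::max(ans, d + other);
            // Update c's top-two (by distinct branch) with v.
            if (b == b1[c]) best1[c] = std::max(best1[c], d);
            else if (d > best1[c]) { best2[c] = best1[c]; best1[c] = d; b1[c] = b; }
            else best2[c] = std::max(best2[c], d);
        }
    }

    int query() const { return ans; }
};
```

Each centroid keeps the best distance plus the best from a different branch because, unlike the nearest-node query in Xenia and Tree, a maximum can't tolerate an overestimate: pairing $$$v$$$ with a node in its own branch of centroid $$$c$$$ would count a path that doesn't actually pass through $$$c$$$.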