Hi guys. So, Centroid Decomposition, I never really paid attention to its runtime. That is until I was doing a problem (in an OI contest) that goes kinda like this:

** Statement: ** Given a tree of $$$N$$$ nodes ($$$N \leq 4*10^5$$$) that are not colored. There are two types of queries:

- Color a node.

- Among the nodes that are colored, print the furthest distance between two colored node.

**Time limit:** 1 second.

This problem looks like the classic and beloved E — Xenia and Tree. Therefore, I just used Centroid Decomposition without giving it much thought. And to my surprise, I got FST because my Centroid Decomposition code was too slow. And I was really confused. You know, an $$$O(N * log N)$$$ algorithm getting TLE with $$$N = 10^6$$$ sounds unlikely already, let alone $$$N = 4*10^5$$$. Not only that, it took twice as long as I would have liked.

So... Why is that? Why is centroid decomposition so slow? I mean, it's literally find centroid of a tree, remove that node and do the same for all of the subtrees, ooga booga and you're done, so I really have no clue what is holding it back, and how do you optimize Centroid Decomposition (aka what is the best way to implement Centroid Decomposition for speed without touching black magic such as SIMD and pragma)? Here is my Centroid Decomposition implementation: 237313601

Do you have a link where solutions can be submitted to that problem? It does seem surprising that centroid decomposition doesn't pass, so I'd like to submit to the judge to check if that is a judge issue or something wrong with your code (which I don't understand completely on the first read-through by the way).

Unfortunately, I cannot, since it was a private contest in a training camp, and their intended solution wasn't Centroid Decomposition. But I really wanna squeeze Centroid Decomposition through (for the science). My submission for Xenia and Tree takes 420ms, while I need 250ms for $$$N = 10^5$$$, or 1s for $$$N = 4*10^5$$$, so any help on improving the performance of my Centroid Decomposition code would be appreciated :D

Well, I can point you to how I write centroid decomposition using my submission from 2 years ago.

Looking through your submission, it seems that you're trying to keep track of too many things, most of which are not really needed (or maybe you should try submitting with 64 bit C++). The way I implement it recursively processes components that you get after removing the centroids processed so far. All of that is needed only for computing the centroid tree — once you have this, you can do divide and conquer as usual, or you can use the same dfs as the centroid tree construction, since you already do a divide and conquer there.

https://www.spoj.com/problems/QTREE5/

Looks like a similar problem. I'm not sure if $$$O(N \log^2 N)$$$ solution is intended but for sure it's quite straightforward. One can try several CD implementations and check which of them is the most optimal.

i can't remember where i saw you before

Bro used to be a plumber, cosmonaut, gamer, streamer, ... and now decided to ace Codeforces