This is obvious but easy to overlook. If you use something like ios_base::sync_with_stdio(0); cin.tie(NULL); for fast I/O, it does not make cerr any faster: cerr stays unit-buffered and flushes after every output operation. Hence, if you print too many debug statements to cerr, you might get TLE for seemingly no reason.
Reference: https://mirror.codeforces.com/contest/2128/submission/331216699, https://mirror.codeforces.com/contest/2128/submission/331215429
EDIT: clog is faster and serves a similar purpose. It is buffered rather than flushed after every write, so its output might not appear if the program doesn't end normally (for example, on a crash).
https://mirror.codeforces.com/contest/2128/submission/331238365
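A small sketch to check the buffering difference yourself. The helper names is_unit_buffered and cerr_unit_buffered_after_fastio are made up for illustration; only the standard flags() call and the unitbuf flag are real. It shows that the usual fast-I/O lines leave the unitbuf flag on cerr untouched, while clog does not have that flag set by default.

```cpp
#include <iostream>

// Hypothetical helper: true if the stream flushes after every output
// operation, i.e. std::ios_base::unitbuf is set in its format flags.
bool is_unit_buffered(const std::ostream& os) {
    return (os.flags() & std::ios_base::unitbuf) != 0;
}

// Hypothetical helper: applies the usual fast-I/O lines, then reports
// whether cerr is still unit-buffered afterwards.
bool cerr_unit_buffered_after_fastio() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);
    return is_unit_buffered(std::cerr);
}
```

Calling cerr_unit_buffered_after_fastio() from main should return true, and is_unit_buffered(std::clog) should return false, which matches the timings in the submissions above.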

Makes sense. Fast I/O is designed to avoid flushing the output buffer on every print statement and to untie cin from cout, but cerr still flushes at the end of every << operation.
Does cerr.tie(NULL) do anything?
.
From my light testing, it did seem to improve output speed by quite a bit. You should try printing something like 5e5 numbers to cerr and see the results.
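One likely reason it helps: since C++11, cerr is tied to cout by default, so every write to cerr first flushes cout, and cin.tie(NULL) does not change that. A minimal sketch to verify this, with hypothetical helper names tied_to_cout and untie_cerr (only the standard tie() member is real):

```cpp
#include <iostream>

// Hypothetical helper: true if `os` is tied to std::cout, meaning cout
// is flushed before each output operation on `os`.
bool tied_to_cout(const std::ostream& os) {
    return os.tie() == &std::cout;
}

// Hypothetical helper: unties cerr and returns the stream it was
// previously tied to (nullptr if it was not tied to anything).
std::ostream* untie_cerr() {
    return std::cerr.tie(nullptr);
}
```

After untie_cerr(), debug prints no longer force a flush of cout, but cerr itself is still unit-buffered, so very heavy debug output can still be slow.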