Comments

cin is much slower than scanf. You can use scanf instead of cin, or just like you previously said, using some IO optimizations. You can see my submission as an example.

You can use binary search to find the cube root. The complexity is only O(nlog(xy)) Let the left pointer be 1 and the right pointer be 1000000, and it will not get TLE.