Hi guys, in this problem: CSES: Maximum Building II, the constraint for $$$n$$$, $$$m$$$ was $$$1000$$$, which sounds fishy. Therefore, I tried the trivial $$$O(n^3)$$$ brute, and surprisingly it AC with the slowest testcase being 0.9s (I guess it's because the operations was so simple and predictable that the compiler and CPU just do some hocus pocus and speed it up like 10 times).

So... why did the author decide to set the constraint that low? You can solve that problem in $$$O(n^2)$$$, with good constant, so it's not like you have to be extremely generous when it comes to time limit to let these solutions pass, so what is the motive behind it? Was the $$$O(n^3)$$$ brute the intended algorithm? (which doesn't really make sense, since it's supposed to be a hard problem????)

What's the solution in $$$O(n ^ 2)$$$?

For every rectangular box, there will be a square on the bottom row which has the lowest upper-extensibility, let's call the leftmost such square the bottleneck of the rectangular box. The extensibility of each square can be computed with a simple DP.

Consider the one to many mapping from bottlenecks to all the rectangular boxes. To compute this, we need the positions of the closest element to the left (that has $$$\le$$$ the extensibility of the current square) and the closest element to the right (the same but with $$$<$$$).

All that is left is to compute the contribution of each square, and people might have many ways to do it, but here's a very clean way using generating functions (which also essentially shows that prefix sums are a special case of division by a polynomial, in this specific case, $$$1-x$$$):

For a given index with upper-extensibility $$$h$$$ and the left position $$$l$$$ blocks away and the right position $$$r$$$ blocks away, the contribution to the 2D generating function of the answer is $$$xy(1 + x + \dots + x^{l-1})(1 + x + \dots + x^{r-1})(1 + y + \dots + y^{h-1})$$$ which is just $$$\frac{xy}{(1-x)^2(1-y)}(1-x^r)(1-x^l)(1-y^h)$$$.

So we just keep adding contributions in constant time per cell, and divide by $$$(1-x)^2(1-y)$$$ in the end, which is 2 prefix sums along one dimension and one along the other.

I find it surprising too, given how the CSES TLs are sometimes too strict for intended solutions implemented in Python. Maybe they decided to be more lenient towards other languages a bit, I guess?

Edit: the $$$O(n^3)$$$ brute force doesn't seem very obvious to me, can someone point out if it is more obvious than the solution in my comment before this except that we compute the number of rectangular boxes by doing range min naively?

You can bruteforce left and right columns of the box, and then it reduces to a 1D problem (with rows with nonzero number of '*'s being replaced by '*' in the 1D problem) which can be solved in $$$O(n)$$$.

It's the intended solution, but you don't do the stack and the prefix sum part. Why and how did this got AC? I don't know, the compiler optimization is insane I guess.