Zhtluo's blog

By Zhtluo, history, 2 months ago, In English

For years, I have been dissuading people from using random vectors in C++ unless they are necessary (as in adjacency lists, where anything else is very annoying to implement). The most common response I get is

BuT iT'S sO eAsY tO uSe!

Recently I finally got a prime example of why all this syntactic sugar will give you a sweet tooth. Witness Collect the Coins, a problem from the recent UCup contest.

The code we had in contest got TLE'd:

Code

However, just doing these three things is enough for AC:

  • Move all data-structure variables to global scope;
  • Move all closures to functions;
  • Last but not least, stop using random vectors (a sketch of these changes follows the code below).
Code
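
For illustration, here is a minimal sketch of what those three changes look like on a made-up DFS skeleton. This is not the contest code; all identifiers (dfs, adj, dp, MAXN) are placeholders.

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 1 << 20;

// 1. Data structures live in global scope instead of inside main/closures.
int n;
vector<int> adj[MAXN]; // adjacency lists: the one place vector stays
long long dp[MAXN];    // 3. a plain static array instead of vector<long long>

// 2. A named function instead of a recursive lambda
// (avoids std::function or extra indirection per call).
void dfs(int u, int parent) {
    dp[u] = 1;
    for (int v : adj[u])
        if (v != parent) {
            dfs(v, u);
            dp[u] += dp[v];
        }
}

int main() {
    cin.tie(nullptr)->sync_with_stdio(false);
    cin >> n;
    for (int i = 1; i < n; ++i) {
        int u, v;
        cin >> u >> v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs(1, 0);
    cout << dp[1] << '\n';
}

The point is only the shape: globals instead of captured locals, a named recursive function instead of a lambda, and a static array where the size is known up front.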

Surely now everyone will stop using random vectors, instead of...

BuT mY tEaMmAtE wIlL hElP mE wItH tHe CoNsTaNtS!

Ugh for the love of...


»
2 months ago

okay, but why is the title "stop using C++23"?

  • »
    »
    2 months ago

    A lot of this syntactic sugar was introduced in more recent C++ standards.

»
2 months ago

I don't think you have any idea what you're talking about, and as someone who missed qualification to an onsite final for not using vector, twice, I hope no one listens to you.

I submitted your exact second code to the problem and got TLE, but after changing static bool dp[maxn] to a global vector<int> dp(maxn), it got AC. Incontrovertible evidence that vectors are faster than arrays, or is it just that your code is so close to the TL that random fluctuations make a significant difference?

  • »
    »
    2 months ago

    Interesting. vector has given me only bad experiences.

    Do you mind sharing your story with vector?

    • »
      »
      »
      2 months ago

      It has to do with array bounds mostly — in Google Code Jam 2019, I did something like this

      const int maxn = 100010;
      int a[maxn];
      

      but in the problem the limit was $$$n \le 10^6$$$: I miscounted the zeros, so I got RE. If I had added one more zero to that constant I would have made it to finals. With a vector this would not have mattered, as I would have likely done vector<int> a(n) instead of defining a maxn constant.

      In the second case, I had a global dp array, int dp[1100][1100]; but later I realized the dimensions actually needed to be 3000 instead: I changed the dimensions of the dp array but forgot to change the code that clears it, leading to WA. If I created a vector for each test case instead, it would automatically come with zeros, and I wouldn't need to worry about clearing it.

      The other huge benefit, at least to me, is bounds checking — now that the vector has exactly the needed size, it's easy to get warnings when the vector is accessed out of bounds, which makes finding errors so much easier (see the sketch below).
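
      A minimal sketch of both habits together (the task, t, n, and dp here are made up for illustration):

      #include <bits/stdc++.h>
      using namespace std;

      int main() {
          int t;
          cin >> t;
          while (t--) {
              int n;
              cin >> n;
              // Sized to the actual input and zero-initialized on every
              // test case: no maxn constant to miscount, and no clearing
              // code to forget when the dimensions change.
              vector<long long> dp(n + 1);
              for (int i = 1; i <= n; ++i)
                  // .at() throws std::out_of_range instead of silently
                  // reading past the end; plain [] accesses can also be
                  // checked by compiling with -D_GLIBCXX_DEBUG on libstdc++.
                  dp.at(i) = dp.at(i - 1) + i;
              cout << dp[n] << '\n';
          }
      }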

      • »
        »
        »
        »
        2 months ago

        I see. Yeah that sucks in GCJ especially.

        I would probably still want my code to be fast, as I am usually the guy who tries to fit some random algo in the time limit (see my previous blog as an example :) ). But I can see that argument.

  • »
    »
    2 months ago

    BTW, I tested, and the code with vector is 50 ms slower, hence my conclusion that using vector is bad.

    • »
      »
      »
      2 months ago

      My vector submission passed in 880 ms, and the code with the array got TLE, so I can claim I tested and the array is at least 120 ms slower.

      But as I implied, I don't think that this is what the difference really means, or that a 50 ms difference could be easily measured. And even if you somehow rigorously measured such a difference, when random fluctuations like this are bigger than the difference, the difference really doesn't matter.

      • »
        »
        »
        »
        2 months ago

        Interesting. I don't think servers generally fluctuate by over 100 ms. Or maybe we should just submit multiple times in contest...

        I guess I will have to test it once I get back to my laptop.

        • »
          »
          »
          »
          »
          2 months ago

          By the way, there's a reason I said I changed it to vector<int> dp(maxn). vector<bool> is a bit-packed specialization whose operator[] returns a proxy object instead of a real bool reference; it is weird and slow, and you should generally never use it (a quick illustration below).
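
          A quick illustration of that weirdness (a sketch; vector<char>, or vector<int> as above, is the usual workaround):

          #include <iostream>
          #include <vector>

          int main() {
              std::vector<bool> vb(8);
              auto x = vb[0];          // x is a proxy object, not a bool
              vb[0] = true;
              std::cout << x << '\n';  // prints 1: x still refers into vb
              // bool* p = &vb[0];     // ill-formed: elements are packed bits
              std::vector<char> vc(8); // one byte per flag, ordinary semantics
              char* p = &vc[0];        // real pointers and references work again
              (void)p;
          }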

          • »
            »
            »
            »
            »
            »
            2 months ago

            Wut

            Yeah in contest we coded the bool thing-y...

            Hmm one more thing to test.

»
2 months ago

By the way, I think one other lesson to learn is that you don't have to care much about these kinds of differences if you are not trying to cheese a problem with a worse complexity. There are faster (complexity-wise) solutions to this problem.

  • »
    »
    2 months ago

    I don't think it's generally possible to tell whether a solution runs in 0.5× or 2× the time limit...

    I am more of the 'take a good-enough solution and run' type, and I have had much success just optimizing constants.

»
2 months ago

I think it's better to prioritize correctness over performance. It's quite rare that I TLE on a problem because of constant factor, but I want to make debugging as easy as possible, especially in an ICPC setting where you might not have print debugging available if you let your teammates code while you are debugging. If I am really concerned about the time limit, then I might swap, but otherwise I think it's better to reduce WA potential than TLE potential.

  • »
    »
    2 months ago

    I don't think there is much WA potential... After all, it's only one array-size thing, which you should be estimating anyway.

    • »
      »
      »
      2 months ago

      Having your state reset if the problem is multitest is pretty useful; it's one less thing you have to worry about. Generally I keep all of my state local unless I am precomputing some constants like factorials or a sieve. But the main thing is that I think it is fine to use C++23 constructs (including lambdas) if they help reduce the space for errors. Maybe the optimal construct (imo) is to have some global buffer allocated and to allocate a constant-sized vector on top of it with a custom allocator (sketched below), but it doesn't seem worth it, especially if you have to type it up every time instead of including it in a template.
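
      For what it's worth, a sketch of that buffer-plus-allocator idea, assuming one static arena that is reset between test cases (ArenaAlloc, ARENA_BYTES, and the sizes are made-up names, not a known library):

      #include <cstddef>
      #include <vector>

      // Sketch only: a bump allocator handing out chunks of one static arena.
      constexpr std::size_t ARENA_BYTES = 1 << 26; // 64 MiB, tune per problem
      alignas(std::max_align_t) std::byte arena[ARENA_BYTES];
      std::size_t arena_used = 0;

      template <class T>
      struct ArenaAlloc {
          using value_type = T;
          ArenaAlloc() = default;
          template <class U> ArenaAlloc(const ArenaAlloc<U>&) {}
          T* allocate(std::size_t n) {
              // Align the bump offset for T, then hand out n * sizeof(T) bytes.
              arena_used = (arena_used + alignof(T) - 1) / alignof(T) * alignof(T);
              T* p = reinterpret_cast<T*>(arena + arena_used);
              arena_used += n * sizeof(T);
              return p;
          }
          // Individual frees are no-ops; reset arena_used between test cases.
          void deallocate(T*, std::size_t) {}
          template <class U> bool operator==(const ArenaAlloc<U>&) const { return true; }
      };

      int main() {
          // Behaves like a normal vector, but its storage comes from the arena.
          std::vector<int, ArenaAlloc<int>> v(1000);
          v[0] = 42;
          arena_used = 0; // release everything at once before the next test
      }

      Deallocation becomes a no-op and resetting between test cases is a single assignment; the price is exactly the boilerplate above, which is why it probably only makes sense inside a prewritten template.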

      • »
        »
        »
        »
        2 months ago

        Hmm. My main concern is that it reduces preventable errors (array bounds, cleanup) at the cost of unpreventable TLEs, but I guess there are arguments on both sides.