Блог пользователя imposter_syndrome

Автор imposter_syndrome, история, 6 лет назад, По-английски

Hi Everyone, i have been seeing this for some days now on codeforces. In yesterday's educational round, I submitted problem D with https://mirror.codeforces.com/contest/1107/submission/49013323 solution. Later I realized that it takes minimum 5040*5040*60 operations in worst case. After contest, i tried hacking my solution but it passed successfully. Also I think testcase 20 is having same worst case input (all 0's and last rows last element if say 1), but it passes there in 732 ms. In my local env it takes more than 4s for this to pass.

Also in CF#534, I tried hacking this O(n^2) solution https://mirror.codeforces.com/contest/1104/submission/48743647, which was doing more than 10^9 operation on worst case (ababa....aa..bababababa string). But it passed in 640 ms. From my understanding it takes 1s for approx 10^8 operations.

Has codeforces upgraded there testing servers or I am missing something here?

  • Проголосовать: нравится
  • +52
  • Проголосовать: не нравится

»
6 лет назад, # |
Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

I don't know about your code( didn't even opened it ) but I used bitset, which divides the operations by 32. And btw in worse case it takes 5200*5200*48( not 60 ).

  • »
    »
    6 лет назад, # ^ |
      Проголосовать: нравится +16 Проголосовать: не нравится

    5040 has 60 divisors. It takes 1524096000 number of operations (+ some other computations) :) While 5200 has 30 divisors and it takes 811200000 operations. Worst case is 5040 from my understanding.

    • »
      »
      »
      6 лет назад, # ^ |
      Rev. 2   Проголосовать: нравится -70 Проголосовать: не нравится

      OK. Anyway, you don't need to post a blog for such things!!!

      • »
        »
        »
        »
        6 лет назад, # ^ |
          Проголосовать: нравится +26 Проголосовать: не нравится

        Ohh. sorry didn't know exact way to ask this. I thought other people may also be seeing this and it would be good to ask if something is missing. Sorry for all the inconveniences it caused to you

        • »
          »
          »
          »
          »
          6 лет назад, # ^ |
            Проголосовать: нравится -53 Проголосовать: не нравится

          I don't care, but as I know they'll downvote, maybe a comment somewhere on tutorial or announcement could be enough...

          • »
            »
            »
            »
            »
            »
            6 лет назад, # ^ |
            Rev. 2   Проголосовать: нравится -27 Проголосовать: не нравится

            Seems they just downvote me... Why always me?? What have I done to you?? OK downvote, I love downvotes!!!

»
6 лет назад, # |
  Проголосовать: нравится +5 Проголосовать: не нравится
  1. Also I think testcase 20 is having same worst case input (all 0's and last rows last element if say 1), but it passes there in 732 ms. You are using fast input, without it : 49042714 (732ms -> 1528ms)
  2. In my local env it takes more than 4s for this to pass. It depend on the power of your computer
  3. From my understanding it takes 1s for approx 10^8 operations. The time limit is 2.5s so i think 10^9 is fine after all ?
»
6 лет назад, # |
Rev. 2   Проголосовать: нравится +19 Проголосовать: не нравится

I'm also interested in such question.

What has happened with C++ STL string?

Why solutions where string length is about 300 000 characters and which erase small peace (of 2 characters) in the center of the string while it is not empty passes in 1 second on Codeforces?

Isn't there about 3*1.5*10^10 operations? Or string is not a continuous memory block now, it is like an amazing "list" with indexation of elements?

Examle of 2 different problems and their solutions that I can't hacked and even final tests can't do it too:

1) https://mirror.codeforces.com/contest/990/submission/39115847 (passed in 1 second for 300 000 string!)

2) https://mirror.codeforces.com/contest/1104/submission/48742260 (passed in 0.078 second for 100 000 string!!!)

Are you kidding me? :))

What has happened?

  • »
    »
    6 лет назад, # ^ |
    Rev. 9   Проголосовать: нравится +25 Проголосовать: не нравится

    You forgot to divide number of operations by sizeof(YMM-register) / sizeof(char) const for a string. So, erasing from std::string takes n / sizeof(YMM-register) * sizeof(char) = n / 256 * 8 = n / 32 operations. Estimate is n*(n+1)/2 / 32 = 300000^2 / 64 = 1.4 * 10^9 operations. Most costly operation in this case is loading / storing data to / from registers. It is 1s = 10^9 in aligned by cache line size case and 2s = 10^9 in unaligned case on codeforces with sequential access and cache prefetching.

    • »
      »
      »
      6 лет назад, # ^ |
        Проголосовать: нравится +1 Проголосовать: не нравится

      So, memmove() can process 256 bits in one processor clock cycle?

      • »
        »
        »
        »
        6 лет назад, # ^ |
          Проголосовать: нравится +26 Проголосовать: не нравится

        Yes, it is.

        As additional, you can see AVX, AVX2 and FMA in my solution for gym problem G. Underpalindromity. With n = 200000 and k = 100000 it is n * k / 2 / sizeof(YMM-register) * sizeof(int) = 200000 * 100000 / 8 = 2.5 * 10^9 operations, I think.

        If submission not available for you, this is source code

        • »
          »
          »
          »
          »
          6 лет назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          OMG, problem setters, let our restrictions for string/array length be 1e6 or 1e7 to avoid passing solutions with N^2 and architecture level optimizations.

          • »
            »
            »
            »
            »
            »
            6 лет назад, # ^ |
            Rev. 2   Проголосовать: нравится +2 Проголосовать: не нравится

            For example, you can't increase n and k in problem G. Underpalindromity, because there are solutions with correct algorithm in O(n log(n)) with working time 0.7s-1s.

            Learning algorithms and data structures — one way, learning architecture level's optimizations — another way. When you are using array instead list and iterate over it, you already using architecture level optimizations. Both knowledge can be combined, for example, you can use sqrt-decomposition with such optimizations and sqrt-list and it will be faster, than segment tree and cartesian tree.

        • »
          »
          »
          »
          »
          6 лет назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          A bit simpler and faster solution: 49057315

          • »
            »
            »
            »
            »
            »
            6 лет назад, # ^ |
              Проголосовать: нравится 0 Проголосовать: не нравится

            Fantastic, speed up in 2 times and gcc can vectorize even with addition to int64_t!

            If you are interested, there are another problem 101341I - Matrix God "Matrix God" with 1000 x 1000 matrix multiplication by modulo 10^9+7 and fantastic solution 45798437 by Constantine Drozdov in 374 ms