Блог пользователя conqueror_of_tourist

Автор conqueror_of_tourist, история, 3 года назад, По-английски

In the recent Goodbye 2021 Round on Problem E my code seems to run a lot slower when running on pypy64 as opposed to regular pypy.

The following submissions are identical, with the exception being the language used.

141113239 (pypy3.7 — AC 732ms)

141185703 (pypy3.8 64bit — TLE)

Running on Custom Invocation on one testcase of size $$$10^5$$$ shows that pypy3.7 will take about 250ms whereas pypy3.8-64 will take 5600ms.

Profiling my code it seems like the culprit is my find function

#Finds smallest index of segtree such that seg[ind] <= x
#Only works if func = min
def find(self, x):
    curr = 1
    
    if self.data[curr] > x:
        return -1
    while curr < self._size:
        assert self.data[curr] <= x
        if self.data[2 * curr] <= x:
            curr = 2 * curr
        else:
            curr = 2 * curr + 1
        assert self.data[curr] <= x
    return curr - self._size

Replacing this function with a trivial (but incorrect) function makes the submission run in ~230 ms in both languages. However it seems like this function should run in $$$O(log n)$$$ and not be 20x slower than the rest of my code (and if itis slow, why is it not an issue in 32bit pypy?). If anyone can figure out the issue here (and what I need to avoid to be able to run code in pypy64 quickly) it would be greatly appreciated.

  • Проголосовать: нравится
  • +82
  • Проголосовать: не нравится

»
3 года назад, # |
  Проголосовать: нравится +48 Проголосовать: не нравится

I've also noticed that 64 bit PyPy on CF sometimes runs super slow. Like for example here

PyPy3 64 bit 1949 ms 140113619

PyPy3 32 bit 374 ms 141188674

I've also had many people messaging me about weird TLEs with 64 bit PyPy:

https://mirror.codeforces.com/blog/entry/98385?#comment-872293

https://mirror.codeforces.com/blog/entry/98385?#comment-872290

Something definitely seems buggy with 64 bit PyPy on CF. But I haven't yet had the time to figure out what is causing it or if I can even replicate the TLEs locally.

  • »
    »
    3 года назад, # ^ |
    Rev. 2   Проголосовать: нравится +97 Проголосовать: не нравится

    An update, I found a really really good killer

    for i in range(10**7):
        if sum([]):
            break
    

    Using custom invocation on Codeforces, it runs in 4.5 s in 64 bit PyPy3 and 61 ms in 32 bit PyPy3.

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +55 Проголосовать: не нравится

      I've opened up an issue about this on PyPy's website https://foss.heptapod.net/pypy/pypy/-/issues/3629 .

      It seems that when PyPy made the jump from Python3.7 to Python3.8 they introduced some kind of bug. pypy3.8-v7.3.7 has the bug, but pypy3.7-v7.3.7 doesn't. A quick fix for CF would be to switch from pypy3.8-v7.3.7-win64 to pypy3.7-v7.3.7-win64. The problem is not related to 32 bit vs 64 bit, nor is it a windows vs linux thing.

      One fun thing to note is that adding empty for loops everywhere seems to fix it. For example

      for i in range(10**7):
          for _ in range(1): pass
          if sum([]):
              break
      

      runs fast no matter PyPy version.

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

This is why I don't use 64-bit on contests :P

»
3 года назад, # |
Rev. 2   Проголосовать: нравится +87 Проголосовать: не нравится

I tested it a bit in custom invocation with random input. If you comment out the asserts in the while loop, it runs in 650ms, otherwise 6.5 seconds. Funny thing is that you can replace it with an assert True or if 0 > 1: break and the same bug will still occur (but lowered to ~5seconds).

So my theory is that loop got unrolled but the additional branches blew up the number of possible code paths. I think told pajenegod about an similar issue before for pyrival's RMQ in 117322863. The solution there was to add a no-op for _ in range(1): pass to prevent inlining (and indeed the same trick seems to work for unrolling too). Explanation from the dev about why this happens can be found in this issue. Some background on how jit guards/bridges work can be found at: https://youtu.be/NQfpHQII2cU?t=1389

Anyway this is occurring often enough that you should report it to https://foss.heptapod.net/groups/pypy/-/issues. The devs are fairly responsive and have fixed other jit bugs reported by codeforce users before. You should probably try to reduce it to a minimal reproducible example first though.

Though I can't read them, you can also output jit logs with something like PYPYLOG=jit-log-opt,jit-summary,jit-backend:logfile pypy3.8 bug.py.

Summary without that assert line:

Tracing:      	81	0.070865
Backend:      	81	0.029770
TOTAL:      		0.422342
ops:             	116056
heapcached ops:  	129132
recorded ops:    	35992
  calls:         	3995
guards:          	7006
opt ops:         	20512
opt guards:      	3911
opt guards shared:	2224
forcings:        	0
abort: trace too long:	0
abort: compiling:	0
abort: vable escape:	0
abort: bad loop: 	0
abort: force quasi-immut:	0
abort: segmenting trace:	0
nvirtuals:       	2512
nvholes:         	825
nvreused:        	625
vecopt tried:    	0
vecopt success:  	0
Total # of loops:	30
Total # of bridges:	52
Freed # of loops:	0
Freed # of bridges:	0

Summary with the assert:

Tracing:      	672	1.453424
Backend:      	671	0.186138
TOTAL:      		3.310882
ops:             	3332025
heapcached ops:  	1164403
recorded ops:    	318645
  calls:         	111855
guards:          	68843
opt ops:         	338865
opt guards:      	43555
opt guards shared:	39539
forcings:        	0
abort: trace too long:	1
abort: compiling:	0
abort: vable escape:	0
abort: bad loop: 	0
abort: force quasi-immut:	0
abort: segmenting trace:	0
nvirtuals:       	2999
nvholes:         	768
nvreused:        	611
vecopt tried:    	0
vecopt success:  	0
Total # of loops:	29
Total # of bridges:	643
Freed # of loops:	10
Freed # of bridges:	19
»
2 года назад, # |
  Проголосовать: нравится +28 Проголосовать: не нравится

I updated PyPy3-64 to 7.3.9 (3.9.10). I will be glad if you could test this version.

»
2 года назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

new pypy3 64 sub pypy3 sub do not know the issue of this random TLE