Fast and short Python IO 2024

№	Пользователь	Рейтинг
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

№	Пользователь	Вклад
1	cry	166
2	maomao90	163
2	Um_nik	163
4	atcoder_official	161
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	nor	153
9	Dominater069	153

I've been going through the CSES Problemset with the goal to write shortest (but not obfuscated) Python solutions.

Due to smaller competition pool, I managed to accomplish this on a variety of tasks. But cutting characters sometimes introduces undesirable slowdowns, mostly in IO.

The problem inputs with $$$2\cdot 10^5$$$ to $$$4\cdot10^5$$$ integers are big enough to require third or half of the execution time just to read/print and very often result in TLE. In that case it is important to be sure you are reading as quickly as possible. In C++ writing the same algorithms works well even with inefficient IO.

I noticed io.Bytes does not help at all and adds too many characters. Printing gets as fast as possible (<1% of exec time) with

print(' '.join(map(str, S)))

where S is a list of answers. Having multiple calls to print or print(*S) will be too slow.

Similarly, calling input or sys.stdin.* repetitively also isn't optimal. What I usually do is:

n, m, q, *e = map(int, open(0).read().split())

# example of handling unpacking
E = [(a, b) for a, b in zip(*[iter(e)]*2)]

This is fast, but the snippet below is much faster:

# fast variant
e = [*map(int, open(0).read().split()]
n, m, q = e[:3]
E = [(a, b) for a, b in zip(*[iter(e[3:])]*2)]

The expansion *e, = map(int, open(0).read().split()) takes ~30% more to execute (yet it saves a single character).

A good recent example is on Distance Queries where doing the fast variant gives a lot of room to not TLE: Optimized IO

Or Counting Paths where I couldn't get it to pass without the fast variant. Adding code optimizations to already short code in most cases does not help. Short code, if not heavily obfuscated, means that most of the useful invariants in the solution are fully exploited. Examples of where you might think you're slow when trying to write short programs:

doing binary lifting 18 times every query vs doing it only depth[a].bit_length() times will improve exec time but compared to IO it is insignificant. To support depth[a].bit_length() you have to extend the DFS to maintain this data and gains in exec time soon disappear,
list[(int, int)] as your DFS queue is slower than list[int], using two Q.pop() operations to get all of the metadata is faster, takes more characters but still nothing compared to IO,
Q.append(x) is faster but not shorter than Q += x, yet the speedup is negligible in the grand scheme of things,
A = L.copy() vs *A, = L

Of course, there might be a case where choosing the optimal language constructs is necessary but I haven't stumbled upon such case yet.

Optimized IO to make the problem not TLE

Rev.	Кто	Когда	Δ	Комментарий
en4	rhymehatch	2024-01-16 16:21:48	8	Tiny change: '10^5$ to $4\cdot10^5$ integers' -> '10^5$ to $10^6$ integers'
en3	rhymehatch	2024-01-16 15:24:48	36
en2	rhymehatch	2024-01-16 12:55:23	33	Tiny change: 'ast, but to get some of the problems pass the snipp' -> 'ast, but the snipp'
en1	rhymehatch	2024-01-16 12:49:13	3001	Initial revision (published)

Rev.

Язык

Кто

Когда

Комментарий

en4