I have a few friends who take part in rounds using `Python`, and I noticed that they don't use some very simple optimizations, so their programs end up getting `TL`. Meanwhile, if you use the constructions below, the `TL` goes away in half or more of the cases.

I will show everything using the code of my friend I_am_Drew for problem 1413B - A New Technique. This code received `TL` (it ran longer than $$$1$$$ second):

```
for __ in range(int(input())):
    n, m = list(map(int, input().split()))
    kek = []
    for i in range(n):
        el = list(map(int, input().split()))
        el.append(0)
        kek.append(el)
    stolb = list(map(int, input().split()))
    ind = 0
    for i in range(m):
        if kek[0][i] in stolb:
            ind = i
            break
    for i in range(n):
        kek[i][m] = stolb.index(kek[i][ind])
    for j in range(m-1):
        stolb = list(map(int, input().split()))
    kek.sort(key=lambda x: x[m])
    for elem in kek:
        elem.pop()
        print(*elem)
```

First, as you know, input and output take quite a long time. Fortunately, this can be fixed using the `sys` module. I usually write it this way because it's the quickest fix, although of course it's not exactly good code style :)

```
from sys import stdin, stdout
input, print = stdin.readline, stdout.write
```

The `stdin.readline` function reads a line just like `input`, but faster (unlike `input`, it keeps the trailing newline character, which `int()` and `split()` ignore anyway). If necessary, there are other functions too, for example `stdin.read`, which reads the entire input as a single string (when typing the input by hand in a terminal, you then need to press `^D` after it to signal the end of input), but I usually do not use them. Output is more complicated: the `stdout.write` function accepts only strings and does not print a line feed or any other separator after them. Therefore, you have to write the output as in the example below; this also doesn't take long to fix, the main thing is not to forget :) After these changes, you get the following code. (Note that the input code has not changed at all, while the output code at the end has changed quite a lot.)

```
from sys import stdin, stdout
input, print = stdin.readline, stdout.write
for __ in range(int(input())):
    n, m = list(map(int, input().split()))
    kek = []
    for i in range(n):
        el = list(map(int, input().split()))
        el.append(0)
        kek.append(el)
    stolb = list(map(int, input().split()))
    ind = 0
    for i in range(m):
        if kek[0][i] in stolb:
            ind = i
            break
    for i in range(n):
        kek[i][m] = stolb.index(kek[i][ind])
    for j in range(m - 1):
        stolb = list(map(int, input().split()))
    kek.sort(key=lambda x: x[m])
    for elem in kek:
        elem.pop()
        # print(' '.join(map(str, elem)))
        for q in elem:
            print(str(q)+' ')
        print('\n')
```
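The `stdin.read` route mentioned above can be pushed one step further: read the whole input with a single call, split it into tokens, and write the whole answer with a single `stdout.write`. Below is a minimal sketch for the same problem, assuming the input format that the code above reads (first the $$$n$$$ rows, then the $$$m$$$ columns). `solve` and `main` are names chosen for this example. Besides the I/O, it also replaces the linear `stolb.index` search with a dictionary lookup, which is an extra change not present in the code above.

```python
import sys


def solve(data: str) -> str:
    """Solve 1413B from the whole input text and return the whole output text."""
    it = map(int, data.split())  # every token of the input, converted lazily to int
    out = []  # answer lines, written out once at the end
    for _ in range(next(it)):
        n, m = next(it), next(it)
        rows = [[next(it) for _ in range(m)] for _ in range(n)]
        # All m columns must be consumed, even though only the first one is used.
        cols = [[next(it) for _ in range(n)] for _ in range(m)]
        # Row position of every value in the first given column (a dict lookup
        # instead of the O(n) list.index used in the code above).
        where = {v: i for i, v in enumerate(cols[0])}
        # Index of that column inside a row: the entry of rows[0] that occurs in it.
        ind = next(j for j in range(m) if rows[0][j] in where)
        rows.sort(key=lambda row: where[row[ind]])
        out.extend(' '.join(map(str, row)) for row in rows)
    return '\n'.join(out) + '\n'


def main() -> None:
    # One read for the entire input, one write for the entire output.
    sys.stdout.write(solve(sys.stdin.read()))
```

In a submission, the file would end with a call to `main()`. With hand-typed input, the `^D` remark above applies, since `stdin.read` waits for the end of input. How much this gains over the `readline` version depends on the size of the input.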

It is also known that accessing global variables is slower than accessing local ones, so if you put all the code (other functions aside, of course) into a function, for example `main`, it will also run faster. The final version of the code looks like this:

```
from sys import stdin, stdout
input, print = stdin.readline, stdout.write
def main():
    for __ in range(int(input())):
        n, m = list(map(int, input().split()))
        kek = [list(map(int, input().split()))+[0] for _ in range(n)]
        stolb = list(map(int, input().split()))
        ind = 0
        for i in range(m):
            if kek[0][i] in stolb:
                ind = i
                break
        for i in range(n):
            kek[i][m] = stolb.index(kek[i][ind])
        for j in range(m - 1):
            stolb = list(map(int, input().split()))
        kek.sort(key=lambda x: x[m])
        for elem in kek:
            print(' '.join(map(str, elem[:-1])))
            print('\n')
main()
```

Note that there were very few changes, but the program became at least $$$2$$$ times faster and now gets `OK`, running in $$$545$$$ milliseconds. Of course, you can come up with many more optimizations, but these are the main ones: they work on most tasks and are easy to write. You should understand that this is not a panacea: if, for example, the task reads or prints just $$$1$$$ number, fast input/output becomes useless. However, it comes in handy in many tasks.
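The claim about local and global variables is easy to check on your own machine. Here is a rough sketch, assuming CPython: the same loop runs once as module-level code (via `exec` with a fresh dictionary as its global namespace) and once inside a function. `run_as_globals`, `run_as_locals`, and `measure` are names made up for this example. The timings depend on the interpreter and the machine, so no numbers are claimed here.

```python
import time

# The loop being measured; `n` is supplied through the namespace passed to exec.
SRC = """
total = 0
for i in range(n):
    total += i
"""


def run_as_globals(n: int) -> int:
    """Run SRC as module-level code: `total` and `i` are stored in a globals dict."""
    namespace = {"n": n}
    exec(SRC, namespace)
    return namespace["total"]


def run_as_locals(n: int) -> int:
    """The same loop inside a function, where `total` and `i` are local variables."""
    total = 0
    for i in range(n):
        total += i
    return total


def measure(func, n: int) -> float:
    """Wall-clock seconds for a single call func(n)."""
    start = time.perf_counter()
    func(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 10 ** 6
    print(f"globals: {measure(run_as_globals, n):.3f} s")
    print(f"locals:  {measure(run_as_locals, n):.3f} s")
```

`timeit` is not used for the module-level case because it runs the statement inside a generated function, where the variables would become locals again.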

Also, keep in mind that although `PyPy3` is usually much faster than `Python3` (for example, in this task it is $$$1.5$$$ times faster), there are situations where `Python3` is faster, and I know problems where `Python3` solutions get `OK` but `PyPy3` ones do not. This does not mean that every `TL` should be resubmitted in `Python3`, picking up a penalty; just keep this in mind. In my experience, `Python3` is often faster on string problems, but of course it's different every time.

I hope this blog will help you use `Python` even more successfully :)

Hey, I recently learned Mo's algorithm and I'm now applying it to the following problems: this and this, but I'm getting TLE. Can you please take a look at my code and tell me where I could optimise it? Thanks, waiting for your response.

First off, the order in which you sort the queries is not Mo's ordering. Are you sure you fully understood Mo's algorithm?

But in general, I'm not sure you can make it pass without significant effort. In the ABC problem we have $$$n \leqslant 5 \cdot 10^5$$$, so an $$$O(n^{1.5})$$$ algorithm is somewhat questionable even in C++ (the official solution is linearithmic). The SPOJ problem has more lenient constraints. But despite various optimizations, Python will always be many times slower than C++, and solutions with sqrt complexity tend to run close to the time limit.
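For reference, Mo's ordering sorts the queries by the block of their left end, $$$\lfloor L / B \rfloor$$$, and then by $$$R$$$, with the block size $$$B$$$ usually chosen around $$$\sqrt{n}$$$. Below is a minimal Python sketch. The query type (the number of distinct values in $$$a[l..r]$$$, 0-indexed and inclusive) is an assumption for illustration, and `mo_order` and `distinct_counts` are hypothetical names, not taken from the code discussed here.

```python
import math


def mo_order(queries, n):
    """Indices of `queries` (pairs l, r) sorted by (l // block, r), block ~ sqrt(n)."""
    block = max(1, math.isqrt(n))
    return sorted(range(len(queries)),
                  key=lambda i: (queries[i][0] // block, queries[i][1]))


def distinct_counts(a, queries):
    """Number of distinct values in a[l..r] (0-indexed, inclusive) for each query."""
    cnt = {}              # value -> occurrences inside the current window
    distinct = 0          # number of values with a non-zero count
    cur_l, cur_r = 0, -1  # the current window is a[cur_l..cur_r]; it starts empty
    ans = [0] * len(queries)

    def add(v):
        nonlocal distinct
        cnt[v] = cnt.get(v, 0) + 1
        if cnt[v] == 1:
            distinct += 1

    def remove(v):
        nonlocal distinct
        cnt[v] -= 1
        if cnt[v] == 0:
            distinct -= 1

    for qi in mo_order(queries, len(a)):
        l, r = queries[qi]
        # Grow the window first, then shrink it, so it never becomes invalid.
        while cur_r < r:
            cur_r += 1
            add(a[cur_r])
        while cur_l > l:
            cur_l -= 1
            add(a[cur_l])
        while cur_r > r:
            remove(a[cur_r])
            cur_r -= 1
        while cur_l < l:
            remove(a[cur_l])
            cur_l += 1
        ans[qi] = distinct
    return ans
```

For example, `distinct_counts([1, 1, 2, 1, 3], [(0, 4), (1, 3), (2, 4), (0, 0)])` returns `[3, 2, 3, 1]`. The `add`/`remove` helpers are kept for readability; in Python, inlining them into the four loops and replacing the dictionary with a list indexed by compressed values removes per-element function-call and hashing overhead, which matters when the complexity is already close to the limit.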

I made a mistake in the code: I sorted the queries by L and, in case of a tie, by R, but they need to be sorted by L // (block size) and, in case of a tie, by R. This is my updated code