Rofofo is a renowned professor who teaches at one of the largest universities in Brazil. In his most recent classes, Rofofo talked about computer cache, and for this, he invented a new processor model, the RodolfEx $$$2000$$$.
The RodolfEx $$$2000$$$ has $$$N$$$ cachelines, each with $$$M$$$ variables. A variable in the RodolfEx $$$2000$$$ has $$$8$$$ bits of memory, that is, it can store integers from $$$0$$$ to $$$255$$$. Therefore, when adding variables, overflow may occur. Thus, we interpret the sum of two integers $$$x$$$ and $$$y$$$ in the RodolfEx $$$2000$$$ as $$$(x+y)\%256$$$, that is, the remainder of the division of $$$x + y$$$ by $$$256$$$.
Moreover, the processor has only $$$2$$$ relevant instructions for the test:
Note that when adding $$$1$$$ to a variable with value $$$255$$$, it overflows and returns to $$$0$$$.
In his test, Rofofo gives the initial value of each variable in the RodolfEx $$$2000$$$, as well as the factors $$$A$$$ and $$$B$$$, and wants to know the minimum number of clock cycles needed so that all variables become equal. Write a program that solves Rofofo's test and scores $$$10$$$!
Due to the time limit of the problem, it is important to submit using the PyPy3 compiler for Python solutions.
The first line of input contains two integers, $$$A$$$, $$$B$$$ $$$\;(1 \leq A, B\leq 10)$$$, the number of clock cycles for each operation.
The second line of input contains two integers, $$$N$$$ and $$$M$$$ $$$\;(1 \leq N \leq 10; 1 \leq M \leq 10^4)$$$, the number of cachelines and variables per cacheline in the RodolfEx $$$2000$$$.
The next $$$N$$$ lines of input each contain $$$M$$$ integers $$$v_1, v_2, \dots, v_M \; (0 \leq v_i \leq 255)$$$, the initial values of the corresponding cacheline.
Hint: Note that in this problem $$$N$$$ is much smaller than $$$M$$$. Try to design an algorithm where the slowest part depends on $$$N$$$, not $$$M$$$!
Print a single integer: the minimum number of clock cycles needed to complete Rofofo's test.
1 22 31 2 11 1 1
4
1 22 31 2 1254 253 255
11
8 32 2109 114164 109
630
Explanation of example 1: We can perform a type $$$1$$$ operation on the first and last variable of the first cacheline, and then perform a type $$$2$$$ operation on the second cacheline. In total, the number of clock cycles for this solution is $$$(1+1) \cdot 1 + 1 \cdot 2 = 4$$$
Explanation of example 1: We can perform a type $$$1$$$ operation on the first and last variable of the first cacheline, two type $$$1$$$ operations on the first variable of the second cacheline, and one type $$$1$$$ operation on the second variable of the second cacheline. After these operations, the $$$3$$$ variables of the first cacheline are all equal to $$$2$$$, while the $$$3$$$ variables of the second cacheline are all equal to $$$255$$$. By performing a type $$$2$$$ operation on the second cacheline, an overflow occurs and the $$$3$$$ variables of the second cacheline become $$$(255+1)\%256 = 0$$$. After that, just perform two more type $$$2$$$ operations on the second cacheline.
In total, there were $$$(1+1+2+1) \cdot 1 + 3 \cdot 2 = 11$$$ clock cycles.