K. Rofofo's Test
time limit per test
1.5 seconds
memory limit per test
256 megabytes
input
standard input
output
standard output

Rofofo is a renowned professor who teaches at one of the largest universities in Brazil. In his most recent classes, Rofofo talked about computer cache, and for this, he invented a new processor model, the RodolfEx $$$2000$$$.

The RodolfEx $$$2000$$$ has $$$N$$$ cachelines, each with $$$M$$$ variables. A variable in the RodolfEx $$$2000$$$ has $$$8$$$ bits of memory, that is, it can store integers from $$$0$$$ to $$$255$$$. Therefore, when adding variables, overflow may occur. Thus, we interpret the sum of two integers $$$x$$$ and $$$y$$$ in the RodolfEx $$$2000$$$ as $$$(x+y)\%256$$$, that is, the remainder of the division of $$$x + y$$$ by $$$256$$$.

Moreover, the processor has only $$$2$$$ relevant instructions for the test:

  1. Add $$$1$$$ to a specific variable in $$$A$$$ clock cycles.
  2. Add $$$1$$$ to all variables in a cacheline in $$$B$$$ clock cycles.

Note that when adding $$$1$$$ to a variable with value $$$255$$$, it overflows and returns to $$$0$$$.

In his test, Rofofo gives the initial value of each variable in the RodolfEx $$$2000$$$, as well as the factors $$$A$$$ and $$$B$$$, and wants to know the minimum number of clock cycles needed so that all variables become equal. Write a program that solves Rofofo's test and scores $$$10$$$!

Due to the time limit of the problem, it is important to submit using the PyPy3 compiler for Python solutions.

Input

The first line of input contains two integers, $$$A$$$, $$$B$$$ $$$\;(1 \leq A, B\leq 10)$$$, the number of clock cycles for each operation.

The second line of input contains two integers, $$$N$$$ and $$$M$$$ $$$\;(1 \leq N \leq 10; 1 \leq M \leq 10^4)$$$, the number of cachelines and variables per cacheline in the RodolfEx $$$2000$$$.

The next $$$N$$$ lines of input each contain $$$M$$$ integers $$$v_1, v_2, \dots, v_M \; (0 \leq v_i \leq 255)$$$, the initial values of the corresponding cacheline.

Hint: Note that in this problem $$$N$$$ is much smaller than $$$M$$$. Try to design an algorithm where the slowest part depends on $$$N$$$, not $$$M$$$!

Output

Print a single integer: the minimum number of clock cycles needed to complete Rofofo's test.

Examples
Input
1 2
2 3
1 2 1
1 1 1
Output
4
Input
1 2
2 3
1 2 1
254 253 255
Output
11
Input
8 3
2 2
109 114
164 109
Output
630
Note

Explanation of example 1: We can perform a type $$$1$$$ operation on the first and last variable of the first cacheline, and then perform a type $$$2$$$ operation on the second cacheline. In total, the number of clock cycles for this solution is $$$(1+1) \cdot 1 + 1 \cdot 2 = 4$$$

Explanation of example 1: We can perform a type $$$1$$$ operation on the first and last variable of the first cacheline, two type $$$1$$$ operations on the first variable of the second cacheline, and one type $$$1$$$ operation on the second variable of the second cacheline. After these operations, the $$$3$$$ variables of the first cacheline are all equal to $$$2$$$, while the $$$3$$$ variables of the second cacheline are all equal to $$$255$$$. By performing a type $$$2$$$ operation on the second cacheline, an overflow occurs and the $$$3$$$ variables of the second cacheline become $$$(255+1)\%256 = 0$$$. After that, just perform two more type $$$2$$$ operations on the second cacheline.

In total, there were $$$(1+1+2+1) \cdot 1 + 3 \cdot 2 = 11$$$ clock cycles.