# Lagrange Interpolation by grhkm

No polynomial stuff because I don't know $$$O(n\log{n})$$$ polynomial multiplication :(

## Main results:

Given $$$f(x_1)=y_1, f(x_2)=y_2, ..., f(x_n)=y_n$$$ we can calculate $$$f(X)$$$ where $$$f$$$ is the unique $$$(n-1)$$$-degree polynomial

For general {$$$x_i$$$} we can do so in $$$O(n^2)$$$

For consecutive {$$$x_i$$$} i.e. $$$x_j=x_1+(j-1)$$$, we can do so in $$$O(n)$$$ excluding binary exponentiation

## Why useful?

Calculate $$$1^n+2^n+...+X^n$$$ in $$$O(n)$$$

DP optimization

With polynomials you can do cool things but not gonna cover this blog

Let's get started :)

Assume all polynomials below are $$$(n-1)$$$ degree unless specified.

**disclaimer**

**Why (n-1) degree?**

## Idea:

To start let's deal with first problem:

Given $$$f(x_1)=y_1, f(x_2)=y_2, ..., f(x_n)=y_n$$$ and an integer $$$X$$$ we can calculate $$$f(X)$$$ where $$$f$$$ is the unique $$$(n-1)$$$-degree polynomial

Let's consider an easier version:

Find polynomial satisfying $$$f(x_1)=y_i, f(x_2)=0, ..., f(x_n)=0$$$

As we learn in school, $$$f(r)=0 \implies (x-r)$$$ is divisor

We can write this down:

Now notice that $$$f(x_1)=\prod_{i=2}^n (x_1-x_i)$$$ (a constant), but we want it to be $$$y_1$$$

We can simply scale the polynomial by a correct factor i.e.

Similarly, we can find a similar polynomial such that $$$f_i(x_i)=y_i$$$ and $$$f_i(x_j)=0 \forall j \neq i$$$

Okay now finally to answer the original question, we can notice that $$$f(x)=\sum_{i=1}^n f_i(x)$$$ satisfies the constraints. So there we have it! Lagrange interpolation:

$$$O(n^2)$$$ excluding inverse for division

## Optimization:

Let's try to optimise it now! Assume that we're given $$$f(1), f(2), \cdots, f(n)$$$, how can we calculate $$$f(X)$$$ quickly?

Notice that here, $$$x_i=i$$$ and $$$y_i=f(i)$$$. Substitute!

Let's consider the numerator and denominator separately

### Numerator

We can pre-calculate prefix and suffix product of x, then we can calculate the numerator (for each $$$i$$$) in $$$O(1)$$$

### Denominator

So we can precompute factorials (preferably their inverse directly) then $$$O(1)$$$ calculation

So now we can calculate $$$f(X)$$$ in $$$O(n)$$$

### Example 0

Find_ $$$\sum_{i=1}^N i^k$$$, $$$k\leq 1e6, N\leq 1e12$$$

Solution: Notice that the required sum will be a degree $$$(k+1)$$$ polynomial, and thus we can interpolate the answer with $$$(k+2)$$$ data points (Remember, we need $$$deg(f)+1$$$ points)

To find the data points simply calculate $$$f(0)=0, f(x)=f(x-1)+x^k$$$

Code can be found below

### Example 1

(BZOJ 3453, judge dead): Find_ $$$\sum_{i=0}^n \sum_{j=1}^{a+id} \sum_{k=1}^j k^x \mod 1234567891, x \leq 123, a, n, d \leq 123456789$$$

**Hint**

### Example 2 (ADVANCED)

(Luogu P4463 submit here): Given $$$n \leq 500, k \leq 1e9$$$, we call a sequence $$${a_n}$$$ *nice* if $$$1\leq a_i \leq k \forall i$$$ and $$$(a_i\neq a_j) \forall i\neq j$$$. Find the sum of (products of elements in the sequence) over all nice sequences. MY code below

**Hint**

**Hint2**

**Hint3**

**Hint4**

### Example 3 (ADVANCED TOO)

Problem https://www.codechef.com/problems/XETF : Find $$$\sum_{i=1,gcd(i,N)=1}^N i^k, N \leq 1e12, k \leq 256$$$. My code below

**Hint**

If you have any questions feel free to ask below

## Practice question:

CF 622F (Example 0)

https://mirror.codeforces.com/contest/622/submission/93584356

Thank you demoralizer for telling me this :)

Luogu P4463 (Example 2)

- My submission: https://www.luogu.com.cn/record/38800165

Codechef XETF (Example 3)

- My submission: https://www.codechef.com/viewsolution/38150282

Not in order of difficulty:

SPOJ ASUMHARD which is harder than EXTREME

Atcoder ARC33D (Japanese)

Unfortunately you can't get full score, but you can get up to subtask 4

Still not trivial, do first 3 subtasks would give insight

For full solution please wait for my next blog (next year)

SPOJ SOMESUMS very interesting

PE 487 Direct application

Preview for next post that's coming next year or something:

Interpolate MULTIPLE values in O(nlog^2(n))

and more

Also if you know any more problems using this technique please tell me I'll put it in and credit you too thanks

EDIT 1: sorry if the 'and more' sounds depressing I don't have enough energy :(

EDIT 2: The O(n) optimization can also be done if the data points given are $$$f(a),f(a+1),\cdots,f(a+k-1)$$$ instead. Left as exercise for the reader :)

EDIT 3: After staring at the ceiling for a few seconds I believe that $$$O(n)$$$ is achievable for all

arithmetic sequenceinput, as you can factor out the common difference and multiply/divide by the power of it directly. Left as an exercise for the reader :) I'll implement when I'm freeUPDATE: I believe for arithmetic sequence input, you can simply apply transformation g(x)=f(a+dx), then the input will be g(0), g(1), ..., g(k-1), which you can use the techniques in the blog to do. You can also directly find $$$f(x)=g(\frac{x-ad}{d})$$$ since you don't need to find the polynomial

This reminded me of a problem I set long ago where the input is an arithmetic sequence: https://www.spoj.com/problems/BTCODE_E/

Great blog!

I learnt about this for the first time when I couldn't solve 622F.

https://mirror.codeforces.com/problemset/problem/622/F

This problem is exactly what is taught in this blog.

2600

hmm

Well I guess a lot of people will solve their first 2600 today :D

Great that this blog helps

Very Very Nice Blog.. Never thought of this much simple explanation of lagrange interpolation .

Thanks a lot .. I got this thing for the first time.

Glad that this is helpful

(Tbh the explanation given is standard but I hope I explained it more detailedly and with clean code too :))

Yaa I know .. Even it seemed like very basic school maths to me , But the point is, u explained it using only basic terms without much technical terms .. and tbh i have always skipped this because of its name.

Glad that it helped :)

Nice blog! but I have a question, on the idea part, shouldn't the degree of $$$f$$$ be $$$n - 1$$$? since you only had $$$n$$$ points at the start (from $$$(x_1,y_1),(x_2,y_2),...(x_n,y_n)$$$)

Oops, you're correct. n points -> (n-1) degree, n degree -> (n + 1) points required

Just a note on spoj ASUMHARD. There are other methods with higher complexity that run faster than Lagrange interpolation for that particular problem (since k is small and number of test cases is large). For e.g. using a Newton divided difference table (O(k^2) pre-computation) with O(k) per test case runs super fast. See difference in run times here: https://www.spoj.com/status/ASUMHARD,suh_ash2008/ (only ID:26436058 uses Lagrange interpolation).

Interesting. Tbf I haven't Read them carefully so you're probably correct.

The complexity of the solution using Lagrange interpolation is actually O(k*log k) and not O(k) since the computation of f(0), f(1), f(2), ..., f(k) involves modular exponentiation. Even for the Lagrange interpolation, I've seen people do a modulo inverse (to divide by (x-i) instead of prefix and suffix products, which makes it O(k*log k). For ASUMHARD, let g[i][j] = i^j. We can write this as g[i][j] = g[i][j-1] * i and since k is small (321), this table can be pre-computed easily in O(k^2) and each test case can be answered in O(k) instead of O(k*log k). This seems to make a huge difference due to the large number of test cases. There are other methods (for e.g. http://oeis.org/A080779) which involve pre-computing the polynomial coefficients in O(k^2) or O(k^3) and answer each test case in O(k). :)

Shouldn't the k*log k be k*log p instead, if I understand properly? It is from exponentiation right?

If so, I wrote a blog that explains how to optimise to O(k+log p) lol feel free to check it out

Or maybe I'm misunderstanding what you mean

Thank you for the information though :)

No, I did mean O(k*log k) due to modular exponentiation (and not due to inverse computation). In your Example 0, you mention "to find the data points, simply calculate f(0) = 0, f(x) = f(x-1) + x^k". This involves computation of x^k for x = 0 to k, each of which is O(log k). Hence, total complexity of computing f(0), f(1), ..., f(k) is O(k*log k) and not O(k).

Three months later while showering, I realise that since $$$x^k$$$ is multiplicative, you can use a linear sieve to compute the values. You still have to compute values at prime values, but there are $$$\frac{k}{\log k}$$$ primes under k. Therefore the total complexity is $$$O(\frac{k}{\log k}\cdot \log k + k)$$$.

One more nice problem to try out if anyone wants to play around with their Lagrange interpolation code and try to optimise it: https://www.spoj.com/problems/PWSUMC/

I think $$$f_i(x)$$$ is defined wrong it should be $$$f_i(x) = y_i \prod_ {j=1, j \neq i}^{n} \frac {x - x_j} {x_i - x_j} $$$