Hi everyone!

Suppose that you need to compute some sum of a number-theoretic function that has something to do with divisors:

As it turns out, such and many similar sums can be computed with Dirichlet convolution in $$$O(n^{2/3})$$$, and in this article we will learn how.

Let $$$f(n)$$$ and $$$g(n)$$$ be two arithmetic functions. Let $$$F(n)$$$ and $$$G(n)$$$ be their prefix sums, that is

We need to compute a prefix sum of the Dirichlet convolution $$$(f * g)(n)$$$. In this article, we will consider some general methods, and show how to do so in $$$O(n^{2/3})$$$ if we can compute prefix sums of $$$F(n)$$$ and $$$G(n)$$$ in all possible values of $$$\lfloor n/k \rfloor$$$ in this complexity.

**Part 1: Fast prefix sum computation****Part 2: Dirichlet series and prime counting**

ecnerwala previously mentioned that it is possible, but did not go into much detail. There is also a blog by Nisiyama_Suzune, which covers prefix sums of Dirichlet inverses in $$$O(n^{2/3})$$$.

### Dirichlet convolution

Recall that the Dirichlet convolution of arithmetic functions $$$f(n)$$$ and $$$g(n)$$$ is defined as

Dirichlet convolution is often of interest in the context of multiplicative functions (i.e., functions such that $$$f(a) f(b) = f(ab)$$$ for $$$\gcd(a,b)=1$$$), as the result can be proven to also be a multiplicative function. Moreover, quite often functions of interest can be expressed as a Dirichlet convolution. For example, the divisor function

can be represented as a Dirichlet convolution of the constant function $$$f(n)=1$$$ and the power function $$$g(n)=n^a$$$. There are also other similar identities, for example, for the Euler's totient function $$$\varphi(n)$$$, one can notice that

This is due to the fact that $$$\varphi\left(\frac{n}{d}\right)$$$ is the number of integers $$$x$$$ such that $$$\gcd(x, \frac{n}{d})=1$$$, or, equivalently, $$$\gcd(x, n) = d$$$. This implies that $$$h(n)=n$$$ can be represented as the Dirichlet convolution of $$$f(n) = 1$$$ and $$$\varphi(n)$$$.

#### Hyperbola method

For now assume that we can compute any of $$$f$$$, $$$g$$$, $$$F$$$, $$$G$$$ in $$$O(1)$$$. We can rewrite the prefix sum of the Dirichlet convolution as

In the expression above, at least one of $$$i$$$ and $$$j$$$ must be less than $$$\sqrt n$$$. This allows us to rewrite the expression as

Thus, in this situation we can find a prefix sum of the Dirichlet convolution in $$$O(\sqrt n)$$$.

*Summation blocks in $$$ij \leq 15$$$*

Points in the green block are double-counted and should be subtracted

Points in the green block are double-counted and should be subtracted

The representation of the prefix summation, as written above, is commonly known as the Dirichlet hyperbola method. Graphically, such summation splits lattice points below the $$$ij=n$$$ hyperbola into three parts (see the picture above). Besides exact computations, this method can also be used to get asymptotic estimations for summations like the one above.

**Exercise 1**: Solve **Project Euler — #401 Sum of Squares of Divisors**: Compute the prefix sum of $$$\sigma_2(n) = \sum\limits_{d|n} d^2$$$ up to $$$n = 10^{15}$$$.

#### Choosing a better splitting point

Now, what if computing $$$F(i)$$$ and $$$G(j)$$$ requires vastly unbalanced amount of effort?

**Example**: Let's say that we want to compute to compute

the sum of squares of divisors of all divisors of $$$n$$$. It can be represented as a Dirichlet convolution of $$$f(n)=1$$$ and $$$g(n) = \sigma_2(n)$$$. For the first one, prefix sums can be computed in $$$O(1)$$$, and for the second one they can be computed in $$$O(\sqrt n)$$$, see excercise 1.

In this case, we can bound the sums by a point $$$(k,l)$$$ on the rectilinear convex hull of points under the hyperbola $$$kl=n$$$:

See the picture below for a graphical representation.

*For $$$n=15$$$, we can choose $$$k=2$$$ and $$$l=5$$$ instead of $$$k=l=3$$$*

Let's now assume that we can compute $$$F(n)$$$ in $$$n^\alpha$$$ and $$$G(n)$$$ in $$$n^\beta$$$ for $$$0 \leq \alpha, \beta < 1$$$. What is the optimal splitting pair $$$(k, l)$$$?

Note that for $$$(k, l)$$$ to be on the rectilinear convex hull of points below $$$kl=n$$$, it is necessary and sufficient to satisfy $$$kl \leq n$$$ and $$$(k+1)(l+1) > n$$$, so we may essentially assume that $$$kl \sim n$$$. What is the time complexity required to compute the full sum? It is

To minimize the total complexity, we should make the two summands equal. Substituting $$$l \sim n/k$$$, we get

In particular, for $$$\alpha=0$$$ and $$$\beta=1/2$$$ we get $$$k=n^{1/3}$$$ and $$$l=n^{2/3}$$$, making for the total complexity $$$O(n^{2/3})$$$.

#### Adding precomputation

Okay now, but what if $$$\alpha$$$ and $$$\beta$$$ are roughly the same, and also reasonably huge? For example $$$\alpha = \beta = 1/2$$$ or even $$$\alpha = \beta = 2/3$$$.

Due to symmetry, it's still optimal to split in $$$k=l=\lfloor \sqrt n \rfloor$$$, so we can't make it better by choosing a better splitting point. What we can do, however, is to precompute prefix sums of $$$f(n)$$$ and $$$g(n)$$$ up to some point $$$\sqrt n \leq t < n$$$.

It's typically possible to do so with linear sieve (see here for details) in $$$O(t)$$$, after which the complexity for the full computation becomes

Optimizing for $$$t$$$ makes first $$$2$$$ summands equal, thus $$$t=n^{1/(2-\alpha)}$$$. This leads to $$$O(n^{2/3})$$$ for $$$\alpha=1/2$$$ or $$$O(n^{3/4})$$$ for $$$\alpha=2/3$$$.

#### Adding even more precomputation

The result above is still not quite satisfactory. We see that $$$n^{2/3}$$$ is a kind of magical bound, to which a lot of things reduce. Given that, it would be very nice to stick to it, that is to somehow guarantee that if we can compute prefix sums of $$$f(n)$$$ and $$$g(n)$$$ in $$$O(n^{2/3})$$$, then we also can compute prefix sums of $$$(f*g)(n)$$$ in $$$O(n^{2/3})$$$. Turns out that it is possible to do! We just need a bit stronger guarantees.

**Claim**: Let $$$f(n)$$$ and $$$g(n)$$$ be such that $$$F\left(\lfloor n/k\rfloor\right)$$$ and $$$G\left(\lfloor n/k\rfloor\right)$$$ are known for all possible arguments. Then we can compute prefix sum $$$H(n)$$$ of $$$h(n) = (f * g)(n)$$$ in $$$O(\sqrt n)$$$. Moreover, we can find $$$H(\lfloor n/k \rfloor)$$$ for all possible arguments in $$$O(n^{2/3})$$$.

**Note 1**: There might only be at most $$$2 \sqrt n$$$ distinct values of $$$\lfloor n/k \rfloor$$$, because either $$$k$$$ or $$$\lfloor n/k \rfloor$$$ must be smaller than $$$\sqrt n$$$.

**Note 2**: The result above allows us to repeatedly compute Dirichlet convolutions without blowing the complexity, as it will pin to $$$O(n^{2/3})$$$.

**Proof**: The result about $$$H(n)$$$ immediately follows from the hyperbola method.

Additionally, using the identities $$$\small \left\lfloor \frac{\lfloor n/a \rfloor}{b} \right\rfloor = \lfloor \frac{n}{ab}\rfloor$$$ and $$$\small \left\lfloor\sqrt{\left\lfloor \frac{n}{k} \right\rfloor} \right\rfloor = \left\lfloor\sqrt{\frac{n}{k}} \right\rfloor$$$, we can compute $$$H(\lfloor n/k \rfloor)$$$ with the hyperbola method as

in $$$\small O\left(\sqrt{\frac{n}{k}}\right)$$$, as any individual summand is computed in $$$O(1)$$$ after pre-computation.

Note that $$$\lfloor \sqrt{t} \rfloor$$$ always lies in the corner of the rectilinear convex hull of points below $$$kl=t$$$. There are two cases two consider here:

- The corner is "concave", that is $$$\lfloor \sqrt t \rfloor(\lfloor \sqrt t \rfloor+1) \leq t$$$. It means that $$$\lfloor \sqrt t \rfloor = \lfloor t/l \rfloor$$$ for $$$\small \lfloor \sqrt t \rfloor < l \leq \left\lfloor \frac{t}{\lfloor \sqrt t \rfloor} \right\rfloor$$$.
- The corner is "convex", that is $$$\lfloor \sqrt t \rfloor(\lfloor \sqrt t \rfloor+1) > t$$$. It means that $$$\lfloor \sqrt t \rfloor = \lfloor t/l \rfloor$$$ for $$$\small \left\lfloor \frac{t}{\lfloor \sqrt t \rfloor+1} \right\rfloor < l \leq \lfloor \sqrt t \rfloor$$$.

**Note 3**: Graphically, on the picture above the corner $$$(3, 3)$$$ is "concave", and the corner $$$(3, 5)$$$ is "convex".

In either case, it means that $$$\lfloor \sqrt t \rfloor$$$ can always be represented as $$$\lfloor t/l \rfloor$$$ for some $$$l$$$. Applying this result to $$$\lfloor n/k \rfloor$$$ instead of $$$t$$$, we deduce that $$$\small \left\lfloor \sqrt{\frac{n}{k}} \right\rfloor$$$ can be represented as $$$\small \left\lfloor \frac{\lfloor n/k \rfloor}{l} \right\rfloor = \lfloor \frac{n}{kl}\rfloor$$$, thus the values of $$$F(n)$$$ and $$$G(n)$$$ in $$$\small \left\lfloor \sqrt{\frac{n}{k}} \right\rfloor$$$ must also be known, and do not contribute any extra burden to the computation time.

Thus, the overall complexity is the sum of $$$\small \sqrt{\frac{n}{k}}$$$ over all values of $$$\lfloor n/k\rfloor$$$, in which we compute it with the hyperbola method. Since we can pre-compute these values up to $$$n/k \sim n^{2/3}$$$ in $$$O(n^{2/3})$$$ with the linear sieve, we're only interested in values of $$$k$$$ up to $$$n^{1/3}$$$, which takes

proving the overall complexity of $$$O(n^{2/3})$$$.

### Dirichlet inverse

**Example**: **Library Judge — Sum of Totient Function**. Find the sum of $$$\varphi(n)$$$ up to $$$n \leq 10^{10}$$$.

**Solution**: Explained below, see submission for implementation details.

The result above provides us with an efficient method of computing the prefix sums of the convolution of two sequences. However, it often happens that we can compute the prefix sums of the convolution $$$(f * g)(n)$$$, and of one of the functions $$$f(n)$$$, but not of the other.

For example, consider the prefix sums of the Euler's totient function $$$\varphi(n)$$$. We already know that the Dirichlet convolution of $$$\varphi(n)$$$ and $$$f(n) = 1$$$ is $$$h(n) = n$$$. We can easily compute prefix sums of $$$f(n)$$$ and $$$h(n)$$$. But computing them for $$$\varphi(n)$$$ is much more difficult.

What we can do here is consider the Dirichlet inverse of $$$f(n)=1$$$. Dirichlet inverse of a function $$$f(n)$$$ is the function $$$f^{-1}(n)$$$ such that $$$(f * f^{-1})(n)$$$ equates to the Dirichlet neutral function, which is defined as $$$1$$$ for $$$n=1$$$ and to $$$0$$$ otherwise. As follows from its name, Dirichlet convolution of any function with the Dirichlet neutral yields the initial function.

From Möbius inversion formula, the inverse for it is known to be $$$\mu(n)$$$, the Möbius function, hence we can represent $$$\varphi(n)$$$ as

that is as the Dirichlet convolution of $$$f(n)=n$$$ and $$$\mu(n)$$$. This reduces the task of finding prefix sums for $$$\varphi(n)$$$ to finding prefix sums of the Dirichlet convolution of $$$f(n) = n$$$ and $$$\mu(n)$$$. But how do we find the prefix sums of $$$\mu(n)$$$ (also known as the Mertens function)?

**Claim**: Let $$$f(n)$$$ be such that we can find values of $$$F\left(\left\lfloor n/k\right\rfloor\right)$$$ for all possible arguments in $$$O(n^{2/3})$$$. Then we can also find the prefix sums for all possible arguments of the Dirichlet inverse $$$f^{-1}(n)$$$ in $$$O(n^{2/3})$$$.

**Proof**: Let $$$F^{-1}(n)$$$ be the prefix sum of $$$f^{-1}(n)$$$ up to $$$n$$$. The hyperbola formula allows us to compute $$$F^{-1}(n)$$$ in $$$O(\sqrt n)$$$, granted that we already know all values of $$$ F\left(\left\lfloor n/k\right\rfloor\right)$$$ and $$$ F^{-1}\left(\left\lfloor n/k\right\rfloor\right)$$$:

**Note**: For non-zero multiplicative functions, $$$f(1)=1$$$.

With this in mind, we can compute all values of $$$F^{-1}(n)$$$ up to $$$O(n^{2/3})$$$ with linear sieve, and then compute $$$ F^{-1}\left(\left\lfloor n/k\right\rfloor\right)$$$ one by one in the increasing order of arguments.

Each value is computed in $$$\small O\left(\sqrt{\frac{n}{k}}\right)$$$ starting with $$$k \approx \sqrt[3]{n}$$$ and going up to $$$k=1$$$. This gives the $$$O(n^{2/3})$$$ algorithm to find the Dirichlet inverse, and its analysis is essentially the same as with the algorithm for the Dirichlet convolution.

As a proof of concept, I have implemented the code to compute the sums of $$$\mu(n)$$$ and $$$\varphi(n)$$$ up to a given bound:

**code**

You can use $$$M(10^{12})=62366$$$ and $$$\Phi(10^{12})=303963550927059804025910$$$ to check your implementation.

**Exercise 2**: Solve **Project Euler — #625 Gcd Sum**: Compute $$$G(n) = \sum\limits_{j=1}^n \sum\limits_{i=1}^j \gcd(i, j)$$$ for $$$n=10^{11}$$$.

**Exercise 3**: Solve **Project Euler — #351 Hexagonal Orchards**.

### Further reading

**[Tutorial] Math note — linear sieve**— how to compute prefix of multiplicative functions up to $$$n$$$ in $$$O(n)$$$ with linear sieve.**[Tutorial] Math note — Möbius inversion**— how to apply Möbius inversion to some number-theoretic problems.**[Tutorial] Math note — Dirichlet convolution**— another entry on Dirichlet convolution inversion in $$$O(n^{2/3})$$$.**Summing Multiplicative Functions (Pt. 1)**— more on computing prefix sums of multiplicative functions.

**UPD**: Turns out, it is also possible to compute prefix sums in $$$\lfloor n/k \rfloor$$$ for Dirichlet convolution and Dirichlet inverse in $$$O(n^{2/3})$$$ while only using $$$O(\sqrt n)$$$ memory, and without linear sieving:

**code**

See the comment below for details.

Thanks for the blog

Also, please tell me that this is not a topic for under 3000 rated problems, because I really don't want to read all that any time in the future

the "china" in your bio disagrees with you, look inside you , you already know all that.

LMAO

"topics" over 3000 rating are 3000 rating just because people don't learn them.

Is there any CF problem based on this?

Thanks for the nice introductory blog! In my opinion, it could have been much more interesting for people who already know about the topics from the referred blogs if there was a treatment that focused on properties of Dirichlet series to explain the content of the blog.

Formal Dirichlet series over a ring form a ring under the Dirichlet convolution (for instance, when the ring is $$$\mathbb{R}$$$, it's easy to check that it is isomorphic to the ring of all formal power series in countably many variables due to unique factorization) — this is necessary to be able to do all the cool stuff mentioned in this comment.

Furthermore, Dirichlet series have Fourier-transform-esque properties, in the sense that Dirichlet series inversion resembles Fourier transforms. It is also interesting to note that L-functions are just Dirichlet series of Dirichlet characters, which are related to group characters used in Fourier transforms on finite groups — a concise resource on this can be found here. This webpage is a great resource on the connections between number theoretical topics like Dirichlet series and Fourier transforms, and anyone who likes FFT should go through some of these papers to truly appreciate the kind of theory that arises at the junction of these two seemingly different fields.

At the risk of going off a tangent, L-functions are also one of the most important objects of study for modern number theory, being closely related to elliptic curves and the Riemann hypothesis, to name a couple of famous applications — there is a lot of computational number theory that revolves around L-functions as primitives.

Here are a few resources for anyone who's interested:

Hey, thanks, I was actually planning to write about Dirichlet series initially, but I couldn't properly connect their nice theoretic framework with practical applications in programming yet. I plan to write another article about them as soon as I have enough understanding to cover this gap :)

Actually, it's possible to do all of these operations in $$$O(n^{2/3})$$$ time but with only $$$O(n^{1/2})$$$ memory, without relying on the functions being multiplicative! The main idea is that we don't actually care about all values of the linear-time sieve (the memory-intensive part, which also relies on the function being multiplicative), we only care about them bucketed into distinct values of $$$\lfloor N/k \rfloor$$$.

Directly cutting out the sieve is a little tricky, so here's an alternative viewpoint which makes the algorithm a bit simpler. Consider trying to compute $$$H(\lfloor n / z \rfloor)$$$ for all $$$z$$$, where $$$h = f * g$$$. Then, note that $$$f(x) g(y)$$$ contributes to $$$H(\lfloor n / z \rfloor)$$$ iff $$$xyz \le n$$$. Now, we claim we can partition the space $$$xyz \le n$$$ into $$$O(n^{2/3})$$$ rectangular prisms (3D rectangles). (This is analogous to how we can partition $$$xy \le n$$$ into $$$O(n^{1/2})$$$ rectangles.)

The construction is pretty simple. WLOG, assume $$$x \le y \le z$$$ (this introduces a factor of $$$6$$$). Then, fix any $$$x \le y$$$ with $$$x y^2 \le n$$$. There are at most $$$O(n^{2/3})$$$ such pairs: for each $$$x \le n^{1/3}$$$, there are $$$\lfloor \sqrt{n/x} \rfloor$$$ pairs, and $$$\sum_{x=1}^{n^{1/3}} \lfloor \sqrt{n/x} \rfloor = O(n^{2/3})$$$ by integration. Then, simply take the rectangle $$$[x, x] \times [y, y] \times [y, \lfloor n / (xy) \rfloor]$$$.

Now that we've partitioned the space of $$$xyz \le n$$$ into $$$O(n^{2/3})$$$ rectangles (in fact, each rectangle is a $$$1 \times 1 \times k$$$ line in some dimension), we can easily perform the desired sum. For a rectangle $$$[x_1, x_2] \times [y_1, y_2] \times [z_1, z_2]$$$, we can use prefix sums to compute $$$\sum_{x=x_1}^{x_2} \sum_{y=y_1}^{y_2} f(x) g(y) = (F(x_2) - F(x_1 - 1))(G(y_2) - G(y_1 - 1))$$$, and we can implicitly add it to the interval $$$H(\lfloor n / z_2 \rfloor) \ldots H(\lfloor n / z_2 \rfloor)$$$ by storing $$$H$$$ in terms of its finite differences. (Note that all coordinates of the rectangles are of the form $$$\lfloor n / k \rfloor$$$ since they're either smaller than $$$\sqrt{n}$$$ or they're clearly of that form, so we only need the inputs of the form $$$F(\lfloor n / k \rfloor)$$$ and $$$G(\lfloor n / k \rfloor)$$$.) Adding the finite differences up in the end gives us the desired $$$H$$$.

Additionally, note that we're really just performing schoolbook long multiplication, but compressing some intervals when possible. We can also use this to perform schoolbook long division (e.g. Dirichlet inverse). The setup is that, we know $$$F$$$ and $$$H$$$, and we want to compute $$$G$$$ so that $$$F * G = H$$$. Let's initialize $$$G$$$ to be "unknown", and try to compute $$$F * G = H'$$$. Since the rectangles are easy to implicitly generate, we can compute $$$H'$$$ in "online order", meaning that we find $$$H'(k)$$$ before using $$$F(l)$$$ or $$$G(l)$$$ for $$$l > k$$$ (this is easy to work out, since the rectangles are just formed from fixing 2 coordinates). Then, at each step, we should just set $$$G(k)$$$ to make $$$H'(k) = H(k)$$$ (we can find the right value since we know it contributes exactly $$$F(1) G(k)$$$). Continuing to multiply gives us the full inverse we want. This also lets us compute functions like $$$\sqrt{F}$$$ and similar.

(Sorry, I meant to write a longer blog post describing all these algorithms, but haven't found the time. You can find my code for this stuff at https://github.com/ecnerwala/cp-book/blob/master/src/dirichlet_series.hpp)

Hey, that's very cool! In theory, at least. I'm trying to implement it and it doesn't seem very pleasant so far T_T

As an additional note, this multiplication method can be modified to perform well in the case where $$$f$$$ and/or $$$g$$$ have some sparsity; skip 3D rectangles concerning $$$f(x) = 0$$$ or $$$g(y) = 0$$$. When $$$f$$$ and $$$g$$$ take values only on (say) $$$n^{1/6}$$$-rough numbers, I think it becomes $$$O(n^{2/3} \log(n)^{-1})$$$ time. When $$$f$$$ takes values only on powerful numbers, we can partition the space into $$$O(n^{1/2} \log(n))$$$ 3D rectangles (as far as I know this is less known than "powerful number sieve").

(Sorry, I meant to write a longer blog post describing all these algorithms, but have been too lazy to implement the division. Thanks for explaining how to think!)

Hi, thanks for pointing that out! Could you please share more details on where the bound of $$$O(\sqrt n \log n)$$$ comes from? Also what would be the simplest way to construct such set of rectangles?

My idea is to use only the rectangles of the form $$$[x,x] \times [y,y] \times [y, N/(xy)]$$$ and $$$[x,x] \times [z,N/(xz)] \times [z,z]$$$, where we can assume that $$$x$$$ is powerful. We want to count $$$(x, y)$$$ such that $$$x y^2 \le N$$$ and $$$x$$$ is powerful. Using the representation $$$x = a^2 b^3$$$ ($$$b$$$: square-free), we have $$$\sum_{a\le N^{1/2}} \sum_b \mu(b)^2 (N/(a^2b^3))^{1/2} \sim N^{1/2} \log(N^{1/2}) \frac{\zeta(3/2)}{\zeta(3)}$$$.

Another quick note: this construction gives the minimum number of rectangles. For any $$$x \le y \le z = \lfloor n / x / y \rfloor$$$, the point $$$(x, y, z)$$$ is "maximal" for $$$xyz \le n$$$, in that $$$n < xy(z+1) \le x(y+1) z \le (x+1) y z$$$. No two maximal points can be in the same rectangle, and each of our rectangles contains a maximal point, so we have an optimal partition.

After some further discussions with ecnerwala, and a lot of help from him, we also have a self-contained Python implementation for multiplication and division in this algorithm:

codeIn the code above, the function

`exec_on_blocks(func, n)`

enumerates all $$$[x_0, x_1] \times [y_0, y_1] \times [z_0, z_1]$$$ blocks mentioned in his comment and passes them on to its argument function`func`

.This function

`func`

behaves simply for`Dirichlet_mul(F, G, n)`

and`Dirichlet_div(H, G, n)`

.In the first case, it simply adds through finite differences

In the second case, it subtracts this value instead. Besides that, if $$$F(x_1)$$$ is unassigned at the moment, it assumes that $$$x_1 = z_0 = k$$$, such that $$$F(x)$$$ for $$$x<k$$$ are already known, and all the other blocks with $$$z_0 = k$$$ were already propagated. Therefore, $$$H(z_0)$$$ must become $$$0$$$ after subtraction, from which we can determine $$$F(x_1)$$$ as

To maintain this,

`exec_on_blocks`

iterates over $$$k$$$, after which itNote: The indices $$$x_i, y_i, z_i$$$ are actually from $$$0$$$ to $$$2 \sqrt n$$$, as they directly index distinct values of $$$\lfloor n/k \rfloor$$$.

One more quick note: the $$$xyz \le N$$$ construction looks remarkably symmetric, and there's a good reason for this.

Fundamentally, why are we trying to compute all prefix sums $$$H(\lfloor N / z \rfloor)$$$? It's really because we want to be able to answer questions of the form "what is the $$$N$$$th prefix sum of $$$h * q$$$?" for an arbitrary function $$$Q$$$; $$$H$$$ is simply the coefficients for each $$$q(z)$$$. Substituting in $$$h = f * g$$$, we're trying to find the coefficient of each $$$q(z)$$$ in the $$$N$$$th prefix sum of $$$h * q = f * g * q$$$ (which equals $$$\sum_{xyz \le N} f(x) g(y) q(z)$$$).

Said differently, computing prefix sums of $$$h = f*g$$$ is the linear algebraic "transpose" of computing $$$\text{pref}(f*g*q)(N)$$$ with respect to $$$q$$$, so we can perform an algorithm to compute this prefix sum, but keep track of coefficients of $$$q$$$ instead of accumulating them (the more general form of this "transposition principle" is described here). This is why we should consider the coordinate scheme $$$H(\lfloor N / z \rfloor)$$$: $$$z$$$ is really the coordinate in $$$q$$$, and we're trying to "sum together" over $$$xyz \le N$$$.

Last day an incredible breakthrough in this topic has been published by orzdevinwang:

BriefThe algorithm takes $$$\mathcal O (n^{1/2} \operatorname{poly\,log} n)$$$ space, though, and the time constants are big. It’s theoretical meaning is much more greater. It’s very hard to believe that the bound of soft-$$$\mathcal O(n^{2 / 3})$$$ can actually be broken, after so many years of studies by CP experts in China (previous studies aimed to reduce the number of factors of logs in soft-$$$\mathcal O(n^{2 / 3})$$$, the best result might be approaching $$$-2$$$ logs). It’s very interesting that the method is not traditionally pure combinatorial, but contains a careful analysis in floating precisions, and uses a result from analytic number theory.

Check his blog (in Chinese) for details.

Wow, that's very impressive! orzdevinwang orz!

The story on how it's affected double precision also reminds me of the current best $$$\tilde O(\sqrt n)$$$ algorithm to compute the partition function $$$p(n)$$$ in a specific point $$$n$$$. Essentially, even if we just want to compute it modulo some number $$$m$$$, the best thing we can do is to evaluate Rademacher formula with enough precision to get the exact value as an integer, and then compute it modulo $$$m$$$.

I'm not sure I understand all of the details of the algorithm that you describe, but maybe I'll take a closer look later...

P.S. I wish these results were better exposed in English speaking resources T_T

This is pretty cool. It looks like this uses analytical bounds to prove runtime, but is still combinatorial with respect to the coefficients, so it can work over an arbitrary ring, is that correct?

Also, do you know how it compares to analytical approaches like this paper? I thought they might be a little similar, using logs is kind of like writing the evaluation of the Dirichlet series in fixed-point in base $$$B$$$.