**Translated by ChatGPT 5.5 Instant.**↵
↵
Barrett Reduction can usually solve the following problem faster:↵
↵
> For a fixed modulus $m$, repeatedly given parameters $x$, quickly compute the value of $x \bmod m$.↵
↵
Direct modulo operations, such as `x % m` in C++, are usually relatively slow, so we consider constant-factor optimization. The Barrett Reduction algorithm first preprocesses an approximation of (\frac{1}{m}).↵
↵
Choose a precision parameter $k$. On a 64-bit machine, $k=64$ is commonly used. Let $B=2^k$. Precompute↵
↵
$$↵
\mu=\left\lceil \frac{B}{m} \right\rceil↵
$$↵
↵
Then we have↵
↵
$$↵
\frac{x}{m}\approx \frac{\mu x}{B}↵
$$↵
↵
So the quotient $q$ can be estimated as↵
↵
$$↵
q=\left\lfloor \frac{\mu x}{B} \right\rfloor↵
$$↵
↵
Then the remainder is clearly↵
↵
$$↵
r=x-qm↵
$$↵
↵
Since the quotient $q$ is only an estimate, the remainder $r$ also needs some correction. See the code implementation for the details.↵
↵
```cpp↵
struct Barrett {↵
unsigned int mod;↵
unsigned long long im;↵
inline Barrett(unsigned int m = 1) : mod(m), im((unsigned long long)(-1) / m + 1) {}↵
inline unsigned int reduce(unsigned long long x) {↵
unsigned long long q = (unsigned long long)((__uint128_t)x * im >> 64);↵
unsigned int r = x - q * mod;↵
if (r >= mod) r += mod;↵
return r;↵
}↵
inline unsigned int mul(unsigned int a, unsigned int b) {↵
return reduce((unsigned long long)a * b);↵
}↵
};↵
```↵
↵
Usage:↵
↵
```cpp↵
Barrett bt(mod); // mod is the modulus↵
```↵
↵
Calling `bt.reduce(a)` gives the value of `a % mod`, and calling `bt.mul(a, b)` gives the value of `a * b % mod`.↵
↵
From the analysis above, we can see that Barrett Reduction does not require `mod` to be prime; it only requires the modulus to be a positive integer. However, this 64-bit ACL-style implementation relies on 32-bit unsigned overflow correction, so it usually requires$ $1 \le \text{mod} < 2^{31}$. Within this range, `bt.reduce(x)` can quickly compute `x % mod`, and `bt.mul(a, b)` can quickly compute `a * b % mod`.↵
↵
Barrett Reduction can usually solve the following problem faster:↵
↵
> For a fixed modulus $m$, repeatedly given parameters $x$, quickly compute the value of $x \bmod m$.↵
↵
Direct modulo operations, such as `x % m` in C++, are usually relatively slow, so we consider constant-factor optimization. The Barrett Reduction algorithm first preprocesses an approximation of (\frac{1}{m}).↵
↵
Choose a precision parameter $k$. On a 64-bit machine, $k=64$ is commonly used. Let $B=2^k$. Precompute↵
↵
$$↵
\mu=\left\lceil \frac{B}{m} \right\rceil↵
$$↵
↵
Then we have↵
↵
$$↵
\frac{x}{m}\approx \frac{\mu x}{B}↵
$$↵
↵
So the quotient $q$ can be estimated as↵
↵
$$↵
q=\left\lfloor \frac{\mu x}{B} \right\rfloor↵
$$↵
↵
Then the remainder is clearly↵
↵
$$↵
r=x-qm↵
$$↵
↵
Since the quotient $q$ is only an estimate, the remainder $r$ also needs some correction. See the code implementation for the details.↵
↵
```cpp↵
struct Barrett {↵
unsigned int mod;↵
unsigned long long im;↵
inline Barrett(unsigned int m = 1) : mod(m), im((unsigned long long)(-1) / m + 1) {}↵
inline unsigned int reduce(unsigned long long x) {↵
unsigned long long q = (unsigned long long)((__uint128_t)x * im >> 64);↵
unsigned int r = x - q * mod;↵
if (r >= mod) r += mod;↵
return r;↵
}↵
inline unsigned int mul(unsigned int a, unsigned int b) {↵
return reduce((unsigned long long)a * b);↵
}↵
};↵
```↵
↵
Usage:↵
↵
```cpp↵
Barrett bt(mod); // mod is the modulus↵
```↵
↵
Calling `bt.reduce(a)` gives the value of `a % mod`, and calling `bt.mul(a, b)` gives the value of `a * b % mod`.↵
↵
From the analysis above, we can see that Barrett Reduction does not require `mod` to be prime; it only requires the modulus to be a positive integer. However, this 64-bit ACL-style implementation relies on 32-bit unsigned overflow correction, so it usually requires



