A modular integer template implements basic integer operations that wrap around when they reach some modulus. There are many ways to do this, but some ways are slower. This article compares the performance of different methods.
Summary
For the best performance:
- Addition of two modular integers should be done without modulo operations.
- Any modulus operation (using %) should be done using unsigned integer types.
Speed comparison, extensions, and other operations are discussed later.
Basic template
For simplicity, I'll use a fixed prime modulus $$$M$$$. If you need multiple moduli at the same time then you can extend this using a c++ template parameter. This is the simplest and easiest to implement, but it's slower.
using ll = long long;
struct BasicMint {
int n;
BasicMint(ll m = 0) : n(m % M) {}
BasicMint operator+(BasicMint m) { return BasicMint(n + m.n); }
BasicMint operator*(BasicMint m) { return BasicMint((ll)n * m.n); }
};
Better template
The first optimization is that we can avoid doing a modulo operation in an addition operation. This is because the sum of two modular integers is between $$$0$$$ and $$$2M-1$$$, so a simple check and subtraction by $$$M$$$ suffices to achieve wrapping addition. A modulo operation is costly in terms of execution time.
struct BetterMint {
int n;
BetterMint(ll m = 0) : n(m % M) {}
BetterMint operator+(BetterMint m) {
if ((m.n += n) >= M)
m.n -= M;
return m;
}
BetterMint operator*(BetterMint m) { return BetterMint((ll)n * m.n); }
};
Fastest template
Another optimization is that we can use unsigned integers in the constructor. I don't know why, but modulo operations are faster for unsigned integers. This speeds up multiplications. Also, add constexpr to the constructor to make some optimizations at compile time.
using ull = unsigned long long;
struct FastMint {
int n;
constexpr FastMint(ull m = 0) : n(m % M) {}
FastMint operator+(FastMint m) {
if ((m.n += n) >= M)
m.n -= M;
return m;
}
FastMint operator*(FastMint m) { return FastMint((ull)n * m.n); }
};
Comparison
Running these locally on my computer yields the following times averaged over 10 trials. I compiled using g++ -O2 with 64 bit stdc++17.
Note: If you compile using -fsanitize=address,undefined then results are slower and the template implementations will be nowhere near in speed to the manual implementations.
| Implementation | Execution time (s) |
|---|---|
| Basic | 1.083 |
| Better | 0.601 |
| Fast | 0.504 |
| Manual (see below) | 0.474 |
| Manual + unsigned | 0.418 |
The following code used was to test each implementation with many modular integer operations.
#include <bits/stdc++.h>
using namespace std;
// insert implementation here
using Mint = FastMint; // choose which implementation to test
const int N = 1e8, M = 1e9+7;
// brute force compute a_{n+1} = a_n^2 + 3 * a_n + 5
Mint a[N];
int main() {
a[0] = 1;
for (int i = 0; i < N - 1; i++)
a[i + 1] = a[i] * a[i] + a[i] * 3 + 5;
cout << a[N - 1].n << '\n';
}
I also tested manual implementation without structs: initialize the array using int a[N] and change the line in the loop to
a[i + 1] = ((ll)a[i] * a[i] + a[i] * 3 + 5) % M;
With unsigned integers (change ll to ull), the manual implementation is even faster.
Why is manual implementation faster?
You are able to tell when you can compute the whole expression in a single long long without overflow, and do a single modulus operation at the end. Using the above example code, manually we can perform 2 normal multiplications and 2 additions with one modulo operation because we see the whole expression will not overflow. However, a modular integer template would need to perform 2 modular multiplications and 2 modular additions.
Extensions
Currently, I have shown the constructor, addition, and multiplication. To handle negative numbers, implement unary operator-. Subtraction is very similar to addition, and you can avoid a costly modulus operation in the same way using 1 if statement. Division is done using modular inverses: $$$a/b \equiv a \cdot \operatorname{inv}(b)$$$. The inverse of $$$b$$$ can be calculated in $$$O(\log M)$$$ time using binary exponentiation by FLT: $$$\operatorname{inv}(b) \equiv b^{M-2}$$$.
For practical usage, you would name the struct something like Mint instead of FastMint. Below is a sample implementation of the basic +,-,*. There are lots of other operators you could implement including ++, ==, >> and << for IO, etc. Personally, I implement only what I'll use so I usually skip implementing things like += or *= or binary exponentiation unless I need them.
struct Mint {
int n;
constexpr Mint(ull m = 0) : n(m % M) {}
Mint &operator+=(Mint m) {
if ((n += m.n) >= M)
n -= M;
return *this;
}
Mint operator+(Mint m) { return m += *this; }
Mint &operator*=(Mint m) {
n = (ull)n * m.n % M;
return *this;
}
Mint operator*(Mint m) { return m *= *this; }
Mint &operator-=(Mint m) {
if ((n -= m.n) < 0)
n += M;
return *this;
}
friend Mint operator-(Mint a, Mint b) { return a -= b; };
Mint operator-() { return Mint() - *this; };
};







