alan-c's blog

By alan-c, 4 months ago, In English

A modular integer template implements basic integer operations that wrap around when they reach some modulus. There are many ways to do this, but some ways are slower. This article compares the performance of different methods.

Summary

For the best performance:

  • Addition of two modular integers should be done without modulo operations.
  • Any modulus operation (using %) should be done using unsigned integer types.

Speed comparison, extensions, and other operations are discussed later.

Basic template

For simplicity, I'll use a fixed prime modulus $$$M$$$. If you need multiple moduli at the same time then you can extend this using a c++ template parameter. This is the simplest and easiest to implement, but it's slower.

using ll = long long;

struct BasicMint {
  int n;

  BasicMint(ll m = 0) : n(m % M) {}

  BasicMint operator+(BasicMint m) { return BasicMint(n + m.n); }
  BasicMint operator*(BasicMint m) { return BasicMint((ll)n * m.n); }
};

Better template

The first optimization is that we can avoid doing a modulo operation in an addition operation. This is because the sum of two modular integers is between $$$0$$$ and $$$2M-1$$$, so a simple check and subtraction by $$$M$$$ suffices to achieve wrapping addition. A modulo operation is costly in terms of execution time.

struct BetterMint {
  int n;

  BetterMint(ll m = 0) : n(m % M) {}

  BetterMint operator+(BetterMint m) {
    if ((m.n += n) >= M)
      m.n -= M;
    return m;
  }
  BetterMint operator*(BetterMint m) { return BetterMint((ll)n * m.n); }
};

Fastest template

Another optimization is that we can use unsigned integers in the constructor. I don't know why, but modulo operations are faster for unsigned integers. This speeds up multiplications. Also, add constexpr to the constructor to make some optimizations at compile time.

using ull = unsigned long long;

struct FastMint {
  int n;

  constexpr FastMint(ull m = 0) : n(m % M) {}

  FastMint operator+(FastMint m) {
    if ((m.n += n) >= M)
      m.n -= M;
    return m;
  }
  FastMint operator*(FastMint m) { return FastMint((ull)n * m.n); }
};

Comparison

Running these locally on my computer yields the following times averaged over 10 trials. I compiled using g++ -O2 with 64 bit stdc++17.

Note: If you compile using -fsanitize=address,undefined then results are slower and the template implementations will be nowhere near in speed to the manual implementations.

Implementation Execution time (s)
Basic 1.083
Better 0.601
Fast 0.504
Manual (see below) 0.474
Manual + unsigned 0.418

The following code used was to test each implementation with many modular integer operations.

#include <bits/stdc++.h>
using namespace std;

// insert implementation here

using Mint = FastMint; // choose which implementation to test

const int N = 1e8, M = 1e9+7;
// brute force compute a_{n+1} = a_n^2 + 3 * a_n + 5
Mint a[N];

int main() {
  a[0] = 1;
  for (int i = 0; i < N - 1; i++)
    a[i + 1] = a[i] * a[i] + a[i] * 3 + 5;
  cout << a[N - 1].n << '\n';
}

I also tested manual implementation without structs: initialize the array using int a[N] and change the line in the loop to

a[i + 1] = ((ll)a[i] * a[i] + a[i] * 3 + 5) % M;

With unsigned integers (change ll to ull), the manual implementation is even faster.

Why is manual implementation faster?

You are able to tell when you can compute the whole expression in a single long long without overflow, and do a single modulus operation at the end. Using the above example code, manually we can perform 2 normal multiplications and 2 additions with one modulo operation because we see the whole expression will not overflow. However, a modular integer template would need to perform 2 modular multiplications and 2 modular additions.

Extensions

Currently, I have shown the constructor, addition, and multiplication. To handle negative numbers, implement unary operator-. Subtraction is very similar to addition, and you can avoid a costly modulus operation in the same way using 1 if statement. Division is done using modular inverses: $$$a/b \equiv a \cdot \operatorname{inv}(b)$$$. The inverse of $$$b$$$ can be calculated in $$$O(\log M)$$$ time using binary exponentiation by FLT: $$$\operatorname{inv}(b) \equiv b^{M-2}$$$.

For practical usage, you would name the struct something like Mint instead of FastMint. Below is a sample implementation of the basic +,-,*. There are lots of other operators you could implement including ++, ==, >> and << for IO, etc. Personally, I implement only what I'll use so I usually skip implementing things like += or *= or binary exponentiation unless I need them.

struct Mint {
  int n;

  constexpr Mint(ull m = 0) : n(m % M) {}

  Mint &operator+=(Mint m) {
    if ((n += m.n) >= M)
      n -= M;
    return *this;
  }
  Mint operator+(Mint m) { return m += *this; }

  Mint &operator*=(Mint m) {
    n = (ull)n * m.n % M;
    return *this;
  }
  Mint operator*(Mint m) { return m *= *this; }

  Mint &operator-=(Mint m) {
    if ((n -= m.n) < 0)
      n += M;
    return *this;
  }
  friend Mint operator-(Mint a, Mint b) { return a -= b; };
  Mint operator-() { return Mint() - *this; };
};

Full text and comments »

  • Vote: I like it
  • +10
  • Vote: I do not like it