This is a blog starting from the very basics of number theory, written to flow fluidly from one concept to the next, with a focus on developing an intuitive feeling for elementary number theory. This is not a blog to simply skim. I consider it more of a guided exploration into discovering things in the world of number theory, and I don't expect anyone to immediately understand all the insights in this blog. But if you put an honest effort into discovering how I find these insights, you will find much use for this blog.
If you do not know some notation or some elementary theorem I use you should refer to this.
We start with the basic definition at the heart of number theory. For $$$a,b \in \mathbb{Z}$$$, we say $$$a \mid b$$$ (read "$$$a$$$ divides $$$b$$$") if there exists $$$k \in \mathbb{Z}$$$ such that $$$b = ak$$$. Similarly $$$a \not\mid b$$$ if no such $$$k$$$ exists.
If $$$g \mid a$$$, then $$$g \mid ab$$$.
If $$$gx \mid ax$$$ for some $$$x \not= 0$$$, then $$$g \mid a$$$.
If $$$g \mid a$$$ and $$$g \mid b$$$ then for every $$$(x,y)$$$ $$$g \mid ax + by$$$.
Furthermore, you should assume all variables defined in this blog from here on are integers unless mentioned otherwise.
For all integers $$$k \not= 0$$$ and $$$n$$$, there exist unique $$$q,r$$$ such that $$$n = kq + r$$$ where $$$0 \le r < |k|$$$. This is the division theorem, also known as the division algorithm.
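As a quick numeric check, here is a sketch of the theorem in Python. Note that Python's built-in `divmod` returns a remainder with the sign of the divisor, so a small correction is needed to land in $$$[0, |k|)$$$ when $$$k$$$ is negative; the helper name `div_euclid` is just something I made up for this sketch.

```python
def div_euclid(n, k):
    """Return (q, r) with n == k*q + r and 0 <= r < |k|, for k != 0."""
    q, r = divmod(n, k)
    if r < 0:                 # Python's remainder takes the divisor's sign,
        q, r = q + 1, r - k   # so shift it into [0, |k|) when k < 0
    return q, r

print(div_euclid(7, -3))   # -> (-2, 1), since 7 == -3 * -2 + 1
print(div_euclid(-7, 3))   # -> (-3, 2), since -7 == 3 * -3 + 2
```

For positive $$$k$$$ this is exactly what `divmod` already gives you.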
$$$a \mod k$$$ creates an equivalence relation, where $$$a \equiv b \mod k$$$ if for some $$$(q_1,q_2,r)$$$, $$$a = q_1k + r$$$ and $$$b = q_2k + r$$$. Notice that this is equivalent to $$$k \mid (a - b)$$$.
This function is at the heart of most elementary number theory. $$$\gcd(x,y)$$$ is defined as the largest $$$g$$$ such that $$$g \mid x$$$ and $$$g \mid y$$$.
What does this function tell you?
If $$$\gcd(x,y) = 1$$$, then intuitively $$$y$$$ shares no divisor with $$$x$$$ other than $$$1$$$, so multiplying a number by $$$y$$$ cannot help it become divisible by $$$x$$$. This is why we also say $$$x$$$ is coprime, or relatively prime, to $$$y$$$. Try formalizing this intuition to see if you've understood it.
Let $$$a \mid bc$$$ and $$$\gcd(a,c) = 1$$$. Then $$$a \mid b$$$.
You can also generalise this theorem as
Let $$$a \mid bc$$$ and $$$\gcd(a,c) = g$$$. Then $$$a/g \mid b$$$.
Let us now look at the additive structure of some number mod $$$n$$$. Let $$$1 \le a \le n$$$ such that $$$\gcd(a,n) = 1$$$. Let us analyse the function $$$f(x) = ax \mod n$$$ for $$$1 \le x \le n$$$. We know that for all $$$(1 \le x < n)$$$ $$$n \not\mid ax$$$ from the previous theorem.
Therefore we will reach $$$0$$$ only when $$$x = n$$$. Moreover $$$ax_1 \not\equiv ax_2 \mod n$$$ for $$$x_1 \not= x_2$$$ because $$$n \not\mid a(x_1 - x_2)$$$ since $$$n \not\mid x_1 - x_2$$$ and $$$\gcd(a,n) = 1$$$.
So all values of $$$f(x)$$$ for $$$1 \le x \le n$$$ are distinct and therefore must cover every single remainder (residue) mod $$$n$$$.
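A quick sketch to see this concretely: for coprime $$$a, n$$$ the map $$$x \mapsto ax \mod n$$$ hits every residue exactly once.

```python
from math import gcd

def residues(a, n):
    """Values of a*x mod n for x = 1..n."""
    return [a * x % n for x in range(1, n + 1)]

# gcd(3, 7) = 1, so multiplying by 3 permutes the residues mod 7
assert gcd(3, 7) == 1
print(sorted(residues(3, 7)))   # -> [0, 1, 2, 3, 4, 5, 6]
```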
Let us now consider $$$a$$$ where the $$$\gcd$$$ isn't $$$1$$$. Let $$$\gcd(a,n) = g$$$. Let $$$a_0 = a/g, n_0 = n/g$$$. Then $$$\gcd(a_0, n_0) = 1$$$. We want to analyse the function $$$f(x) = ax \mod n$$$. However this is very closely related to the coprime case as $$$f(x) = g \times (a_0x \mod n_0)$$$. Can you see why this is?
In essence, the inner function is the same repeating pattern over all residues mod $$$n_0$$$, whereas the outer part just tells us that it loops these residues over the multiples of $$$g \mod n$$$. In a sense, multiplying by a number that is not coprime to $$$n$$$ is akin to partially multiplying by $$$0$$$. Once the factor $$$g \mid n$$$ is multiplied into the residue, it is going to stay. You cannot undo it.
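The same kind of numeric check, as a sketch, for the non-coprime case, confirming $$$f(x) = g \times (a_0x \mod n_0)$$$:

```python
from math import gcd

a, n = 6, 15                 # gcd(6, 15) = 3, so not coprime
g = gcd(a, n)
a0, n0 = a // g, n // g      # 2 and 5, which are coprime

# a*x mod n is g times the coprime-case pattern mod n0
for x in range(1, n + 1):
    assert a * x % n == g * (a0 * x % n0)
```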
What does this insight on addition help us prove?
One of the most insightful benefits we get from this is that for any $$$a,b$$$ where $$$\gcd(a,b) = 1$$$, there exist $$$x,y$$$ such that $$$ax + by = 1$$$. This cannot be solved by normal algebra. Let us think about what the existence of $$$x,y$$$ really tells us. Essentially, there exists $$$x$$$ such that the difference $$$ax - 1$$$ is a multiple of $$$b$$$, as $$$y$$$ is in our control. This just means $$$ax \equiv 1 \mod b$$$. Such an $$$x$$$ must exist, as $$$a$$$ is coprime to $$$b$$$ and we proved that $$$ax \mod b$$$ is a function that loops over all residues mod $$$b$$$ in some order. This is known as Bezout's theorem.
Let us now talk about the computation of $$$\gcd$$$ and valid values of $$$x,y$$$.
Let us show $$$\gcd(a,b) = \gcd(a, b - ka)$$$ for all $$$k$$$.
Let us look at some $$$g \mid a, g \mid b$$$. Then $$$g \mid b - ka$$$.
Let us look at some $$$g \mid a, g \not\mid b$$$. Then $$$g \not\mid b - ka$$$, as $$$ka$$$ is a multiple of $$$g$$$, so subtracting it cannot change divisibility by $$$g$$$.
I think the implementation is beyond the scope of this blog, however using the above fact one can come up with an $$$O(\log{n})$$$ algorithm.
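For the curious, here is a minimal sketch of that algorithm, along with the extended version that also recovers the Bezout coefficients discussed earlier:

```python
def euclid(a, b):
    """gcd via gcd(a, b) = gcd(b, a mod b); O(log) iterations."""
    while b:
        a, b = b, a % b
    return a

def ext_euclid(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_euclid(b, a % b)
    # g == b*x + (a % b)*y == b*x + (a - (a//b)*b)*y
    return g, y, x - (a // b) * y

g, x, y = ext_euclid(240, 46)
print(g, x, y)   # g == 2 and 240*x + 46*y == 2
```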
Earlier we proved that for each $$$a,b$$$ where $$$\gcd(a,b) = 1$$$ there exists $$$x$$$ such that $$$ax \equiv 1 \mod b$$$. $$$x$$$ is known as the multiplicative inverse of $$$a \mod b$$$ and is written as $$$a^{-1}$$$.
Instead of looking at the repeated additions of $$$a \mod n$$$, let us look at repeated multiplication. Let $$$f(x) = a^x \mod n$$$. Let us again start with the coprime case. We know that if $$$a$$$ is coprime to $$$n$$$ then $$$a^x$$$ is also coprime to $$$n$$$. This can be proved with a previous theorem. Then really $$$a^x$$$ can only visit elements that are coprime to $$$n$$$.
This set of elements is called a "reduced residue system" (RRS). The number of elements in the RRS is given by a function $$$\phi(n)$$$: the count of $$$1 \le x \le n$$$ such that $$$\gcd(x,n) = 1$$$.
Let us make a directed graph with $$$V = RRS$$$ and $$$E = (x,ax)\ \forall\ x \in V$$$. Let us now analyse this graph. As we have proved that $$$a$$$ has a multiplicative inverse, each element has exactly one incoming edge and one outgoing edge. This decomposes $$$V$$$ into a set of cycles. Let $$$C$$$ be the size of the cycle containing $$$1$$$. Then $$$a^C \equiv 1 \mod n$$$, and therefore all cycles have size $$$C$$$: following the edges from any $$$x$$$ for $$$C$$$ steps multiplies it by $$$a^C \equiv 1$$$. If there are $$$k$$$ cycles, then $$$Ck = |V| = \phi(n)$$$. Therefore $$$a^{\phi(n)} \equiv a^{kC} \equiv 1 \mod n$$$. This is known as Euler's theorem; the special case $$$a^{p-1} \equiv 1 \mod p$$$ for prime $$$p$$$ is Fermat's little theorem.
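Whether stated for prime modulus or in general, the theorem is easy to verify by brute force; here is a small sketch with a naive $$$\phi$$$:

```python
from math import gcd

def phi(n):
    """Naive Euler phi by direct count; fine for small n."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

n = 20
# a^phi(n) = 1 mod n for every a coprime to n
for a in range(1, n):
    if gcd(a, n) == 1:
        assert pow(a, phi(n), n) == 1
print(phi(n))   # -> 8
```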
Let us assume we have some linear equation we want to solve, such as $$$ax \equiv c \mod n$$$. This is generally simple: first we split $$$a = ga_0$$$ where $$$g = \gcd(a, n)$$$. We proved that a number $$$a$$$ with $$$\gcd(a,n) = g$$$ only visits residues that are multiples of $$$g$$$, so if $$$g \not\mid c$$$ there is no solution. Otherwise the equation is equivalent to $$$a_0x \equiv c/g \mod n/g$$$. We also know that $$$\gcd(a_0, n/g) = 1$$$, therefore $$$a_0$$$ has an inverse, and we can find $$$x \equiv a_0^{-1}(c/g) \mod n/g$$$. Therefore if a solution exists, there are $$$g$$$ solutions $$$\mod n$$$, and $$$0$$$ otherwise.
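The whole procedure above can be sketched as follows (using Python's `pow(x, -1, m)` for the modular inverse, available since Python 3.8):

```python
from math import gcd

def solve_linear(a, c, n):
    """All x in [0, n) with a*x = c (mod n), following the argument above."""
    g = gcd(a, n)
    if c % g:
        return []                           # c is not a multiple of g: no solution
    a0, c0, n0 = a // g, c // g, n // g
    x0 = pow(a0, -1, n0) * c0 % n0          # a0 is invertible mod n0
    return [x0 + k * n0 for k in range(g)]  # g solutions mod n

print(solve_linear(6, 9, 15))   # -> [4, 9, 14]
print(solve_linear(6, 7, 15))   # -> []
```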
Let us now talk about merging two linear congruences. Suppose $$$x \equiv a \mod n_1$$$ and $$$x \equiv a \mod n_2$$$, where $$$n_1, n_2$$$ are coprime. We will show that this is equivalent to $$$x \equiv a \mod n_1n_2$$$. It is clear that if $$$n_1n_2 \mid (x - a)$$$ then $$$n_1 \mid (x - a)$$$ and $$$n_2 \mid (x - a)$$$. Therefore we prove the latter implies the former. Assume for contradiction that $$$x \not\equiv a \mod n_1n_2$$$. Since $$$a \equiv x \mod n_1$$$ and $$$a \equiv x \mod n_2$$$, $$$(a - x)$$$ is a common multiple of $$$n_1$$$ and $$$n_2$$$. Let us now define $$$lcm(n_1,n_2)$$$ to be the smallest positive $$$l$$$ such that $$$n_1 \mid l$$$ and $$$n_2 \mid l$$$.
Let us show that all common multiples of $$$n_1, n_2$$$ are multiples of $$$l = lcm(n_1,n_2)$$$. Assume there were common multiples $$$l_1, l_2$$$ of $$$n_1, n_2$$$, where $$$l_1$$$ is the least common multiple and $$$l_2$$$ is not a multiple of $$$l_1$$$.
Then by the division theorem we can write $$$l_2 = kl_1 + r$$$ where $$$0 < r < l_1$$$ ($$$r$$$ is nonzero since $$$l_1 \not\mid l_2$$$). Since $$$l_1,l_2$$$ are common multiples of $$$n_1,n_2$$$, so is $$$r = l_2 - kl_1$$$. However $$$r$$$ is a nonzero common multiple smaller than $$$l_1$$$, so $$$l_1$$$ cannot be the smallest common multiple, a contradiction.
Since $$$n_1n_2$$$ is itself a common multiple, the result above gives $$$lcm(n_1,n_2) \mid n_1n_2$$$, so $$$lcm(n_1,n_2) = \frac{n_1n_2}{g}$$$ for some $$$g$$$. We know that $$$k_1n_1 = \frac{n_1n_2}{g}$$$ and $$$k_2n_2 = \frac{n_1n_2}{g}$$$ for some $$$k_1,k_2$$$, therefore $$$g \mid n_2$$$ and $$$g \mid n_1$$$. However they are coprime, so the only such $$$g$$$ is $$$1$$$, and $$$lcm(n_1,n_2) = n_1n_2$$$. Using the same argument you can furthermore prove that $$$lcm(x,y) = \frac{xy}{\gcd(x,y)}$$$ in general.
Using this we have proved $$$lcm(n_1,n_2) = n_1n_2 \mid (a - x)$$$. This implies $$$a \equiv x \mod n_1n_2$$$, which is a contradiction. Therefore the condition is both necessary and sufficient, and the two congruences are equivalent to the single congruence $$$x \equiv a \mod n_1n_2$$$.
However we may not always have equal $$$a$$$. But, we can make them equal.
Let $$$x \equiv a_1 \mod n_1$$$ and $$$x \equiv a_2 \mod n_2$$$. We need to find $$$k_1,k_2$$$ such that $$$k_1n_1 + a_1 = k_2n_2 + a_2$$$. This is equivalent to $$$k_1n_1 - k_2n_2 = a_2 - a_1$$$. We can find such $$$k_1,k_2$$$ using Bezout's theorem, by just multiplying the coefficients from $$$k_1n_1 - k_2n_2 = 1$$$ by $$$a_2 - a_1$$$.
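A sketch of this merge for coprime moduli, reusing the extended Euclidean algorithm to get the Bezout coefficients:

```python
def ext_euclid(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_euclid(b, a % b)
    return g, y, x - (a // b) * y

def crt_coprime(a1, n1, a2, n2):
    """Solve x = a1 (mod n1), x = a2 (mod n2) for gcd(n1, n2) == 1."""
    g, k1, k2 = ext_euclid(n1, n2)   # k1*n1 + k2*n2 == 1
    assert g == 1
    # scale the coefficient by (a2 - a1), as in the text
    return (a1 + (a2 - a1) * k1 % n2 * n1) % (n1 * n2)

print(crt_coprime(2, 3, 3, 5))   # -> 8, since 8 = 2 mod 3 and 8 = 3 mod 5
```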
Let us now define a prime number. A prime number $$$p \ge 2$$$ is one such that there is no $$$2 \le x < p$$$ with $$$x \mid p$$$. Therefore we can always split a non-prime (composite) number into smaller factors, and by repeating this, into primes.
Therefore we know that for every positive integer $$$n$$$ there exists some set of exponents $$$e_1, e_2, \ldots$$$ such that $$$n = \prod p_i^{e_i}$$$, where $$$\prod$$$ is the product symbol and $$$p_i$$$ is the $$$i$$$-th smallest prime.
If we allow the $$$e_i$$$ to be arbitrary integers (possibly negative), then $$$n$$$ is a positive rational. If all $$$e_i \ge 0$$$, then $$$n$$$ is a positive integer. Therefore we can now define divisibility differently.
Let $$$a = \prod p_i^{a_i}$$$ and $$$b = \prod p_i^{b_i}$$$. Then $$$a \mid b$$$ is equivalent to $$$a_i \le b_i$$$ for all $$$i$$$. Using this definition it is easy to see that $$$\gcd(a,b) = \prod p_i^{\min(a_i, b_i)}$$$ and $$$lcm(a,b) = \prod p_i^{\max(a_i, b_i)}$$$. Since $$$\min(a_i,b_i) + \max(a_i,b_i) = a_i + b_i$$$, you can now easily see why their product is $$$ab$$$.
There are a few more properties of primes that are important, however their proofs are beyond the scope of this blog. Let $$$\pi(n)$$$ be the prime counting function, which tells you the number of primes less than $$$n$$$. Then $$$\pi(n) \approx \frac{n}{\ln{n}}$$$ (the prime number theorem).
Let $$$\pi_{x,m}(n)$$$ tell us the number of primes that are less than $$$n$$$ and congruent to $$$x \mod m$$$, for some $$$x$$$ such that $$$\gcd(x,m) = 1$$$. Then $$$\pi_{x,m}(n) \approx \frac{\pi(n)}{\phi(m)}$$$ (Dirichlet's theorem on primes in arithmetic progressions).
Essentially the primes are equally distributed among every element in the RRS of some $$$n$$$.
Let us now try to solve CRT for the case where $$$n_1, n_2$$$ are not coprime. This is slightly harder and may not always admit a solution.
One strategy would be to break $$$n_1$$$ and $$$n_2$$$ into their individual prime factors, and then merge those while looking for contradicting information. However this requires us to factor a number which is slow. Let us try to solve a faster merge.
Let us show that the solution to $$$x \equiv a \mod n_1$$$ and $$$x \equiv a \mod n_2$$$ is $$$x \equiv a \mod lcm(n_1, n_2)$$$. Clearly $$$x \equiv a \mod lcm(n_1, n_2)$$$ implies the other two. Let's prove the other direction now. Suppose $$$x \not\equiv a \mod lcm(n_1, n_2)$$$. Since $$$x \equiv a \mod n_1$$$ and $$$x \equiv a \mod n_2$$$, we have $$$n_1 \mid (x - a)$$$ and $$$n_2 \mid (x - a)$$$, and therefore $$$lcm(n_1, n_2) \mid (x - a)$$$. So $$$x \equiv a \mod lcm(n_1, n_2)$$$, which is a contradiction. This proves both necessity and sufficiency, just as before.
Now let us again handle unequal $$$a$$$. Let $$$x \equiv a_1 \mod n_1$$$ and $$$x \equiv a_2 \mod n_2$$$. Then we need $$$k_1n_1 + a_1 = k_2n_2 + a_2$$$, which means $$$k_1n_1 - k_2n_2 = a_2 - a_1$$$. This equation can be solved via Bezout's theorem if and only if $$$\gcd(n_1, n_2) \mid a_2 - a_1$$$. Essentially, the two residues must be equal modulo the common factor $$$g = \gcd(n_1, n_2)$$$.
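A sketch of the general merge; the helper returns `None` when the residues disagree modulo $$$\gcd(n_1, n_2)$$$:

```python
from math import gcd

def crt(a1, n1, a2, n2):
    """Merge x = a1 (mod n1) and x = a2 (mod n2); None if inconsistent."""
    g = gcd(n1, n2)
    if (a2 - a1) % g:
        return None                    # the residues disagree mod g
    l = n1 // g * n2                   # lcm(n1, n2)
    # solve t*(n1/g) = (a2-a1)/g (mod n2/g); the inverse exists since
    # n1/g and n2/g are coprime
    t = (a2 - a1) // g * pow(n1 // g, -1, n2 // g) % (n2 // g)
    return (a1 + t * n1) % l

print(crt(2, 6, 8, 10))   # -> 8 (mod 30)
print(crt(2, 6, 7, 10))   # -> None (2 and 7 differ mod gcd = 2)
```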
A multiplicative function $$$f$$$ is one that satisfies $$$f(mn) = f(m)f(n)$$$ for coprime $$$m,n$$$ and is defined on all of $$$\mathbb{N}$$$. Examples of these functions are $$$\phi(n)$$$, the number of coprime residues mod $$$n$$$; $$$\sigma(n)$$$, the sum of divisors of $$$n$$$; and $$$\tau(n)$$$, the number of divisors of $$$n$$$. To get the value of a multiplicative function at $$$n$$$, we only need to know its values at the prime powers $$$p^k$$$ in the factorization of $$$n$$$, and take their product. Let us now define something more interesting. Let the convolution of two functions be $$$(f * g)(n) = \sum_{d \mid n} f(d)g(n/d)$$$.
Let us now prove $$$(f * g)(mn) = (f * g)(m) \times (f * g)(n)$$$ for coprime $$$m,n$$$ — that is, the convolution of multiplicative functions is multiplicative. Look at all $$$d \mid mn$$$. Since $$$m,n$$$ are coprime, there are unique $$$d_m, d_n$$$ such that $$$d_md_n = d$$$, $$$d_m \mid m$$$ and $$$d_n \mid n$$$. Essentially, split the prime factors of $$$d$$$ based on whether they come from $$$m$$$ or $$$n$$$.
Then we can notice that $$$f(d)g(mn/d) = f(d_m)g(m/d_m) \times f(d_n)g(n/d_n)$$$. Since summing over all $$$d$$$ is the same as summing over all pairs $$$(d_m, d_n)$$$, the sum factors into the product of the convolutions at $$$m$$$ and $$$n$$$.
Let us also notice that the convolution is commutative and associative, and $$$(f * g * h)(n) = \sum_{d_1d_2d_3 = n} f(d_1)g(d_2)h(d_3)$$$.
We can use this to evaluate $$$f(n) = \sum_{d \mid n} \phi(d)$$$, i.e. $$$f = (\phi * 1)$$$, which is multiplicative.
$$$f(p^k) = 1 + (p - 1) + p(p-1) + \ldots + p^{k-1}(p - 1) = p^k$$$, as the sum telescopes. Therefore $$$f(n) = n$$$.
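This identity is easy to check by brute force with a naive convolution (a sketch, not an efficient implementation):

```python
from math import gcd

def phi(n):
    """Naive Euler phi by direct count; fine for small n."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

def convolve(f, g, n):
    """Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) * g(n/d)."""
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

one = lambda m: 1
# (phi * 1)(n) = n, as derived above
assert all(convolve(phi, one, n) == n for n in range(1, 60))
print(convolve(phi, one, 36))   # -> 36
```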
This convolution is the basis of Mobius inversion. Let $$$f(n) = \sum_{d \mid n} g(d)$$$. This is essentially $$$f = (g * 1)$$$ where $$$1(n) = 1$$$. Let's say we want to write $$$g$$$ in terms of $$$f$$$. We need some inverse of $$$1(n)$$$ that gives us a unit element. Let us think about what the unit element of the convolution is.
Let the unit be $$$\delta$$$. Then $$$\delta(1) = 1$$$ and $$$\delta(n) = 0$$$ otherwise, so the only non-zero term in $$$(f * \delta)(n)$$$ is $$$f(n)\delta(1) = f(n)$$$. We will need the inverse of $$$1(n)$$$ next.
Let the inverse be $$$\mu$$$. We need $$$(1 * \mu) = \delta$$$. Since $$$\mu$$$ is multiplicative, we just need to look at prime powers: we need $$$(1 * \mu)(1) = 1$$$ and $$$(1 * \mu)(p^k) = 0$$$ for $$$k \ge 1$$$. It is clear that $$$\mu(1) = 1$$$, and $$$\mu(p) = -1$$$ so that $$$(1 * \mu)(p) = \mu(1) + \mu(p) = 0$$$. Furthermore, all higher prime powers must get $$$\mu(p^k) = 0$$$ so that the sum remains $$$0$$$. Therefore $$$\mu(1) = 1$$$, $$$\mu(p) = -1$$$, and $$$\mu(p^k) = 0$$$ for $$$k \ge 2$$$.
Extending this multiplicatively to all $$$n$$$: $$$\mu(n)$$$ is $$$0$$$ if some $$$p^2 \mid n$$$, $$$-1$$$ if $$$n$$$ is a product of an odd number of distinct primes, and $$$+1$$$ if it is a product of an even number of distinct primes. Now if we convolve both sides of $$$f = (g * 1)$$$ with $$$\mu$$$, we get $$$(f * \mu) = (g * 1 * \mu) = (g * \delta) = g$$$. This allows us to recover $$$g$$$ from $$$f$$$.
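A sketch of $$$\mu$$$ by trial division, together with a brute-force check that $$$(1 * \mu) = \delta$$$:

```python
def mobius(n):
    """mu(n): 0 if a square divides n, else (-1)^(number of prime factors)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:        # p^2 divides the original n
                return 0
            result = -result
        p += 1
    if n > 1:                     # one leftover prime factor
        result = -result
    return result

# (1 * mu)(n) should be the unit delta: 1 at n = 1, else 0
for n in range(1, 60):
    s = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert s == (1 if n == 1 else 0)
```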
Let us now go back to the multiplicative structure of residues $$$\mod n$$$. Each element $$$a$$$ had a cycle length equal to the smallest $$$k$$$ such that $$$a^k \equiv 1 \mod n$$$. This is concisely written as $$$k = ord_n(a)$$$. If we had a number $$$g$$$ such that $$$ord_n(g) = \phi(n)$$$, then $$$g^x$$$ would iterate over all residues in the RRS of $$$n$$$. Such a $$$g$$$ is called a primitive root. Can we find such a number?
The answer is, as for all things, sometimes. Let us prove that one exists for any odd prime $$$p$$$. Let us first notice that $$$ord_p(a) \mid \phi(p)$$$ for every $$$a$$$ in the RRS. Now look at $$$x^{d} - 1 \equiv 0 \mod p$$$ where $$$d \mid \phi(p)$$$. This is satisfied exactly by the elements with $$$ord_p(x) \mid d$$$, and such an equation has at most $$$d$$$ roots with prime modulus. Let $$$f(d)$$$ be the number of elements with order exactly $$$d$$$. If some element $$$a$$$ has order $$$d$$$, then $$$a, a^2, \ldots, a^d$$$ are $$$d$$$ distinct roots of $$$x^d - 1 \equiv 0$$$, hence all of them, and among these exactly $$$\phi(d)$$$ have order $$$d$$$ (the powers $$$a^t$$$ with $$$\gcd(t,d) = 1$$$). So $$$f(d)$$$ is either $$$0$$$ or $$$\phi(d)$$$. Since every element has some order dividing $$$\phi(p)$$$, we get $$$\sum_{d \mid \phi(p)} f(d) = \phi(p) = \sum_{d \mid \phi(p)} \phi(d)$$$, which forces $$$f(d) = \phi(d)$$$ for every $$$d$$$. Therefore the number of elements with order $$$\phi(p)$$$ is $$$\phi(\phi(p)) \ge 1$$$.
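You can verify the count $$$\phi(\phi(p))$$$ by brute force for a small prime, e.g. $$$p = 13$$$:

```python
from math import gcd

def order(a, p):
    """Multiplicative order of a modulo a prime p."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

def phi(n):
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

p = 13
counts = {}
for a in range(1, p):
    counts[order(a, p)] = counts.get(order(a, p), 0) + 1

# exactly phi(d) elements of order d, for every d | p - 1
assert all(counts[d] == phi(d) for d in counts)
print(counts[p - 1])   # -> 4, i.e. phi(12) primitive roots mod 13
```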
Using a primitive root $$$g$$$ we can convert equations of the form $$$x^t \equiv g^s \mod p$$$ into $$$t\log_{g}(x) \equiv s \mod \phi(p)$$$, where $$$\log_g(x)$$$ is the exponent $$$u$$$ with $$$g^u \equiv x \mod p$$$ (the discrete logarithm).
You can prove existence of primitive roots for elements of the form $$$p^k, 2p^k$$$ using the primitive root for some odd prime $$$p$$$. I will leave this as an exercise.
If you think about it, there is a reason there are exactly $$$\phi(d)$$$ elements with cycle length $$$d$$$. Essentially, every element can be written as $$$g^a$$$. Therefore $$$xy \equiv g^ag^b \equiv g^{a+b} \mod n$$$. So we have converted products $$$\mod n$$$ into sums $$$\mod \phi(n)$$$. Now the cycle length of an element $$$g^t$$$ is $$$\frac{\phi(n)}{\gcd(t, \phi(n))}$$$, and the number of exponents $$$t$$$ with $$$\gcd(t, \phi(n)) = \frac{\phi(n)}{d}$$$ is $$$\phi(d)$$$. This also gives us another proof of $$$\sum_{d \mid n} \phi(d) = n$$$.
Unfortunately there cannot exist such a $$$g$$$ for a modulus with two distinct odd prime factors $$$p$$$ and $$$q$$$: the order of $$$g$$$ divides the $$$lcm$$$ of its orders in the split moduli, which divide $$$\phi(p)$$$ and $$$\phi(q)$$$ respectively. Since both are even, the two periods collide and the $$$lcm$$$ is strictly smaller than their product. Also, there does not exist a primitive root for $$$2^k$$$ for $$$k \ge 3$$$.
Let $$$p$$$ be a prime. One of the basic properties of a prime is that $$$a^{p-1} \equiv 1 \mod p$$$ for any $$$a$$$ not divisible by $$$p$$$. If a number violates this for some base $$$a$$$, it is clearly not prime; otherwise it is considered a base-$$$a$$$ pseudoprime. However, it is possible that some composite numbers also satisfy this property for every base coprime to them. These are known as Carmichael numbers. For a number $$$n$$$ to be Carmichael it must be squarefree and, for each prime $$$p \mid n$$$, $$$p - 1 \mid n - 1$$$, so that $$$n - 1$$$ is a multiple of each prime's cycle length, and at exponent $$$n - 1$$$ every prime factor of $$$n$$$ gives us $$$1$$$.
We can also prove that as long as we keep halving the exponent, $$$a^{\frac{p-1}{2^k}}$$$ must either become $$$-1$$$ at some point or remain $$$1$$$, because the only roots of $$$x^2 - 1 \equiv (x - 1)(x + 1) \equiv 0 \mod p$$$ are $$$x \equiv 1, -1 \mod p$$$. If a number passes this test as well, it is considered a base-$$$a$$$ strong pseudoprime. No composite number passes the strong pseudoprime test for every base, so it is much safer to use.
You can randomly select bases to check and get an arbitrarily small probability of a false positive. This is called the Miller-Rabin test.
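A sketch of the test as described, with random bases; the `rounds` parameter trades speed for confidence:

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin: strong pseudoprime test with random bases."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):      # keep squaring, looking for -1
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False            # a witnesses that n is composite
    return True

print(is_probable_prime(561))        # -> False (a Carmichael number)
print(is_probable_prime(10**9 + 7))  # -> True
```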
Really nice blog; it does a great job considering the amount of material it covers. Hoping to see similar stuff for other topics too.
Thanks for the corrections, I've updated all of those, and added some more natural proofs to make it easier to understand now. Also, the Chebyshev bias is new to me; I just assumed it made sense for the prime counts to be independent of which element of the RRS they fall in.
I would write another wall of text for another topic, however I don't know anything else I could write this much about; most other concepts are quite bounded in how much you can explain about them. If you or anybody else has an idea, I'll be happy to explain it in this form.
Just one more correction: the claim about $$$p^k$$$ having a primitive root doesn't hold for $$$p = 2, k = 3$$$ (it holds for odd $$$p$$$, and the only powers of 2 having a primitive root are $$$2, 4$$$).