Introduction
Many communication problems involve sending information from one function to another by sending $$$0$$$s and $$$1$$$s (binary) or some number smaller than $$$b$$$ (base $$$b$$$). In these problems, we often need to change the information that we want to send from one base to another.
This can be particularly tricky when the information that we want to send is a sequence of numbers. A common way to do so is to send each number one by one using $$$\lceil \log_b (k) \rceil$$$ digits where $$$b$$$ is the base we can send information in and the elements of the sequence are between $$$0$$$ and $$$k - 1$$$. The problem with this is that if we are sending $$$l$$$ numbers, we are possibly wasting some digits as $$$l\lceil \log_b (k) \rceil \ge \lceil l\log_b (k) \rceil$$$.
In this blog, I will be demonstrating a way to encode $$$n$$$ sequences $$$a_0, a_1, \ldots, a_{n - 1}$$$, each sequence $$$a_i$$$ consisting of $$$l_i$$$ non-negative integers $$$0 \le a_{i, 0}, a_{i, 1}, \ldots, a_{i, l_i - 1} < k_i$$$. The sequences will be encoded into a single sequence $$$x$$$ of length $$$m$$$ consisting of non-negative integers $$$0 \le x_0, x_1, \ldots, x_{m - 1} < b$$$, where $$$m = \lceil\sum_{i=0}^{n - 1} l_i\log_b k_i\rceil$$$. Conveniently, the inverse operation to decode sequence $$$x$$$ back into $$$n$$$ sequences $$$a_0, a_1, \ldots, a_{n - 1}$$$ follows a very similar structure which I will show below.
Algorithm
The main idea behind the encoding and decoding algorithm is based on doing repeated modulo and division operations to change a number from one base to another. Suppose we want to change a number $$$x$$$ from base $$$a$$$ to base $$$b$$$. The least significant digit will equal the remainder of $$$x$$$ after dividing by $$$b$$$ under base $$$a$$$. Then, we replace $$$x$$$ by the floor division of $$$x$$$ by $$$b$$$ under base $$$a$$$, and find the remainder again to get the next digit. We can do this repeatedly until $$$x$$$ reaches $$$0$$$ to get $$$x$$$ in base $$$b$$$.
This algorithm works even if we have multiple sequences, each having different bases. We just need to make sure to divide under the correct base for each of the sequences. The code for encoding is as follows:
vector<int> encode(int b, vector<int> k, vector<vector<int>> a) {
assert(k.size() == a.size());
int n = k.size();
ld bits = 0;
for (int i = 0; i < n; i++) {
bits += a[i].size() * (log(k[i]) / log(b));
}
int m = ceil(bits);
vector<int> x(m);
for (int p = 0; p < m; p++) {
int rem = 0;
for (int i = n - 1; i >= 0; i--) {
for (int j = a[i].size() - 1; j >= 0; j--) {
rem = rem * k[i] + a[i][j];
a[i][j] = rem / b;
rem %= b;
}
}
x[p] = rem;
}
return x;
}
The decoding algorithm also works similarly, but we have to make sure to divide using the different bases for the different sequences.
vector<vector<int>> decode(int b, vector<int> k, vector<int> l, vector<int> x) {
assert(k.size() == l.size());
int n = k.size(), m = x.size();
vector<vector<int>> a(n);
for (int i = 0; i < n; i++) {
a[i].resize(l[i]);
for (int j = 0; j < l[i]; j++) {
int rem = 0;
for (int p = m - 1; p >= 0; p--) {
rem = rem * b + x[p];
x[p] = rem / k[i];
rem %= k[i];
}
a[i][j] = rem;
}
}
return a;
}