A little thougt about how to encode tuples of natural numbers with prime numbers

Posted on 2014 03 21 by phimuemue

I have always been interested in how to encode certain things into simple natural numbers. This post shows a way to encode sets and tuples of natural numbers, that I personally find quite nice. Let \(p(i)\) be the i-th prime (i.e. p(1)=2, p(2)=3, p(3)=5, and so on), let \(p^{-1}\) be the inverse of \(p\). Then, we can quite naturally encode a finite set \(K\subsetneq \mathbb{N}\) of natural numbers like the following (\(c\) is the coding function): \(c(K)=\prod_{i\in K} p(i)\). I.e., we encode the elements of the set via prime factorization.

We can reconstruct the set \(K\) from a number by taking it's prime factorization and applying \(p^{-1}\) to each prime factor.

Note that the above does not preserve the order of the elements (multiplication is commutative), since we are considering sets. But what happens, if we want to encode ordered tuples this way?

So, let us now consider a tuple \(T=(t_1, t_2, \dots, t_n) \in \mathbb{N}^n\), that we want to encode into a number. For now, we exclude 0 from the natural numbers, but the idea can be easily extended to work with 0 as well.

For this, we can derive another encoding from the first. The problem is that - since multiplication is associative - we can not map the generated primes to the elements in \(T\), since we therefore must somehow know the position of the element.

We can circumvent this problem as follows: Starting with \(t_1\), we pick the prime \(p(t_1)\). We continue with \(t_2\), but we can not simply choose \(p(t_2)\), since then \((t_1, t_2)\) and \((t_2, t_1)\) would be mapped to the same number.

So, we do the following: For \(t_2\), we start counting primes at \(p(t_1)\) and thus, pick the number \(p(t_1 + t_2)\). Similarily, for \(t_3\) we pick \(p(t_1 + t_2 + t_3)\).

The rule: For \(t_i\), we choose \(p(t_1 + t_2 + \dots + t_i) = p(\sum_{j=1}^i t_j)\). Thus, we get the encoding function \(c(T) = \prod_{i=1}^n p(\sum_{j=1}^i t_j)\).

That is, we ensure that the prime numbers are strictly increasing. If we want to reconstruct the tuple \(T\) given a number, we take the prime factorization, sort the prime numbers, and apply \(p^{-1}\) to each prime. Now, we compute the distances between two consecutive elements, and get the elements from the tuple.


I don't offer a proof, but an example to see how it's done.

Take \(T=(2,5,4,7)\). To encode this, we have to compute \(c(2,5,4,7) = p(2) * p(2+5) * p(2+5+4) * p(2+5+4+7)\), which can be evaulated to \(p(2) * p(7) * p(11) * p(18)\).

This yields \(3*17*31*61=99603\). So, we encode the tuple \((2,5,4,7)\) as 99603.

If we want to recompute the tuple from the given number 99603, we do the following: We prime-factorize: 99603 = 3 * 17 * 31 * 61. Thus, we know that the primes 3, 17, 31 and 61 were used. Now, we compute \(p^{-1}\) for each of them and get: \(p^{-1}(3)=2\), \(p^{-1}(17)=7\), \(p^{-1}(31)=11\), \(p^{-1}(61)=18\).

So, now we consider the numbers 2, 7, 11, 18. We now can conclude, that the first number must have been 2, the second number must have been 7-2=5, while the third number must have been 11-7=4 and the last number must have been 18-11=7. This yields the tuple (2,5,4,7).


The presented encoding is (probably) of no practical relevance due to the following:

However, the presented encoding reveals a very nice result: It is clear that the above construction can be applied to any tuple of natural numbers and always yields a natural number. Moreover, we can factorize every natural number, and, thus, generate a tuple from every natural number.

We didn't show this, but it is evident that no two sequences yield the same number. Simply assume that any two tuples would yield the same number: This would immediately imply that both tuples must have the same number of elements. But if the elements are distinct, this will directly lead to different primes and another resulting number.

So, this means, that each tuple has a natural number (greater that 1, since 1 is no prime), and each natural number corresponds to exactly one tuple. That means, that in essence, we have as many tuples as natural numbers, since we generated a bijection (aside from the number 1). Thus, the tuples of natural numbers are enumberable: We simply start counting 2, 3, 4, 5, 6, 7, ... and compute the corresponding tuple for each number.