A little thougt about how to encode tuples of natural numbers with prime numbers
by phimuemue
I have always been interested in how to encode certain things into simple natural numbers. This post shows a way to encode sets and tuples of natural numbers, that I personally find quite nice.
Let be the i-th prime (i.e. p(1)=2, p(2)=3, p(3)=5, and so on), let
be the inverse of
. Then, we can quite naturally encode a finite set
of natural numbers like the following (
is the coding function):
. I.e., we encode the elements of the set via prime factorization.
We can reconstruct the set from a number by taking it’s prime factorization and applying
to each prime factor.
Note that the above does not preserve the order of the elements (multiplication is commutative), since we are considering sets. But what happens, if we want to encode ordered tuples this way?
So, let us now consider a tuple , that we want to encode into a number. For now, we exclude 0 from the natural numbers, but the idea can be easily extended to work with 0 as well.
For this, we can derive another encoding from the first. The problem is that – since multiplication is associative – we can not map the generated primes to the elements in , since we therefore must somehow know the position of the element.
We can circumvent this problem as follows: Starting with , we pick the prime
. We continue with
, but we can not simply choose
, since then
and
would be mapped to the same number.
So, we do the following: For , we start counting primes at
and thus, pick the number
. Similarily, for
we pick
.
The rule: For , we choose
. Thus, we get the encoding function
.
That is, we ensure that the prime numbers are strictly increasing. If we want to reconstruct the tuple given a number, we take the prime factorization, sort the prime numbers, and apply
to each prime. Now, we compute the distances between two consecutive elements, and get the elements from the tuple.
Example
I don’t offer a proof, but an example to see how it’s done.
Take . To encode this, we have to compute
, which can be evaulated to
.
This yields . So, we encode the tuple
as 99603.
If we want to recompute the tuple from the given number 99603, we do the following: We prime-factorize: 99603 = 3 * 17 * 31 * 61. Thus, we know that the primes 3, 17, 31 and 61 were used. Now, we compute for each of them and get:
,
,
,
.
So, now we consider the numbers 2, 7, 11, 18. We now can conclude, that the first number must have been 2, the second number must have been 7-2=5, while the third number must have been 11-7=4 and the last number must have been 18-11=7. This yields the tuple (2,5,4,7).
Remarks
The presented encoding is (probably) of no practical relevance due to the following:
- The size of the prime numbers (not to speak about the product of them) grows rapidly. It is not possible to efficiently store numbers generated by tuples containing 100 elements
- Decomposing a number into its prime factors seems to be rather computationally expensive. It is known to be in NP and coNP, but it is suspected to lie outside of P.
However, the presented encoding reveals a very nice result: It is clear that the above construction can be applied to any tuple of natural numbers and always yields a natural number. Moreover, we can factorize every natural number, and, thus, generate a tuple from every natural number.
We didn’t show this, but it is evident that no two sequences yield the same number. Simply assume that any two tuples would yield the same number: This would immediately imply that both tuples must have the same number of elements. But if the elements are distinct, this will directly lead to different primes and another resulting number.
So, this means, that each tuple has a natural number (greater that 1, since 1 is no prime), and each natural number corresponds to exactly one tuple. That means, that in essence, we have as many tuples as natural numbers, since we generated a bijection (aside from the number 1). Thus, the tuples of natural numbers are enumberable: We simply start counting 2, 3, 4, 5, 6, 7, … and compute the corresponding tuple for each number.