Random number generators have many applications. They can be used to introduce a part of “luck” in a game (for example drawing cards in poker) but also for other tasks such as Monte Carlo simulations (drawing multiple “random” samples).
In this first part, we will introduce a very well known pseudo-random number generator, the linear congruential generator.
Linear Congruential Generator
A linear congruential generator (or LCG) is one of the simplest type of pseudo-random number generator. At each step, a new pseudo-randomized number is obtained by applying a linear function \(f(x) = ax + b\) to the previous pseudo-randomized number, and the result is taken up to a modulus \(m\).
The sequence of pseudo-randomized numbers is therefore entirely defined by four parameters:
- the modulus \(m\),
- the two coefficients \(a\) and \(b\) of the linear function \(f\),
- the seed \(x_0\),
and the \(k^{th}\) pseudo-randomized number \(x_{k}\) is computed as follow: \[ \begin{equation} \begin{split} x_{k} &= f(x_{k-1}) \text{ mod } m \\ &= (ax_{k-1} + b) \text{ mod } m. \end{split} \end{equation} \]
Since the computation is taken up to a modulus \(m\), the generated values are between 0 and \(m-1\), both bounds included.
Let’s write a simple linear congruential generator in Julia:
function build_lcg(m, a, b, seed)
f = x -> a * x + b
state = seed
function get_next_number()
state = f(state) % m
end
end
build_lcg (generic function with 1 method)
We are now going to generate a sequence of numbers with the parameters \(m = 16\), \(a = 5\), \(b = 7\), and \(0\) for the seed value.
get_next_number = build_lcg(16, 5, 7, 0)
get_next_number (generic function with 1 method)
print([get_next_number() for _ in 1:8])
[7, 10, 9, 4, 11, 14, 13, 8]
Ok, that looks like a “random” sequence of numbers \(\le 15\). Let’s try again:
print([get_next_number() for _ in 1:8])
[15, 2, 1, 12, 3, 6, 5, 0]
That still looks fine. Another run:
print([get_next_number() for _ in 1:8])
[7, 10, 9, 4, 11, 14, 13, 8]
And we are back to exactly the same sequence than the first one! It doesn’t look random at all anymore!
This result is actually normal since the state of the generator is equal to the latest generated number. There are only 16 possible numbers in the range \(0 \dots 15\), and therefore only 16 different states are possible. Once we are back to a previous state, the sequence of generated numbers has to enter a cycle. We therefore usually want \(m\) to be big.
Let’s try again with different values of \(a\) and \(b\), for example \(a = 6\) and \(b = 11\):
get_next_number = build_lcg(16, 6, 11, 0)
get_next_number (generic function with 1 method)
print([get_next_number() for _ in 1:8])
[11, 13, 9, 1, 1, 1, 1, 1]
The result is really bad: the generator is stuck at \(1\) after \(4\) steps. It seems that the choice of \(a\) and \(b\) is very important. But how could we choose \(a\) and \(b\)?
Luckily for us, it is a well known problem, and the following theorem tells us what to do:
Hull–Dobell Theorem: The period of the generator is of maximal length \(m\) if and only if
\(b\) and \(m\) are coprime,
\(a - 1\) is a multiple of all prime factors of \(m\),
\(a - 1\) is a multiple of 4 if \(m\) is a multiple of 4.
We saw previously that a generator with paramaters \(m = 16\), \(a = 5\) and \(b = 7\) has a maximal period of length \(16\). From the previous theorem, since \(m = 4 \times 4\) is a multiple of \(4\), \(a - 1 = 4\) has to be a multiple of \(4\). This is indeed true. Furthermore \(a - 1 = 4\) is also a multiple of all prime factors of \(16\), the only factor being \(2\). Finally, \(16\) and \(7\) are coprime, as expected.
When we work with 64-bit integers, we are particularly interested in the case \(m = 2^{64}\). To have a maximal period, we should take \(a = 4 k + 1\) for a \(k \ge 1\) and \(b\) should be an odd number since \(2\) is the only prime factor of \(m\). The generator will then be able to output all the possible values of 64-bit integers.
When the parameters are chosen to get a period of maximal length, the generator will produce each value exactly once during the first \(m\) iterations, and will then repeat the previous sequence exactly in the same order. By taking into account only the first \(m\) values, we can consider that such a generator is doing a “random” permutation (or shuffle) of the interval \(0 \dots m-1\).
But what if we want to simulate rolling a die multiple times?
We should be able to get a sequence such as \(5, 2, 3, 5, 5, 4, 3, 1, \dots\)
But we will never obtain this sequence by taking \(m = 6\) and adding \(1\) to the generated values. Indeed, even if we choose parameters to get a maximal period, after outputting 6 values, the generator will start to cycle. Furthermore, none of the values can appear multiple times during the 6 first outputs.
Could we solve this problem in a simple way?
Let’s take \(m = 16\) and apply a modulo 6 to the output of the generator:
get_next_number = build_lcg(16, 13, 7, 1)
roll_die = () -> get_next_number() % 6 + 1
#14 (generic function with 1 method)
Let’s roll a die sixteen times. We know from the Hull-Dobell theorem that we reached the end of the period, and that if we roll the die again, we will get the same results again.
print([roll_die() for _ in 1:16])
[5, 6, 1, 6, 3, 4, 5, 4, 1, 4, 3, 2, 1, 2, 3, 2]
This sequence does not look too bad. Let’s now try rolling a four-sided die. We just have to change the second modulus to 4:
roll_d4 = () -> get_next_number() % 4 + 1
#18 (generic function with 1 method)
print([roll_d4() for _ in 1:16])
[1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2]
This sequence looks really bad. Even when we choose the parameters so that the LCG has maximal period, if we next apply a modulo to the generated numbers, we can obtain a terribly bad sequence. Let’s change the LCG modulus to 18:
get_next_number = build_lcg(18, 7, 7, 1)
roll_d4 = () -> get_next_number() % 4 + 1
#22 (generic function with 1 method)
print([roll_d4() for _ in 1:18])
[3, 4, 1, 2, 1, 4, 3, 4, 3, 2, 3, 2, 1, 2, 1, 4, 1, 2]
This sequence looks better.
Linear congruential generators actually have many defects. They should be avoided whenever possible. We will show some of these defects in Part 2 of this serie.