# How to Generate Pure Random Numbers

A pseudo random number generator is a deterministic algorithm for producing a series of numbers that have many of the same properties as a series of purely random numbers. The sequences can mirror true random numbers in frequency distribution, clustering of runs, and the absence of readily discernible patterns. However, unlike a pure sequence, a pseudo-random sequence is 100% predictable when its algorithm and seed are known. This of is consequence only in applications where the existence of a pattern produces a result dissimilar from that of a purely random sequence.

A good pseudo random sequence often works fine for applications such as statistical sampling or simulation models. In these applications, the generator need be of only moderate sophistication so that its inherent patterns do not resonate with those occurring in nature. The opposite is true in applications where the sequence is required to be unpredictable to an intelligent observer of previous elements of the sequence. For example, an electronic slot machine needs a sequence that is unpredictable to the left; the application would fail if the gambler determine the next spin based on a pattern analysis of previous spins. Unpredictability to the right is not required since it makes no difference if the gambler could infer an earlier sequence from a later sequence. Encryption is an application requiring unpredictability to the left and right; knowledge of a single declassified document should not enable a cryptanalyst to crack earlier or later messages.

Real random numbers are non-deterministic. A generator needs to collect data from natural stochastic processes. I have an electronic geiger counter which generates a pulse every time it detects a radioactive decay. The time between decays has a strong pure random component. To produce a usable pure random sequence, latent patterns must be removed. The known exponential distribution is easily removed mathematically; however, the mean time between decays in background radiation rises in the daytime and falls at night. There may also be other non-random patterns lying hidden in the data which could potentially be exploited by an analyst attempting to reduce his uncertainty for the next number in the sequence.

Underlying patterns plague almost any source of natural randomness. Each of the examples below contain a purely unpredictable component and some analyzable patterns. To the extent the patterns are known, a method exists to segregate the pattern component from the randomness. This is the principal function of data compression tools which achieve the goals by summarizing data according to its pattern and to deviations from the pattern, the information component. Unfortunately, there is always a risk that a sophisticated analyst make be able to isolate even more patterns than the compression tool was able to identify. Encryption tools are essentially sophisticated random number generators with patterns so sophisticated that they are difficult to analyze. Even simple encryption functions mask the patterns so well that data compression tools cannot isolate them.

Sources of partially random data include:

• timing of radioactive decays (serial port geiger counters cost less than \$100) Aware Electronics
• recording background noise into .wav files
• live video files
• session logs from chat lines
• keystroke latencies for typed text

Partially random sequences can be purified to purely random sequences if the underlying randomness is diffused equally throughout the source (I will soon provide the source code for a tool that does this to an optimal level). After diffusion, a portion of the file should be cut-away so that the process cannot reversed.

The steps for building purely random numbers:

1. gather source data remove known patterns by using lossless data compression
2. encrypt the data
3. apply a diffusion tool
4. cut-away a portion of the file