johnramey

Don't think. Compute.

Pseudo-Random vs. Random Numbers in R

with 2 comments

Happy Thanksgiving, everyone. Earlier today, I found an interesting post from Bo Allen on pseudo-random vs random numbers, where the author uses a simple bitmap (heat map) to show that the rand function in PHP has a systematic pattern and compares these to truly random numbers obtained from random.org. The post’s results suggest that pseudo-randomness in PHP is faulty and, in general, should not be underestimated in practice. Of course, the findings should not be too surprising, as there is a large body of literature on the subtleties, philosophies, and implications of the pseudo aspect of the most common approaches to random number generation. However, it is silly that PHP‘s random number generator (RNG) displays such an obvious pattern nowadays because there are several decent, well-studied pseudo-RNG algorithms available as well as numerous tests for randomness.  For a good introduction to RNG, I recommend John D. Cook’s discussion on testing a random number generator.

Now, I would never use PHP for any (serious) statistical analysis, partly due to my fondness for R, nor do I doubt the practicality of the RNG in R. But I was curious to see what would happen. So, created equivalent plots in R to see if a rand equivalent would exhibit a systematic pattern like in PHP, even if less severe. Also, for comparison, I chose to use the random package, from Dirk Eddelbuettel, to draw truly random numbers from random.org. Until today, I had only heard of the random package but had never used it.

I have provided the function rand_bit_matrix, which requires the number of rows and columns to display in the plotted bitmap. To create the bitmaps, I used the pixmap package rather than the much-loved ggplot2 package, simply because of how easy it was for me to create the plots. (If you are concerned that I have lost the faith, please note that I am aware of the awesomeness of ggplot2 and its ability to create heat maps.)

It is important to note that there were two challenges that I encountered when using drawing truly random numbers.

  1. Only 10,000 numbers can be drawn at once from random.org. (This is denoted as max_n_random.org in the function below.)
  2. There is a daily limit to the number of times the random.org service will provide numbers.

To overcome the first challenge, I split the total number of bits into separate calls, if necessary. This approach, however, increases our number of requests, and after too many requests, you will see the error: random.org suggests to wait until tomorrow. Currently, I do not know the exact number of allowed requests or if the amount of requested random numbers is a factor, but looking back, I would guess about 20ish large requests is too much.

Below, I have plotted 500 x 500 bitmaps based on the random bits from both of R and random.org. As far as I can tell, no apparent patterns are visible in either plot, but from the graphics alone, our conclusions are limited to ruling out obvious systematic patterns, which were exhibited from the PHP code. I am unsure if the PHP folks formally tested their RNG algorithms for randomness, but even if they did, the code in both R and PHP is straightforward and provides a quick eyeball test. Armed with similar plots alone, the PHP devs could have sought for better RNG algorithms — perhaps, borrowed those from R.

Click the images for a larger view.

 

 

Finally, I have provided my R code, which you may tinker with, copy, manipulate, utilize, etc. without legal action and without fear of death by snoo snoo. Mmmm. Snoo snoo.

 

Did you like this? Share it:

Written by ramhiser

November 25th, 2011 at 3:08 am

Posted in r

Tagged with , ,

  • http://www.r-statistics.com Tal Galili

    Great post John, thank you.

    p.s: consider adding the “subscribe to comments” plugin :)

    • http://www.ramhiser.com John Ramey

      Tal,

      Yeah, thanks for the recommendation. At one point, I had signed up for DISQUS, but I had disabled it. Why? No idea.