The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Statistically Accurate Simulation

For an application I'm writing, I'd like to simulate the process of choosing a random sample of data from a population whose data members conform to a normal distribution, but I have no idea how to accomplish that.

In other words, imagine that I have a population (of unknown size) whose mean value is 10 and whose standard deviation is 2. I'd like to construct that population like this...

  Population p = new Population(10.0, 2.0);

And then I'd like to retrieve randomly generated members of the population, kind of like this...

  float m = p.getRandomMember();

If I were to call this method a hundred million times (or some arbitrarily large number of method-calls), I'd like the distribution of values that it returns to be normally distributed about the specified mean, with a distribution density defined by the standard deviation.

And yet, for each individual method call, there should be no hard-and-fast lower or upper limit on the random values returned, so long as the outliers are (of course) statistically rare.

Does anybody know of a library that provides that kind of functionality? Obviously, I'm using Java, but I'd be willing to take a look at examples in C or VB or Perl or any other language really. And I'd really prefer an open source library so that I can inspect the algorithm and see how it works.

I've considered just using a look-up table to transform an ordinary sequence of random numbers into a normally-distributed set, but I'm not crazy about the lack of accuracy that a look-up table would provide. I'd rather use a library that was mathematically accurate to the highest degree of precision.

Any suggestions?
Benji Smith Send private email
Tuesday, February 01, 2005
 
 
Just in case anyone would benefit from more (probably redundant) details, if I were to run this code...

  Population p = new Population(10.0, 2.0);
  for (int i = 0; i < 100000000; i++) {
      float m = p.getRandomMember();
      System.out.println(m);
  }

...I should be able to expect that about 68.3% (or roughly 68,300,000) of the values would be between 8.0 and 12.0 (one standard deviation from the mean), 95.4% should be between 6.0 and 14.0, 99.7% should be between 4.0 and 16.0, etc, etc, etc, even out to five or ten standard deviations from the mean.
Benji Smith Send private email
Tuesday, February 01, 2005
 
 
Google found me this ... http://www.devx.com/vb2themax/Tip/19233
David Aldridge Send private email
Tuesday, February 01, 2005
 
 
Thanks, David, that's a good start.

But the code that you pointed me to is actually only accurate for integral multiples of the standard deviation. It'll be accurate at exactly 1 standard devation from the mean, or 2 standard deviations of the mean, but it won't be correct at a distance of 2.5 standard deviations from the mean.

The code in the example you found makes the assumption that distribution ratios are proportional between whole-number standard deviation multiples, and that's not really the case. The normal distribution is *curved*.

Here's his faulty explanation:

"For example, if I wanted to simulate the heights of children in a class I might postulate that their heights would have mean=x and standard deviation=s. I might decide at a value for s by suggesting that 95% of them should have heights between x-2s and x+2s. It follows that we can simulate the height of any one member of the (imaginary) class of children = (s*NormRand + x)."
Benji Smith Send private email
Tuesday, February 01, 2005
 
 
Actually, upon closer investigation, I was mistaken. That code is actually quite accurate. I was mislead by misreading this: "(s*NormRand + x)" and jumping to the conclusion that the algorithm assumed a proportional relationship between integral multiples of the standard deviation.

Here's a more general explanation of the "Polar Method" alrorithm (which I was unaware of until now), in case anyone is interested:

http://psweb.sbs.ohio-state.edu/faculty/rtimpone/computer_resources/polar.htm

And here's an implementation in C:

http://finance.bi.no/~bernt/gcc_prog/recipes/recipes/node23.html
Benji Smith Send private email
Tuesday, February 01, 2005
 
 
Aha!

And here's a Java implementation (classname: NormalPolarGen)

http://www.iro.umontreal.ca/~lecuyer/ssj/dist/

...with great documentation...

http://www.iro.umontreal.ca/~lecuyer/ssj/dist/doc/html/index.html

It's amazing how much easier it is to find these things now that I know the name of the "Polar Method".

Thanks!!
Benji Smith Send private email
Tuesday, February 01, 2005
 
 
The polar method is version of the Box-Muller transformation. Wikipedia has a overview: http://en.wikipedia.org/wiki/Box-Muller_transform
scruffie Send private email
Tuesday, February 01, 2005
 
 
Benji, since you're helping me ont he other page with my design, I thought I chime in here.

Wouldn't you really be looking for a Gaussian Distribution generator instead of Normal?  Let's assume very small numbers so I can demostrate my point.

Since a population with a mean of 5 has a lower bound at 0 and an limitless upper bound, this might be a random sample (0, 8, 10, 2, 1, 14, 9) - I know that doesn't avg right to 5 but you get my point i think.

For an avg of 5, for each 0 there must be a 10, for each 1, there should be a 9.  But, there will be many more 0's, 1's, 2's, 3's, and 4's than 5 through 10's (because there will even be some 11 through unlimited upper.

Come to think of it, Maybe this doesn't apply to you since you're onlyworking with huge population numbers that never reach 0 (but, you did say limitless lower bound which is not really true).
The Top 20%
Wednesday, February 02, 2005
 
 
Actually, for this particular application, I really am thinking about a normal distribution, since negative numbers still make sense in the context within which I'm working. But I could see how a gaussian distribution would be helpful for lots of applications. I had heard of gaussian distributions before, but I never really knew what they were. Thanks for filling me in.
Benji Smith Send private email
Wednesday, February 02, 2005
 
 
Yea, Gaussian would be suited to simulating random scores of soccer games since in most games a team will score 0, 1, or 2 or 3, but once in a while a 5 or 6 or 8, but all while keeping the avg around 1.5 (depending on the league).
The Top 20%
Wednesday, February 02, 2005
 
 
Try here for information:

http://mathworld.wolfram.com/

Knuth also has a section on random numbers.

And you can find Numerical Recipes in C/F77/F90 online.
el
Wednesday, February 02, 2005
 
 
If you want to generate Gaussian random error in Excel, here is a link to the macro you need:
 http://graphpad.com/faq/viewfaq.cfm?faq=966
Harvey Motulsky Send private email
Wednesday, February 02, 2005
 
 
The Top 20%: you're thinking of a Bernoulli distribution, not Gaussian. A Gaussian distribution is the same thing as a normal distribution.
scruffie Send private email
Wednesday, February 02, 2005
 
 
More probability distributions than you can shake a stick at (if you're the type of person who shakes sticks at probability distributions):

http://en.wikipedia.org/wiki/Probability_distribution
Benji Smith Send private email
Wednesday, February 02, 2005
 
 
Ask a recent comp eng or EE grad. Generating random numbers to an arbitrary distribution is a pretty rudimentary topic in any random processes or numerical methods course. Alternatively, check your local university library for textbooks on those two topics.

Google helps too, search for:
random number generation gaussian

I'd help you out but my purely academic knowledge on this subject is a bit rusty and I don't have my old textbooks handy.
A. Gorilla
Friday, February 04, 2005
 
 
A. Gorilla said:
> I'd help you out but my purely academic knowledge on
> this subject is a bit rusty and I don't have my old
> textbooks handy.

That's okay. I now have more than enough information to proceed on my own. :)
Benji Smith Send private email
Friday, February 04, 2005
 
 
My goodness! Don't write this yourself!

Use the COLT library from CERN.
http://hoschek.home.cern.ch/hoschek/colt/index.htm

Code:
Normal population = new Normal(mean, deviation, new MersenneTwister(seed));
double sample = population.nextDouble();

ps:
There is no difference between normal and gaussian distribution. It's just two names for the same thing.

Take a tour through mathworld on the link above, it's a really nice site.
Shadow
Friday, February 04, 2005
 
 
Sorry for pointing out the distribution thing again, I seem to have missed scruffies post.

From the text it seems to me that the poisson distribution was originally meant by the poster, not the bernoulli distribution. The poisson distribution is unlimited above while the bernoully distribution is limited by 1.
Shadow
Saturday, February 05, 2005
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz