ooblick.com
Insufficiently-advanced technology

Sampling Errors Using Different Sample Sizes

The purpose of this exercise is to see how sample size affects the ratio measured in each sample.

I wrote a simple script that simulates a melt consisting of 1,000,000,000 (one billion) atoms of D and 1,000,000,000 (one billion) atoms of Di. It then draws 100 samples ("crystals") from this virtual melt. It also picks a random value for P, but this isn't important at the moment. What matters is how far each crystal's D/Di ratio differs from the expected value of 1.00 (that is, a 1:1 ratio of D to Di).

The script keeps track of individual atoms, so it samples without replacement: if the first atom it chooses is an atom of D, then for the second atom it will choose from a "melt" consisting of 999,999,999 atoms of D and 1,000,000,000 atoms of Di.
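The author's script is linked at the end of the post and is not reproduced here. As a rough illustration of the atom-by-atom sampling described above, here is a minimal Python sketch. The names `draw_crystal` and `run`, and the choice to draw every crystal from a fresh, full melt, are hypothetical assumptions, not the author's code.

```python
import random

# Hypothetical parameters mirroring the text: one billion D atoms and
# one billion Di atoms in the melt.
MELT_D = 1_000_000_000
MELT_DI = 1_000_000_000


def draw_crystal(n_atoms, melt_d=MELT_D, melt_di=MELT_DI, rng=random):
    """Draw n_atoms one at a time WITHOUT replacement; return (d, di) counts.

    Each draw updates the remaining melt, so after picking a D atom the
    next pick is made from (melt_d - 1) D atoms and melt_di Di atoms.
    """
    d = di = 0
    for _ in range(n_atoms):
        # The next atom is D with probability melt_d / (melt_d + melt_di).
        if rng.random() * (melt_d + melt_di) < melt_d:
            d += 1
            melt_d -= 1
        else:
            di += 1
            melt_di -= 1
    return d, di


def run(n_atoms, n_crystals=100, seed=None):
    """Return the D/Di ratios of n_crystals crystals of n_atoms each.

    Assumption (hypothetical): each crystal is drawn from a fresh, full melt.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_crystals):
        d, di = draw_crystal(n_atoms, rng=rng)
        # di == 0 is astronomically unlikely for crystals of 100+ atoms.
        ratios.append(d / di)
    return ratios
```

Calling `run(size)` for each of 100, 1,000, 10,000 and 100,000 corresponds to the four sample sizes in the plot. The pure-Python loop is slow at the largest size (ten million draws for 100 crystals), but it mirrors the one-atom-at-a-time bookkeeping described above rather than using a shortcut distribution.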

Plot of P/Di (horizontally) vs D/Di (vertically). The different colors represent different sample sizes.

This plot shows four runs of the script: the red points represent "crystals" of 100 atoms each, the green points 1,000 atoms each, the blue points 10,000 atoms each, and the purple points 100,000 atoms each.

Again, the horizontal position of the points is unimportant for now.

Notice that the red points are scattered rather widely around the expected value, from about 0.6 to almost 1.6. As the sample size gets larger and larger, fewer and fewer samples differ significantly from the expected value. At 100,000 atoms per sample, the "crystals" are all quite close to the expected value.
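To attach a rough number to this trend: treat each crystal's D count as approximately binomial with p = 1/2 (reasonable, since removing at most 100,000 atoms from a two-billion-atom melt barely changes its composition), and apply a first-order (delta-method) approximation to the ratio R = D/Di for a crystal of n atoms. This is a back-of-the-envelope sketch, not part of the original analysis.

```latex
\begin{aligned}
D &\sim \mathrm{Binomial}\!\left(n, \tfrac{1}{2}\right), \qquad
\sigma_D = \frac{\sqrt{n}}{2}, \qquad
R = \frac{D}{n - D} \\
\left.\frac{dR}{dD}\right|_{D = n/2}
  &= \left.\frac{n}{(n - D)^2}\right|_{D = n/2} = \frac{4}{n} \\
\sigma_R &\approx \frac{4}{n} \cdot \frac{\sqrt{n}}{2} = \frac{2}{\sqrt{n}}
\end{aligned}
```

This gives a spread of about 0.2 for 100 atoms, 0.063 for 1,000, 0.02 for 10,000, and 0.0063 for 100,000: each tenfold increase in sample size shrinks the spread by a factor of about √10 ≈ 3.2. Because the ratio is nonlinear, the scatter is also somewhat lopsided: 60 D atoms out of 100 gives R = 1.5, while 40 gives R ≈ 0.67.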

This is not the result of measurement error, since the script counts every atom exactly. The plot demonstrates only a statistical effect: the larger the sample, the more likely a crystal is to have a D/Di ratio close to the expected value.

In real-life isochron dating, the smallest sample will be at least ten million times larger than any of the simulated crystals above, so we can expect the data to be even more tightly clustered than in the graph above.
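The rough binomial approximation σ_R ≈ 2/√n for a 1:1 melt (a back-of-the-envelope estimate, not part of the original analysis) puts a number on this: a sample ten million times larger than the 100,000-atom crystals contains at least 10^12 atoms.

```latex
n \ge 10^{7} \times 10^{5} = 10^{12}
\quad\Longrightarrow\quad
\sigma_R \approx \frac{2}{\sqrt{10^{12}}} = 2 \times 10^{-6}
```

That is roughly √(10^7) ≈ 3,200 times smaller than the spread of about 0.0063 expected for the 100,000-atom crystals.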


The script used to generate these data sets can be found here.