Carrying on from where we left off last week, I am going to look at generating runs of samples of a given size and comparing their means and 95% confidence intervals.

Comparing Repeated Samples of the Same Size

I have seen the following stated in a variety of ways, but this was certainly the most straightforward.

I started by building an experiment to get N samples of size X and plot them on a chart, something like the following for 10 samples of size 33.

bar chart showing mean and stderr for 10 samples of 33 countries each

The line at the top of each bar shows the 95% confidence interval for that sample. The red line is the actual world average age for the year in question. The green coloured area covers the 95% confidence interval for the first sample (see next paragraph).
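For anyone wanting to reproduce those error bars, here is a minimal sketch of one way to compute a sample mean and its 95% confidence interval. It assumes the normal approximation (mean ± 1.96 × standard error); the post's own code may differ, for example by using a t critical value. The population is a made-up list of 200 "median ages", not the real country data, and `mean_ci95()` is a hypothetical helper name.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_ci95(sample):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    half_width = z * s / sqrt(n), where s is the sample standard deviation
    and z (~1.96) is the 97.5th percentile of the standard normal.
    """
    n = len(sample)
    m = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.975)
    return m, z * std_err


if __name__ == "__main__":
    # Made-up population of 200 "median ages"; NOT the real country data.
    rng = random.Random(42)
    population = [rng.uniform(15, 45) for _ in range(200)]
    true_mean = statistics.mean(population)

    sample = rng.sample(population, 33)  # one sample of 33 "countries"
    m, hw = mean_ci95(sample)
    print(f"sample mean {m:.1f} ±{hw:.1f}, true mean {true_mean:.1f}")
    print("interval contains the true mean:", m - hw <= true_mean <= m + hw)
```

Each bar in the chart corresponds to one `(m, hw)` pair: the bar height is `m` and the line on top spans `m - hw` to `m + hw`.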

That coloured area is rather speculative. It was just something I thought might give me some idea of how confidence intervals work. I have since decided it adds no real value to the experiment or discussion, but I kinda like the look of it, so I have left it in.

Here’s another chart for 22 samples of size 33.

bar chart showing mean and stderr for 22 samples of 33 countries each

Then I decided to run this process multiple times and tabulate the results. So I did 10 runs of 22 samples of size 33 (see the table below). My code saved the data to a file; I then wrote another module to read the data and generate the HTML table for inclusion in the post. I did not plot each run, so, sorry, no pictures.
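Here's a rough sketch of that kind of experiment: R runs of k samples of size n, each with a 95% confidence interval and a flag for whether it missed the true mean, saved to a CSV file for a separate script to pick up. It is not the code in est_avg_age.py; `run_experiment()`, `save_rows()`, the CSV column names, the file name and the uniform stand-in population are all assumptions for illustration.

```python
import csv
import math
import os
import random
import statistics
import tempfile
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ~1.96, normal approximation


def run_experiment(population, runs, n_samples, size, seed=None):
    """Return one row per sample: (run, sample, mean, half_width, missed).

    Samples are drawn without replacement; `missed` is True when the
    sample's 95% CI does not contain the population mean.
    """
    rng = random.Random(seed)  # one seeded generator makes the whole experiment repeatable
    true_mean = statistics.mean(population)
    rows = []
    for run in range(1, runs + 1):
        for smpl in range(1, n_samples + 1):
            sample = rng.sample(population, size)
            m = statistics.mean(sample)
            hw = Z95 * statistics.stdev(sample) / math.sqrt(size)
            rows.append((run, smpl, m, hw, not (m - hw <= true_mean <= m + hw)))
    return rows


def save_rows(path, rows):
    """Write the rows to CSV so another module can build tables or plots."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["run", "sample", "mean", "half_width", "missed"])
        writer.writerows(rows)


if __name__ == "__main__":
    # Made-up population of 200 "median ages"; NOT the real country data.
    pop_rng = random.Random(0)
    population = [pop_rng.uniform(15, 45) for _ in range(200)]

    rows = run_experiment(population, runs=10, n_samples=22, size=33, seed=7)
    path = os.path.join(tempfile.gettempdir(), "t7_runs.csv")  # hypothetical file
    save_rows(path, rows)
    misses = sum(r[4] for r in rows)
    print(f"{misses} of {len(rows)} intervals missed the true mean; saved to {path}")
```

Keeping the sampling and the table/plot generation in separate steps means the (slow, random) experiment only has to run once, and the presentation code can be rerun as often as needed against the saved file.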

Sample means ± 95% confidence interval for 22 samples of size 33 (10 runs)

            | Run 1     | Run 2     | Run 3     | Run 4     | Run 5     | Run 6     | Run 7     | Run 8     | Run 9     | Run 10
------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------
Sample 1    | 29.9 ±3.3 | 25.3 ±2.5 | 29.2 ±3.1 | 28.2 ±3.1 | 28.7 ±3.0 | 27.4 ±3.0 | 27.7 ±3.2 | 26.6 ±3.5 | 26.9 ±2.9 | 27.6 ±2.7
Sample 2    | 32.3 ±3.2 | 27.7 ±3.1 | 27.9 ±2.8 | 28.0 ±3.1 | 28.4 ±3.2 | 29.6 ±3.1 | 30.5 ±3.2 | 26.6 ±2.9 | 27.9 ±3.2 | 29.7 ±3.1
Sample 3    | 26.9 ±3.0 | 30.1 ±2.7 | 29.3 ±3.2 | 27.5 ±2.9 | 28.6 ±3.2 | 28.9 ±3.0 | 28.2 ±3.1 | 28.4 ±3.1 | 26.7 ±3.2 | 29.7 ±3.1
Sample 4    | 28.2 ±3.3 | 28.3 ±3.2 | 26.6 ±3.2 | 28.3 ±3.1 | 30.6 ±3.1 | 28.9 ±3.2 | 28.0 ±3.1 | 29.3 ±3.0 | 27.5 ±2.8 | 30.0 ±2.6
Sample 5    | 28.2 ±2.9 | 28.1 ±3.0 | 27.7 ±3.0 | 31.5 ±2.8 | 26.7 ±2.8 | 29.3 ±3.1 | 28.7 ±3.1 | 27.8 ±3.2 | 27.9 ±3.2 | 29.4 ±2.9
Sample 6    | 31.3 ±3.1 | 25.9 ±3.0 | 25.9 ±2.5 | 27.5 ±2.9 | 28.2 ±3.0 | 30.3 ±2.8 | 29.3 ±2.7 | 26.4 ±3.1 | 28.9 ±3.1 | 27.8 ±3.2
Sample 7    | 27.5 ±3.1 | 28.2 ±2.7 | 28.0 ±2.4 | 27.6 ±2.9 | 26.9 ±2.7 | 29.1 ±2.9 | 27.6 ±3.3 | 27.7 ±3.1 | 26.1 ±3.2 | 28.5 ±3.1
Sample 8    | 28.4 ±3.1 | 28.6 ±3.0 | 29.0 ±2.8 | 27.8 ±3.1 | 23.9 ±2.7 | 26.7 ±3.2 | 28.9 ±3.6 | 25.6 ±2.9 | 28.7 ±2.9 | 28.2 ±3.0
Sample 9    | 28.2 ±3.0 | 25.3 ±2.8 | 30.8 ±3.2 | 29.9 ±3.1 | 28.9 ±3.0 | 27.7 ±3.0 | 27.6 ±3.3 | 30.1 ±3.0 | 30.9 ±3.1 | 27.9 ±3.5
Sample 10   | 29.1 ±3.0 | 29.3 ±3.3 | 30.7 ±3.3 | 26.5 ±3.3 | 26.2 ±2.8 | 28.3 ±2.9 | 27.0 ±2.9 | 28.4 ±3.2 | 28.2 ±2.9 | 30.3 ±3.1
Sample 11   | 28.8 ±3.4 | 27.6 ±3.5 | 27.9 ±2.9 | 28.8 ±3.1 | 29.7 ±2.9 | 26.7 ±3.0 | 25.6 ±2.4 | 30.1 ±3.1 | 27.2 ±2.8 | 27.3 ±3.0
Sample 12   | 27.0 ±3.4 | 29.3 ±3.1 | 30.0 ±3.1 | 28.5 ±2.9 | 27.1 ±2.7 | 30.2 ±3.2 | 28.6 ±2.9 | 26.8 ±3.1 | 30.0 ±3.0 | 28.1 ±3.1
Sample 13   | 26.0 ±2.8 | 28.6 ±2.9 | 27.8 ±2.6 | 26.9 ±3.0 | 28.6 ±2.8 | 29.5 ±3.1 | 28.3 ±2.8 | 28.5 ±3.0 | 29.8 ±2.9 | 29.6 ±3.3
Sample 14   | 29.1 ±3.5 | 27.0 ±3.1 | 31.1 ±3.0 | 27.3 ±2.9 | 26.0 ±2.8 | 26.9 ±2.8 | 28.5 ±2.9 | 29.1 ±2.9 | 27.5 ±3.2 | 28.3 ±3.3
Sample 15   | 27.9 ±3.1 | 27.9 ±2.9 | 27.7 ±2.8 | 26.8 ±3.1 | 29.2 ±2.7 | 28.8 ±3.2 | 28.4 ±3.3 | 27.2 ±3.0 | 28.4 ±2.8 | 29.3 ±3.5
Sample 16   | 28.1 ±2.9 | 29.1 ±3.0 | 27.6 ±2.8 | 29.1 ±3.1 | 29.0 ±3.3 | 28.9 ±3.0 | 26.4 ±2.8 | 26.4 ±3.0 | 31.3 ±3.0 | 27.3 ±2.9
Sample 17   | 30.3 ±3.3 | 27.7 ±3.2 | 30.8 ±3.2 | 28.5 ±3.3 | 29.7 ±3.2 | 28.0 ±3.1 | 29.0 ±3.0 | 27.3 ±3.1 | 29.4 ±3.1 | 29.9 ±3.1
Sample 18   | 27.6 ±2.8 | 26.6 ±3.0 | 29.2 ±2.9 | 29.9 ±3.0 | 29.6 ±3.1 | 26.8 ±2.9 | 30.6 ±2.9 | 29.6 ±3.4 | 30.6 ±3.4 | 30.1 ±2.9
Sample 19   | 30.0 ±2.9 | 26.5 ±3.0 | 27.6 ±2.9 | 28.5 ±3.2 | 28.0 ±2.9 | 28.0 ±3.5 | 28.4 ±2.8 | 27.9 ±3.2 | 30.6 ±3.3 | 28.0 ±2.8
Sample 20   | 27.7 ±2.8 | 29.7 ±2.9 | 27.8 ±2.8 | 27.9 ±2.6 | 30.6 ±3.4 | 27.6 ±2.5 | 26.6 ±3.0 | 28.2 ±3.2 | 27.1 ±2.7 | 26.7 ±3.1
Sample 21   | 26.3 ±2.8 | 28.0 ±3.0 | 28.5 ±3.1 | 27.4 ±2.9 | 28.4 ±3.1 | 27.7 ±3.2 | 30.2 ±2.6 | 28.2 ±2.7 | 27.6 ±2.7 | 30.2 ±3.1
Sample 22   | 27.6 ±3.1 | 30.3 ±3.1 | 28.4 ±2.7 | 28.0 ±3.5 | 31.1 ±3.1 | 29.2 ±2.9 | 28.6 ±3.1 | 27.8 ±3.3 | 28.8 ±3.0 | 29.6 ±2.9
Run Mean    | 28.5      | 28.0      | 28.6      | 28.2      | 28.4      | 28.4      | 28.3      | 27.9      | 28.6      | 28.8
True Missed | 1         | 2         | 1         | 1         | 1         | 0         | 1         | 1         | 0         | 0

So, across 220 samples, the sample 95% confidence interval failed to contain the true population mean 8 times, or roughly 3.6% of the time, which agrees with the statement above. But I also had one run (Run 2) where the true average fell outside the sample 95% confidence interval 2 times. That's 9% of the samples in that run, which seems quite high. But is it statistically improbable? Unfortunately, I don't know, but it sure doesn't look like it. Though that could be a flaw in my sampling process.
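One way to put a number on that question is a binomial model. The sketch below assumes each interval independently misses the true mean with probability 0.05, i.e. that the intervals really do deliver 95% coverage, which a given sampling process may or may not do. `prob_at_least()` is a hypothetical helper.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: each of the 22 intervals in a run independently misses
# the true mean with probability 0.05 (the nominal 95% coverage).
p_run = prob_at_least(2, 22, 0.05)
print(f"P(>= 2 misses in one run of 22): {p_run:.3f}")

# Chance that at least one of 10 independent runs has 2 or more misses.
p_any = 1 - (1 - p_run) ** 10
print(f"P(some run out of 10 has >= 2 misses): {p_any:.3f}")

# Total misses across all 220 intervals: how unusual is 8 or fewer?
p_le_8 = 1 - prob_at_least(9, 220, 0.05)
print(f"P(<= 8 misses out of 220): {p_le_8:.3f}")
```

Under that assumption, a run of 22 with two or more misses should turn up roughly 30% of the time, so across 10 runs it would be more surprising not to see one.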

Histogram of All the Above Samples

Well, in the end I decided to plot a histogram of the 220 sample means. You know, just cuz.

histogram showing means of the 220 samples above
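The chart above was presumably drawn with a plotting library. As a stdlib-only stand-in, this sketch prints a text histogram with 1-year bins for the 22 sample means in the Run 1 column of the table (not all 220). `text_histogram()` is a hypothetical helper.

```python
import math
from collections import Counter

# Sample means for Run 1, copied from the table above.
RUN_1_MEANS = [
    29.9, 32.3, 26.9, 28.2, 28.2, 31.3, 27.5, 28.4, 28.2, 29.1, 28.8,
    27.0, 26.0, 29.1, 27.9, 28.1, 30.3, 27.6, 30.0, 27.7, 26.3, 27.6,
]


def text_histogram(values, bin_width=1.0):
    """Return one line per bin, formatted as '[lo, hi)  count  ####'."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):  # include empty bins too
        lo = b * bin_width
        n = counts.get(b, 0)
        lines.append(f"[{lo:4.1f}, {lo + bin_width:4.1f})  {n:2d}  " + "#" * n)
    return lines


if __name__ == "__main__":
    for line in text_histogram(RUN_1_MEANS):
        print(line)
```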

It would appear that the means of repeated samples of a given size form something that looks like a normal distribution, and that the mean of those sample means approaches the population average. We will look a bit more at that in the next or a future post.
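That is what the central limit theorem predicts: the sample means should centre on the population mean, with a spread of roughly σ/√n. A small simulation can illustrate it. This sketch assumes a made-up, deliberately skewed population and samples with replacement, so σ/√n applies directly; sampling without replacement from a finite population would make the spread somewhat smaller. `sample_means()` is a hypothetical helper.

```python
import math
import random
import statistics


def sample_means(population, size, count, seed=None):
    """Means of `count` random samples of `size` items, drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=size)) for _ in range(count)]


if __name__ == "__main__":
    # Made-up, deliberately right-skewed population (NOT the real age data).
    pop_rng = random.Random(1)
    population = [15 + pop_rng.expovariate(1 / 10) for _ in range(200)]

    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)  # population standard deviation
    means = sample_means(population, size=33, count=5000, seed=2)

    print(f"population mean     {mu:.2f}")
    print(f"mean of the means   {statistics.fmean(means):.2f}")
    print(f"sigma / sqrt(33)    {sigma / math.sqrt(33):.2f}")
    print(f"stdev of the means  {statistics.stdev(means):.2f}")
```

With this many samples, each pair of printed numbers should come out close, and a histogram of `means` should look roughly bell-shaped even though the population itself is skewed.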

My Code for this Experiment

Unfortunately, the code in the module est_avg_age.py is starting to get rather messy. Sorry!

Rather than include it here, I think I will add a separate post showing all the current code in that module. In est_avg_age.py, I made the following changes:

  • As mentioned previously, add a new function, all_stat_sng_sample(), to produce the stats needed by experiments 6 & 7.
  • Add a new variable, P_PATH, to hold the path to my play directory (where experiment #7 writes the data for further use).
  • Modify experiment 7 to write all its sample data to a file that I could later use to generate HTML tables and plots for related blog posts (e.g. this one).
  • Add new variables and related optional command line arguments to allow me to modify what experiment #7 does (e7_f_nbr, t7_rpt, t7_size, t7_runs, etc.).
  • Add variable t7_seed and modify the code to use it to control the random seeding for the functions that randomly select the data for our samples, all_stat_sng_sample() and get_single_sample(). Modify those functions accordingly. I wanted to be able to repeat the experiment starting from the same place each time. You may have noticed that Run 1 above contains the same data as the chart showing the 22 samples of size 33.
  • Modify the code that prints experiment output to the command line.
  • Add a new experiment, #8, which I likely will not use, so it will be deleted at some point in the future.
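A minimal sketch of the seeding idea, assuming the optional command line arguments are parsed with argparse. The flag names below simply mirror the variable names listed above and are hypothetical, as are `build_parser()` and `draw_runs()`; the real est_avg_age.py may wire this up differently. The key point is that a private `random.Random(seed)` produces the same samples, in the same order, every time it is given the same seed.

```python
import argparse
import random


def build_parser():
    # Hypothetical flags named after the post's variables; the real
    # est_avg_age.py options may be named and wired differently.
    p = argparse.ArgumentParser(description="Repeated-sample experiment (sketch)")
    p.add_argument("--t7_size", type=int, default=33, help="items per sample")
    p.add_argument("--t7_runs", type=int, default=10, help="number of runs")
    p.add_argument("--t7_seed", type=int, default=None,
                   help="random seed; the same seed gives the same samples")
    return p


def draw_runs(population, runs, n_samples, size, seed):
    """Return `runs` lists of `n_samples` samples, each of `size` items.

    A private random.Random(seed) is used instead of the global generator,
    so the same seed always reproduces the same samples in the same order.
    """
    rng = random.Random(seed)
    return [[rng.sample(population, size) for _ in range(n_samples)]
            for _ in range(runs)]


if __name__ == "__main__":
    args = build_parser().parse_args()
    population = list(range(200))  # stand-in for the country data
    # 22 samples per run, as in the table above.
    first = draw_runs(population, args.t7_runs, 22, args.t7_size, args.t7_seed)
    again = draw_runs(population, args.t7_runs, 22, args.t7_size, args.t7_seed)
    # True when a seed was given; with no seed the two draws almost surely differ.
    print("repeatable:", first == again)
```

Because the first run is drawn first from a freshly seeded generator, it comes out identical every time the same seed is used, which is why a seeded experiment can reproduce an earlier chart exactly.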

For my own needs, I also created a new module, population/play/t7_build_table.py. It processes the data in the file(s) produced by experiment #7 to generate an HTML table and/or a histogram of the sample data for a blog post (such as the table and histogram above). Its code will also be in the separate post/page.
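Something along these lines could turn such a data file into an HTML table like the one above. This sketch assumes a hypothetical CSV layout (`run,sample,mean,half_width,missed`, one row per sample) and is not the actual t7_build_table.py; `build_html_table()` and the demo rows are made up.

```python
import csv
import io
import statistics
from collections import defaultdict


def build_html_table(csv_file):
    """Build an HTML table (rows = samples, columns = runs) from CSV rows of
    run,sample,mean,half_width,missed; each cell shows 'mean ±half_width'."""
    cells = defaultdict(dict)    # cells[sample][run] -> "29.9 ±3.3"
    means = defaultdict(list)    # means[run] -> list of sample means
    misses = defaultdict(int)    # misses[run] -> number of missed CIs
    for row in csv.DictReader(csv_file):
        run, smpl = int(row["run"]), int(row["sample"])
        mean, hw = float(row["mean"]), float(row["half_width"])
        cells[smpl][run] = f"{mean:.1f} ±{hw:.1f}"
        means[run].append(mean)
        misses[run] += row["missed"] == "True"
    runs = sorted(means)

    def tr(label, values, cell="td"):
        tds = "".join(f"<{cell}>{v}</{cell}>" for v in values)
        return f"<tr><th>{label}</th>{tds}</tr>"

    out = ["<table>", tr("", [f"Run {r}" for r in runs], cell="th")]
    for smpl in sorted(cells):
        out.append(tr(f"Sample {smpl}", [cells[smpl].get(r, "") for r in runs]))
    out.append(tr("Run Mean", [f"{statistics.mean(means[r]):.1f}" for r in runs]))
    out.append(tr("True Missed", [misses[r] for r in runs]))
    out.append("</table>")
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up demo rows; in practice this would be the file experiment #7 wrote.
    demo = io.StringIO(
        "run,sample,mean,half_width,missed\n"
        "1,1,29.94,3.31,False\n"
        "1,2,32.3,3.2,True\n"
        "2,1,25.3,2.5,True\n"
        "2,2,27.7,3.1,False\n"
    )
    print(build_html_table(demo))
```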

Please see Module Code: est_avg_age.py for Experiment 6 & 7.

Done for This One

And with that, I think I will call it a day.

Resources