Carrying on from where we left off last week, I am going to look at generating runs of a given sample size and compare their means and 95% confidence intervals.

Comparing Repeated Samples of the Same Size

I have seen the following stated in a variety of ways, but this version was certainly the most straightforward.

I started by building an experiment to draw N samples of size X and plot the results on a chart. Something like the following for 10 samples of size 33.

bar chart showing mean and stderr for 10 samples of 33 countries each

The line at the top of each bar shows the 95% confidence interval for that sample. The red line is the actual world average age for the year in question. The green coloured area covers the 95% confidence interval for the first sample (see next paragraph).

That coloured area is rather speculative. It was just something I thought might give me some idea of how confidence intervals work. I have since decided it adds no real value to the experiment or discussion, but I kinda like the look of it, so I have left it in.
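For anyone wanting to try something similar, here is a minimal sketch of that kind of experiment. It is not the code from est_avg_age.py. The function names, the made-up population of 200 ages, and the use of 1.96 (the normal approximation) for the 95% interval are all assumptions for illustration.

```python
import math
import random
import statistics

Z_95 = 1.96  # assumption: normal-approximation critical value for a 95% interval


def sample_stats(population, size, rng):
    """Draw one sample without replacement; return (mean, stderr, ci_half_width)."""
    sample = rng.sample(population, size)
    mean = statistics.mean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(size)
    return mean, stderr, Z_95 * stderr


def repeated_samples(population, n_samples, size, seed=None):
    """Return the stats for n_samples independent samples of the given size."""
    rng = random.Random(seed)
    return [sample_stats(population, size, rng) for _ in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical stand-in data: 200 made-up ages, not the real data set.
    data_rng = random.Random(1)
    population = [data_rng.uniform(15, 48) for _ in range(200)]
    true_mean = statistics.mean(population)
    results = repeated_samples(population, 10, 33, seed=42)
    for i, (mean, _, half) in enumerate(results, 1):
        hit = abs(mean - true_mean) <= half
        print(f"Sample {i:2d}: {mean:.1f} ±{half:.1f} {'' if hit else '(missed)'}")
```

Each tuple holds what one bar on the chart needs: the bar height (the mean) and the half-width of its error line.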

Here’s another chart for 22 samples of size 33.

bar chart showing mean and stderr for 22 samples of 33 countries each

Then I decided to run this process multiple times and tabulate the results. So, I did 10 runs of 22 samples of size 33. See table below. My code saved the data to a file. Then I wrote another module to read the data and generate the HTML table for inclusion in the post. I did not plot each run. So, sorry, no pictures.

Sample means ± std err for 22 samples of size 33

	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10
Sample 1	29.9 ±3.3	25.3 ±2.5	29.2 ±3.1	28.2 ±3.1	28.7 ±3.0	27.4 ±3.0	27.7 ±3.2	26.6 ±3.5	26.9 ±2.9	27.6 ±2.7
Sample 2	32.3 ±3.2	27.7 ±3.1	27.9 ±2.8	28.0 ±3.1	28.4 ±3.2	29.6 ±3.1	30.5 ±3.2	26.6 ±2.9	27.9 ±3.2	29.7 ±3.1
Sample 3	26.9 ±3.0	30.1 ±2.7	29.3 ±3.2	27.5 ±2.9	28.6 ±3.2	28.9 ±3.0	28.2 ±3.1	28.4 ±3.1	26.7 ±3.2	29.7 ±3.1
Sample 4	28.2 ±3.3	28.3 ±3.2	26.6 ±3.2	28.3 ±3.1	30.6 ±3.1	28.9 ±3.2	28.0 ±3.1	29.3 ±3.0	27.5 ±2.8	30.0 ±2.6
Sample 5	28.2 ±2.9	28.1 ±3.0	27.7 ±3.0	31.5 ±2.8	26.7 ±2.8	29.3 ±3.1	28.7 ±3.1	27.8 ±3.2	27.9 ±3.2	29.4 ±2.9
Sample 6	31.3 ±3.1	25.9 ±3.0	25.9 ±2.5	27.5 ±2.9	28.2 ±3.0	30.3 ±2.8	29.3 ±2.7	26.4 ±3.1	28.9 ±3.1	27.8 ±3.2
Sample 7	27.5 ±3.1	28.2 ±2.7	28.0 ±2.4	27.6 ±2.9	26.9 ±2.7	29.1 ±2.9	27.6 ±3.3	27.7 ±3.1	26.1 ±3.2	28.5 ±3.1
Sample 8	28.4 ±3.1	28.6 ±3.0	29.0 ±2.8	27.8 ±3.1	23.9 ±2.7	26.7 ±3.2	28.9 ±3.6	25.6 ±2.9	28.7 ±2.9	28.2 ±3.0
Sample 9	28.2 ±3.0	25.3 ±2.8	30.8 ±3.2	29.9 ±3.1	28.9 ±3.0	27.7 ±3.0	27.6 ±3.3	30.1 ±3.0	30.9 ±3.1	27.9 ±3.5
Sample 10	29.1 ±3.0	29.3 ±3.3	30.7 ±3.3	26.5 ±3.3	26.2 ±2.8	28.3 ±2.9	27.0 ±2.9	28.4 ±3.2	28.2 ±2.9	30.3 ±3.1
Sample 11	28.8 ±3.4	27.6 ±3.5	27.9 ±2.9	28.8 ±3.1	29.7 ±2.9	26.7 ±3.0	25.6 ±2.4	30.1 ±3.1	27.2 ±2.8	27.3 ±3.0
Sample 12	27.0 ±3.4	29.3 ±3.1	30.0 ±3.1	28.5 ±2.9	27.1 ±2.7	30.2 ±3.2	28.6 ±2.9	26.8 ±3.1	30.0 ±3.0	28.1 ±3.1
Sample 13	26.0 ±2.8	28.6 ±2.9	27.8 ±2.6	26.9 ±3.0	28.6 ±2.8	29.5 ±3.1	28.3 ±2.8	28.5 ±3.0	29.8 ±2.9	29.6 ±3.3
Sample 14	29.1 ±3.5	27.0 ±3.1	31.1 ±3.0	27.3 ±2.9	26.0 ±2.8	26.9 ±2.8	28.5 ±2.9	29.1 ±2.9	27.5 ±3.2	28.3 ±3.3
Sample 15	27.9 ±3.1	27.9 ±2.9	27.7 ±2.8	26.8 ±3.1	29.2 ±2.7	28.8 ±3.2	28.4 ±3.3	27.2 ±3.0	28.4 ±2.8	29.3 ±3.5
Sample 16	28.1 ±2.9	29.1 ±3.0	27.6 ±2.8	29.1 ±3.1	29.0 ±3.3	28.9 ±3.0	26.4 ±2.8	26.4 ±3.0	31.3 ±3.0	27.3 ±2.9
Sample 17	30.3 ±3.3	27.7 ±3.2	30.8 ±3.2	28.5 ±3.3	29.7 ±3.2	28.0 ±3.1	29.0 ±3.0	27.3 ±3.1	29.4 ±3.1	29.9 ±3.1
Sample 18	27.6 ±2.8	26.6 ±3.0	29.2 ±2.9	29.9 ±3.0	29.6 ±3.1	26.8 ±2.9	30.6 ±2.9	29.6 ±3.4	30.6 ±3.4	30.1 ±2.9
Sample 19	30.0 ±2.9	26.5 ±3.0	27.6 ±2.9	28.5 ±3.2	28.0 ±2.9	28.0 ±3.5	28.4 ±2.8	27.9 ±3.2	30.6 ±3.3	28.0 ±2.8
Sample 20	27.7 ±2.8	29.7 ±2.9	27.8 ±2.8	27.9 ±2.6	30.6 ±3.4	27.6 ±2.5	26.6 ±3.0	28.2 ±3.2	27.1 ±2.7	26.7 ±3.1
Sample 21	26.3 ±2.8	28.0 ±3.0	28.5 ±3.1	27.4 ±2.9	28.4 ±3.1	27.7 ±3.2	30.2 ±2.6	28.2 ±2.7	27.6 ±2.7	30.2 ±3.1
Sample 22	27.6 ±3.1	30.3 ±3.1	28.4 ±2.7	28.0 ±3.5	31.1 ±3.1	29.2 ±2.9	28.6 ±3.1	27.8 ±3.3	28.8 ±3.0	29.6 ±2.9
Run Mean	28.5	28.0	28.6	28.2	28.4	28.4	28.3	27.9	28.6	28.8
True Mean Missed	1	2	1	1	1	0	1	1	0	0

So for 220 samples, the sample 95% confidence interval did not contain the true population mean 8 times, or roughly 3.6% of the time, which agrees with the statement above. But I also had one run (Run 2) where the real average fell outside the sample 95% confidence interval 2 times. That’s 9% of the time for that run. Seems quite high. But is it statistically improbable? Unfortunately, I don’t know. But it sure doesn’t look like it is. Though that could be a flaw in my sampling process.
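One rough way to check is to treat each interval as an independent trial that misses the true mean with probability 0.05, so the number of misses in a run of 22 follows a binomial distribution. That independence, and the exact 5% miss rate, are assumptions; the real intervals may not behave so neatly. Under them, the chance of 2 or more misses in a run comes out at about 0.30, so a run like that would not be unusual. The helper name prob_at_least is made up for this sketch.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


if __name__ == "__main__":
    # Assumption: 22 intervals per run, each missing independently with p = 0.05.
    print(f"P(2 or more misses in a run of 22): {prob_at_least(2, 22, 0.05):.3f}")
```

With 10 runs, seeing at least one run with 2 misses is, under the same assumptions, more likely than not.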

Histogram of All the Above Samples

Well, in the end I decided to plot a histogram of the 220 samples. You know, just cuz.

histogram showing means of the 220 samples above

It would appear that taking multiple samples of a given size generates something that looks like a normal distribution. And the mean of those samples approaches the population average. We will look a bit more at that in the next or a future post.
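The same behaviour can be seen without a plotting library. The sketch below draws 220 sample means from a made-up population and prints a crude text histogram, then compares the mean of the sample means with the population mean. The population, the bin width, and both function names are assumptions for illustration only.

```python
import math
import random
import statistics
from collections import Counter


def sample_means(population, n_samples, size, seed=None):
    """Means of n_samples random samples (without replacement) of the given size."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, size)) for _ in range(n_samples)]


def text_histogram(values, bin_width=0.5):
    """Return one text line per non-empty bin, labelled by the bin's lower edge."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return [f"{edge:5.1f} | {'#' * bins[edge]}" for edge in sorted(bins)]


if __name__ == "__main__":
    data_rng = random.Random(1)
    population = [data_rng.uniform(15, 48) for _ in range(200)]  # made-up ages
    means = sample_means(population, 220, 33, seed=42)
    print("\n".join(text_histogram(means)))
    print(f"population mean:      {statistics.mean(population):.2f}")
    print(f"mean of sample means: {statistics.mean(means):.2f}")
```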

My Code for this Experiment

Unfortunately, the code in the module est_avg_age.py is starting to get rather messy. Sorry!

And rather than include it here, I think I will add a separate post showing all the current code in that module. But, in est_avg_age.py I did:

As mentioned previously, add a new function, all_stat_sng_sample(), to produce the stats needed by experiments 6 & 7.

Add a new variable, P_PATH, to hold the path to my play directory (where experiment #7 writes the data for further use).
Modify experiment 7 to write all its sample data to a file that I could later use to generate HTML tables and plots for related blog posts (e.g. this one).
Add new variables and related optional command line arguments to allow me to modify what the experiment does (e7_f_nbr, t7_rpt, t7_size, t7_runs, etc.).
Add variable t7_seed and modify code to use it to control the random seeding for the functions randomly selecting the data for our samples, all_stat_sng_sample() and get_single_sample(). Modify those functions accordingly. I wanted to be able to repeat the experiment starting from the same place each time. You may have noticed that Run 1 above contains the same data as the chart showing the 22 samples of size 33.
Modify the code that prints experiment output to the command line.
Add a new experiment, #8, which I likely will not use, so it will be deleted at some point in the future.
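To give a feel for the seeding and command line changes in the list above, here is a small sketch using argparse and a seeded random.Random instance. The flag names (--size, --samples, --runs, --seed) and both function names are hypothetical; the real options in est_avg_age.py are not shown in this post, and how they map onto t7_size, t7_runs, t7_rpt and t7_seed is a guess.

```python
import argparse
import random


def parse_args(argv=None):
    # Hypothetical flags; not the actual command line of est_avg_age.py.
    parser = argparse.ArgumentParser(description="Repeated-sample experiment (sketch)")
    parser.add_argument("--size", type=int, default=33, help="items per sample")
    parser.add_argument("--samples", type=int, default=22, help="samples per run")
    parser.add_argument("--runs", type=int, default=10, help="number of runs")
    parser.add_argument("--seed", type=int, default=None, help="seed for repeatable sampling")
    return parser.parse_args(argv)


def draw_samples(population, args):
    """Return runs -> samples -> values, using one seeded generator throughout."""
    rng = random.Random(args.seed)  # same seed gives the same sequence of samples
    return [
        [rng.sample(population, args.size) for _ in range(args.samples)]
        for _ in range(args.runs)
    ]
```

Because a single generator is seeded once, the first 22 samples drawn are the same whether the program asks for one run or ten, which would explain a chart of 22 samples matching Run 1 of a longer experiment.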

I also, for my own personal needs, created a new module, population/play/t7_build_table.py. This is the module that processes the data in the file(s) produced by experiment #7 to generate an HTML table and/or histogram of the sample data for this blog post. See above. Its code will also be in the separate post/page.
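As a rough idea of that save-then-build-a-table workflow, here is a minimal sketch. It is not the code in t7_build_table.py: the JSON file format, the data layout (a list of runs, each a list of [mean, half-width] pairs), and all three function names are assumptions.

```python
import json


def save_runs(path, runs):
    """Write runs (a list of runs, each a list of [mean, half_width]) as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(runs, fh)


def load_runs(path):
    """Read back the structure written by save_runs()."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_table(runs):
    """Return an HTML table with one column per run and one row per sample."""
    head = "".join(f"<th>Run {r}</th>" for r in range(1, len(runs) + 1))
    rows = [f"<tr><th></th>{head}</tr>"]
    for s in range(len(runs[0])):
        cells = "".join(f"<td>{run[s][0]:.1f} ±{run[s][1]:.1f}</td>" for run in runs)
        rows.append(f"<tr><th>Sample {s + 1}</th>{cells}</tr>")
    # Final row: the mean of the sample means in each run.
    means = "".join(f"<td>{sum(m for m, _ in run) / len(run):.1f}</td>" for run in runs)
    rows.append(f"<tr><th>Run Mean</th>{means}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Keeping the data file separate from the table builder means the table can be regenerated or restyled without rerunning the sampling.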

Please see Module Code: est_avg_age.py for Experiment 6 & 7.

Done for This One

And, with that I think I will call it a day.

Too Old To Code
