Carrying on from where we left off last week, I am going to look at generating runs of samples of a given size and comparing their means and 95% confidence intervals.
Comparing Repeated Samples of the Same Size
I have seen the following stated in a variety of ways, but this version was certainly the most straightforward.
I started by building an experiment to get N samples of size X and plot them on a chart. Something like the following for 10 samples of size 33.
The line at the top of each bar shows the 95% confidence interval for that sample. The red line is the actual world average age for the year in question. The green coloured area covers the 95% confidence interval for the first sample (see next paragraph).
That coloured area is rather speculative. It was just something I thought might give me some idea of how confidence intervals work. But I have since decided it adds no real value to the experiment or discussion. Still, I kinda like the look of it, so I have left it in.
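For anyone wanting to try something similar, here is a minimal sketch of the per-sample calculation behind a chart like this. It is not the code in est_avg_age.py: the synthetic, right-skewed population, the function names `sample_ci()` and `n_samples()`, and the use of the normal critical value 1.96 (rather than a t value) are all assumptions made for illustration.

```python
import math
import random
import statistics

Z_95 = 1.96  # assumed normal critical value for a two-sided 95% interval (a t value is also common)


def sample_ci(sample):
    """Return (mean, standard error, (lower, upper)) for one sample."""
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    half_width = Z_95 * se
    return mean, se, (mean - half_width, mean + half_width)


def n_samples(population, n, size, rng):
    """Draw n samples of the given size (without replacement) and compute stats for each."""
    return [sample_ci(rng.sample(population, size)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical right-skewed "age" population, capped at 100; not the real world data
    population = [min(int(rng.expovariate(1 / 30)), 100) for _ in range(50_000)]
    true_mean = statistics.mean(population)
    for i, (mean, se, (lo, hi)) in enumerate(n_samples(population, 10, 33, rng), start=1):
        hit = "yes" if lo <= true_mean <= hi else "NO"
        print(f"Sample {i:2}: {mean:5.1f} ±{se:.1f}  CI [{lo:5.1f}, {hi:5.1f}]  contains true mean: {hit}")
```

With a sample size of 33, a t critical value (about 2.04) would give slightly wider intervals than 1.96; either is a reasonable choice for a sketch like this.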
Here’s another chart for 22 samples of size 33.
Then I decided to run this process multiple times and tabulate the results. So, I did 10 runs of 22 samples of size 33. See table below. My code saved the data to a file. Then I wrote another module to read the data and generate the HTML table for inclusion in the post. I did not plot each run. So, sorry, no pictures.
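The "run it several times and save the data" step could be sketched as below. The JSON layout, the temporary file location and the function names are hypothetical, not what experiment #7 actually writes to P_PATH. Each run stores [mean, standard error] pairs, and a sample counts as a miss when |mean - true mean| > 1.96 × standard error.

```python
import json
import math
import random
import statistics
import tempfile
from pathlib import Path

Z_95 = 1.96  # assumed normal critical value for a 95% interval


def run_once(population, n_samples, size, rng):
    """One run: n_samples samples of the given size, each summarised as [mean, std err]."""
    stats = []
    for _ in range(n_samples):
        s = rng.sample(population, size)
        stats.append([statistics.mean(s), statistics.stdev(s) / math.sqrt(size)])
    return stats


def count_misses(stats, true_mean, z=Z_95):
    """Number of samples whose interval mean ± z * se does not contain the true mean."""
    return sum(1 for m, se in stats if abs(m - true_mean) > z * se)


def save_runs(path, runs, true_mean):
    """Write all runs to a JSON file so a separate script can build tables or plots later."""
    Path(path).write_text(json.dumps({"true_mean": true_mean, "runs": runs}))


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical right-skewed "age" population; not the real data used in the post
    population = [min(int(rng.expovariate(1 / 30)), 100) for _ in range(50_000)]
    true_mean = statistics.mean(population)
    runs = [run_once(population, 22, 33, rng) for _ in range(10)]
    out_file = Path(tempfile.gettempdir()) / "t7_runs_sketch.json"  # hypothetical location
    save_runs(out_file, runs, true_mean)
    for i, stats in enumerate(runs, start=1):
        run_mean = statistics.mean(m for m, _ in stats)
        print(f"Run {i:2}: run mean {run_mean:.1f}, misses {count_misses(stats, true_mean)}")
    print(f"saved to {out_file}")
```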
Sample means ± standard error for 10 runs of 22 samples of size 33
Sample | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 |
---|---|---|---|---|---|---|---|---|---|---|
Sample 1 | 29.9 ±3.3 | 25.3 ±2.5 | 29.2 ±3.1 | 28.2 ±3.1 | 28.7 ±3.0 | 27.4 ±3.0 | 27.7 ±3.2 | 26.6 ±3.5 | 26.9 ±2.9 | 27.6 ±2.7 |
Sample 2 | 32.3 ±3.2 | 27.7 ±3.1 | 27.9 ±2.8 | 28.0 ±3.1 | 28.4 ±3.2 | 29.6 ±3.1 | 30.5 ±3.2 | 26.6 ±2.9 | 27.9 ±3.2 | 29.7 ±3.1 |
Sample 3 | 26.9 ±3.0 | 30.1 ±2.7 | 29.3 ±3.2 | 27.5 ±2.9 | 28.6 ±3.2 | 28.9 ±3.0 | 28.2 ±3.1 | 28.4 ±3.1 | 26.7 ±3.2 | 29.7 ±3.1 |
Sample 4 | 28.2 ±3.3 | 28.3 ±3.2 | 26.6 ±3.2 | 28.3 ±3.1 | 30.6 ±3.1 | 28.9 ±3.2 | 28.0 ±3.1 | 29.3 ±3.0 | 27.5 ±2.8 | 30.0 ±2.6 |
Sample 5 | 28.2 ±2.9 | 28.1 ±3.0 | 27.7 ±3.0 | 31.5 ±2.8 | 26.7 ±2.8 | 29.3 ±3.1 | 28.7 ±3.1 | 27.8 ±3.2 | 27.9 ±3.2 | 29.4 ±2.9 |
Sample 6 | 31.3 ±3.1 | 25.9 ±3.0 | 25.9 ±2.5 | 27.5 ±2.9 | 28.2 ±3.0 | 30.3 ±2.8 | 29.3 ±2.7 | 26.4 ±3.1 | 28.9 ±3.1 | 27.8 ±3.2 |
Sample 7 | 27.5 ±3.1 | 28.2 ±2.7 | 28.0 ±2.4 | 27.6 ±2.9 | 26.9 ±2.7 | 29.1 ±2.9 | 27.6 ±3.3 | 27.7 ±3.1 | 26.1 ±3.2 | 28.5 ±3.1 |
Sample 8 | 28.4 ±3.1 | 28.6 ±3.0 | 29.0 ±2.8 | 27.8 ±3.1 | 23.9 ±2.7 | 26.7 ±3.2 | 28.9 ±3.6 | 25.6 ±2.9 | 28.7 ±2.9 | 28.2 ±3.0 |
Sample 9 | 28.2 ±3.0 | 25.3 ±2.8 | 30.8 ±3.2 | 29.9 ±3.1 | 28.9 ±3.0 | 27.7 ±3.0 | 27.6 ±3.3 | 30.1 ±3.0 | 30.9 ±3.1 | 27.9 ±3.5 |
Sample 10 | 29.1 ±3.0 | 29.3 ±3.3 | 30.7 ±3.3 | 26.5 ±3.3 | 26.2 ±2.8 | 28.3 ±2.9 | 27.0 ±2.9 | 28.4 ±3.2 | 28.2 ±2.9 | 30.3 ±3.1 |
Sample 11 | 28.8 ±3.4 | 27.6 ±3.5 | 27.9 ±2.9 | 28.8 ±3.1 | 29.7 ±2.9 | 26.7 ±3.0 | 25.6 ±2.4 | 30.1 ±3.1 | 27.2 ±2.8 | 27.3 ±3.0 |
Sample 12 | 27.0 ±3.4 | 29.3 ±3.1 | 30.0 ±3.1 | 28.5 ±2.9 | 27.1 ±2.7 | 30.2 ±3.2 | 28.6 ±2.9 | 26.8 ±3.1 | 30.0 ±3.0 | 28.1 ±3.1 |
Sample 13 | 26.0 ±2.8 | 28.6 ±2.9 | 27.8 ±2.6 | 26.9 ±3.0 | 28.6 ±2.8 | 29.5 ±3.1 | 28.3 ±2.8 | 28.5 ±3.0 | 29.8 ±2.9 | 29.6 ±3.3 |
Sample 14 | 29.1 ±3.5 | 27.0 ±3.1 | 31.1 ±3.0 | 27.3 ±2.9 | 26.0 ±2.8 | 26.9 ±2.8 | 28.5 ±2.9 | 29.1 ±2.9 | 27.5 ±3.2 | 28.3 ±3.3 |
Sample 15 | 27.9 ±3.1 | 27.9 ±2.9 | 27.7 ±2.8 | 26.8 ±3.1 | 29.2 ±2.7 | 28.8 ±3.2 | 28.4 ±3.3 | 27.2 ±3.0 | 28.4 ±2.8 | 29.3 ±3.5 |
Sample 16 | 28.1 ±2.9 | 29.1 ±3.0 | 27.6 ±2.8 | 29.1 ±3.1 | 29.0 ±3.3 | 28.9 ±3.0 | 26.4 ±2.8 | 26.4 ±3.0 | 31.3 ±3.0 | 27.3 ±2.9 |
Sample 17 | 30.3 ±3.3 | 27.7 ±3.2 | 30.8 ±3.2 | 28.5 ±3.3 | 29.7 ±3.2 | 28.0 ±3.1 | 29.0 ±3.0 | 27.3 ±3.1 | 29.4 ±3.1 | 29.9 ±3.1 |
Sample 18 | 27.6 ±2.8 | 26.6 ±3.0 | 29.2 ±2.9 | 29.9 ±3.0 | 29.6 ±3.1 | 26.8 ±2.9 | 30.6 ±2.9 | 29.6 ±3.4 | 30.6 ±3.4 | 30.1 ±2.9 |
Sample 19 | 30.0 ±2.9 | 26.5 ±3.0 | 27.6 ±2.9 | 28.5 ±3.2 | 28.0 ±2.9 | 28.0 ±3.5 | 28.4 ±2.8 | 27.9 ±3.2 | 30.6 ±3.3 | 28.0 ±2.8 |
Sample 20 | 27.7 ±2.8 | 29.7 ±2.9 | 27.8 ±2.8 | 27.9 ±2.6 | 30.6 ±3.4 | 27.6 ±2.5 | 26.6 ±3.0 | 28.2 ±3.2 | 27.1 ±2.7 | 26.7 ±3.1 |
Sample 21 | 26.3 ±2.8 | 28.0 ±3.0 | 28.5 ±3.1 | 27.4 ±2.9 | 28.4 ±3.1 | 27.7 ±3.2 | 30.2 ±2.6 | 28.2 ±2.7 | 27.6 ±2.7 | 30.2 ±3.1 |
Sample 22 | 27.6 ±3.1 | 30.3 ±3.1 | 28.4 ±2.7 | 28.0 ±3.5 | 31.1 ±3.1 | 29.2 ±2.9 | 28.6 ±3.1 | 27.8 ±3.3 | 28.8 ±3.0 | 29.6 ±2.9 |
Run Mean | 28.5 | 28.0 | 28.6 | 28.2 | 28.4 | 28.4 | 28.3 | 27.9 | 28.6 | 28.8 |
95% CIs Missing True Mean | 1 | 2 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
So, across the 220 samples, the sample's 95% confidence interval did not contain the true population mean 8 times, or roughly 3.6% of the time, which agrees with the statement above. But I also had one run (Run 2) where the true average fell outside the sample 95% confidence interval 2 times. That's 9% of the time for that run. Seems quite high. But is it statistically improbable? Unfortunately, I don't know, but it sure doesn't look like it. Though it could also point to a flaw in my sampling process.
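One way to put a number on the "is it improbable?" question is to treat each sample's interval as an independent trial that misses the true mean with probability 0.05. That is an idealisation, assuming the intervals really are 95% intervals. The number of misses in a run of 22 samples is then binomial, and a short standard-library sketch gives the relevant probabilities.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


p_two_plus = prob_at_least(2, 22, 0.05)  # one run: 2 or more misses out of 22 samples
p_any_run = 1 - (1 - p_two_plus) ** 10   # at least one of 10 independent runs has 2+ misses
expected = 220 * 0.05                    # expected number of misses over all 220 samples

print(f"P(2+ misses in a run of 22)     = {p_two_plus:.3f}")
print(f"P(some run of 10 has 2+ misses) = {p_any_run:.3f}")
print(f"Expected misses in 220 samples  = {expected:.0f}")
```

Under those assumptions, a single run with 2 or more misses happens roughly 30% of the time, and across 10 runs seeing at least one such run is very likely (about 97%), so Run 2 on its own does not look out of line.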
Histogram of All the Above Samples
Well, in the end I decided to plot a histogram of the 220 sample means. You know, just cuz.
It would appear that the means of multiple samples of a given size form something that looks like a normal distribution, and that the mean of those sample means approaches the population average. We will look a bit more at that in the next, or a future, post.
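A quick simulation can check that observation. The sketch below uses a made-up, right-skewed population (not the world age data), draws 220 samples of size 33, and compares the mean and spread of the sample means with the population mean and σ/√n, which is what the central limit theorem predicts. The text histogram is only a stand-in for a proper matplotlib plot.

```python
import math
import random
import statistics

rng = random.Random(2024)
# Hypothetical right-skewed "age" population, capped at 100 (not real data)
population = [min(int(rng.expovariate(1 / 30)), 100) for _ in range(100_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

size, n_samples = 33, 220
means = [statistics.mean(rng.sample(population, size)) for _ in range(n_samples)]

print(f"population mean {pop_mean:.2f}, mean of sample means {statistics.mean(means):.2f}")
print(f"sigma/sqrt(n) {pop_sd / math.sqrt(size):.2f}, sd of sample means {statistics.stdev(means):.2f}")

# Crude text histogram of the sample means in 2-year bins
lo = math.floor(min(means))
counts = {}
for m in means:
    b = lo + 2 * int((m - lo) // 2)
    counts[b] = counts.get(b, 0) + 1
for b in sorted(counts):
    print(f"{b:3}-{b + 2:<3} {'#' * counts[b]}")
```

Even though the population itself is strongly skewed, the histogram of sample means should come out roughly bell-shaped and centred near the population mean.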
My Code for this Experiment
Unfortunately, the code in the module est_avg_age.py is starting to get rather messy. Sorry!
So rather than include it here, I think I will add a separate post showing all the current code in that module. But, in est_avg_age.py, I did the following:
- As mentioned previously, add a new function, all_stat_sng_sample(), to produce the stats needed by experiments 6 & 7.
- Add a new variable, P_PATH, to hold the path to my play directory (where experiment #7 writes the data for further use).
- Modify experiment 7 to write all its sample data to a file that I could later use to generate HTML tables and plots for related blog posts (e.g. this one).
- Add new variables and related optional command-line arguments to allow me to modify what experiment #7 does (e7_f_nbr, t7_rpt, t7_size, t7_runs, etc.).
- Add variable t7_seed and modify the code to use it to control the random seeding for the functions that randomly select the data for our samples, all_stat_sng_sample() and get_single_sample(). Modify those functions accordingly. I wanted to be able to repeat the experiment starting from the same place each time. You may have noticed that Run 1 above contains the same data as the chart showing the 22 samples of size 33.
- Modify the code that prints experiment output to the command line.
- Add a new experiment, #8, which I likely will not use, so it will be deleted at some point in the future.
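The point of a seed like t7_seed is repeatability: seed the generator once and every sampling call downstream produces the same sequence. A minimal sketch of that idea follows, using a dedicated random.Random instance passed into the sampling function. The helper `draw_sample()` and the argparse flags (--seed, --size, --samples, --runs) are hypothetical; the real variables are t7_seed, t7_size and so on, and their command-line flags may differ.

```python
import argparse
import random
import statistics


def draw_sample(population, size, rng):
    """Hypothetical sampling helper: all randomness comes from the rng passed in."""
    return rng.sample(population, size)


def parse_args(argv=None):
    """Hypothetical command-line flags for a repeatable sampling experiment."""
    p = argparse.ArgumentParser(description="Repeatable sampling experiment (sketch)")
    p.add_argument("--seed", type=int, default=None, help="random seed; omit for a different run each time")
    p.add_argument("--size", type=int, default=33, help="sample size")
    p.add_argument("--samples", type=int, default=22, help="samples per run")
    p.add_argument("--runs", type=int, default=10, help="number of runs")
    return p.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    rng = random.Random(args.seed)  # same seed -> same sequence of samples
    population = list(range(1, 101))  # placeholder population, not real ages
    for run in range(1, args.runs + 1):
        means = [statistics.mean(draw_sample(population, args.size, rng)) for _ in range(args.samples)]
        print(f"Run {run}: first sample mean {means[0]:.1f}")
```

Passing a dedicated Random instance around, rather than calling random.seed() on the global generator, keeps the experiment's sequence from being disturbed by any other code that happens to use the random module.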
I also, for my own personal needs, created a new module, population/play/t7_build_table.py. This module processes the data in the file(s) produced by experiment #7 to generate an HTML table and/or a histogram of the sample data for this blog post (see above). Its code will also be in the separate post/page.
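The table-building part of a module like that might be sketched as follows. It assumes a hypothetical JSON layout of {"true_mean": ..., "runs": [[[mean, se], ...], ...]}, which is not necessarily the format experiment #7 writes, and produces rows shaped like the table above: one row per sample, then the run mean and the count of 95% intervals (mean ± 1.96 × se, an assumed critical value) that miss the true mean.

```python
import json
import statistics
from pathlib import Path

Z_95 = 1.96  # assumed critical value used to decide whether a 95% CI misses the true mean


def build_table(data, z=Z_95):
    """Return an HTML table: one row per sample, one column per run, plus two summary rows."""
    runs, true_mean = data["runs"], data["true_mean"]
    header = "".join(f"<th>Run {i}</th>" for i in range(1, len(runs) + 1))
    rows = ["<table>", f"<tr><th></th>{header}</tr>"]
    for s in range(len(runs[0])):
        cells = "".join(f"<td>{run[s][0]:.1f} ±{run[s][1]:.1f}</td>" for run in runs)
        rows.append(f"<tr><th>Sample {s + 1}</th>{cells}</tr>")
    means = "".join(f"<td>{statistics.mean(m for m, _ in run):.1f}</td>" for run in runs)
    misses = "".join(f"<td>{sum(abs(m - true_mean) > z * se for m, se in run)}</td>" for run in runs)
    rows.append(f"<tr><th>Run Mean</th>{means}</tr>")
    rows.append(f"<tr><th>CIs Missing True Mean</th>{misses}</tr>")
    rows.append("</table>")
    return "\n".join(rows)


def table_from_file(path):
    """Read a saved runs file (hypothetical JSON layout) and return the HTML table."""
    return build_table(json.loads(Path(path).read_text()))


if __name__ == "__main__":
    # Made-up toy values, only to show the output shape
    demo = {"true_mean": 10.0, "runs": [[[10.5, 1.0], [14.1, 1.0]], [[9.0, 0.8], [10.2, 1.1]]]}
    print(build_table(demo))
```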
Please see Module Code: est_avg_age.py for Experiment 6 & 7.
Done for This One
And, with that I think I will call it a day.
Resources
- Statements of probability and confidence intervals
- Standard deviation versus standard error
- How to Find a Confidence Interval
- Confidence Intervals
- Grouped bar chart with labels
- Different background colour areas on matplotlib plot