I am not done with Monte Carlo simulations just yet; there may be a post or two more on the subject. It is a powerful tool when used properly.

Apparently one of the most common examples of a Monte Carlo simulation in courses and lectures is estimating the value of π. Though an irrational and transcendental number, it underpins much of mathematics and science at large. So I am thinking I’ll have a look at that next.

We are going to use an approach in the same spirit as Buffon’s needle problem: estimating π from geometric probability. (Buffon’s original experiment drops needles across ruled lines; our adaptation drops random points into a square.)

Concept

The idea is this: we draw a square, then draw the largest circle that will fit in the square. So the circle has a radius of r and the square’s sides are 2r in length. Now let’s look at the ratio of the circle’s area to that of the square.

The area of the circle will be \(\pi r^2\). And for the square, \(2r \times 2r\), which is equal to \(4r^2\). So the ratio will be:

$$\frac{\pi r^2}{4 r^2} = \frac{\pi}{4}$$

And,

$$\pi = \text{(ratio of areas)} \times 4$$

Now how do we go about getting the ratio of the areas? Well, to get an approximation, we could drop grains of rice, sand or such on our drawing. Then count the number of grains within the circle (including those on the line drawing it) and divide that by the total number of grains we dropped within the square (including those on the line drawing the square).

That said, we are going to use a computer and Monte Carlo simulation to do that for us. Probably a lot less work.
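Before coding anything, a quick sanity check of the formula with a made-up tally (the 7,854 below is purely illustrative, not a simulation result):

```python
# Hypothetical tally: suppose 7,854 of 10,000 random points
# landed inside the circle (illustrative numbers, not a real run).
in_circle = 7854
total = 10000

ratio = in_circle / total   # approximates pi / 4
e_pi = ratio * 4            # so pi is roughly the ratio times 4
print(e_pi)                 # 3.1416
```

Which is encouragingly close to the real thing.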

Coding the Basic Simulation

We will need to determine whether each random point we generate is within the circle or not. We can use the point’s distance from the center of the square/circle to make this determination. To make that calculation easier, we will center our square on the point \((0, 0)\), which means the corners of the square will be at -1 or 1 on each axis. Then for the point \((x, y)\) the squared distance from the center will be \(x^2 + y^2\). Since the radius is 1, the point is within the circle whenever \(x^2 + y^2 \le 1\); there is no need to take a square root, as comparing the squared distance against \(r^2 = 1\) gives the same answer. So we will randomly generate x and y values between -1 and 1 inclusive, determine whether each point is in the circle or not, and increment a counter accordingly. Then use the equation derived above to estimate π.

Now in the real world the type of probability distribution(s) needed will depend on the situation being simulated. In our case, that would be a uniform continuous distribution from -1 to 1. So we will be using Python’s random.uniform(a, b) to simulate our dropping grains of sand.

The code should be pretty straightforward. I am importing matplotlib as I plan on running the simulation multiple times and plotting the results in a histogram. Just for fun.

import random as rand
# the following will be used later in the post
import time
import statistics as stats
import matplotlib.pyplot as plt

N_SIM = 100     # for later, see above re: matplotlib
N_BIN = 20      # also for later
N_SMP = 10000
P_LOW = -1
P_HGH = 1
RADIUS = 1
in_circle = 0

for _ in range(N_SMP):
  x = rand.uniform(P_LOW, P_HGH)
  y = rand.uniform(P_LOW, P_HGH)
  p_len = x**2 + y**2     # squared distance from the center
  if p_len <= RADIUS:     # RADIUS is 1, so it equals RADIUS squared
    in_circle += 1

e_pi = (in_circle / N_SMP) * 4
print(f"estimated value of pi is {e_pi}")

You may have noticed I am dividing the number of points in the circle by the total number of points generated. That works because, of course, every randomly generated point lands within the square, so the total count is also the in-square count.

And, running the above.

(fin-3.11) PS R:\learn\py_play\misc> python pi_monte_carlo.py
estimated value of pi is 3.1428

Not too shabby.
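How good should we expect a single run to be? Each point is an independent Bernoulli trial that lands in the circle with probability \(p = \pi/4\), so the estimate \(4 \cdot (in\_circle / N)\) has a standard error of \(\sqrt{16\,p(1-p)/N}\). A back-of-envelope check (my aside, not part of the original script):

```python
import math

N = 10000
p = math.pi / 4    # true probability a point lands in the circle

# standard error of the estimate 4 * (in_circle / N),
# treating in_circle as a binomial count
std_err = math.sqrt(16 * p * (1 - p) / N)
print(f"{std_err:.4f}")    # about 0.0164
```

So for 10,000 samples a typical run should land within roughly 0.016 of π, and the 3.1428 above is comfortably within one standard error.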

Histogram of 100 Simulations

I want to run the simulation 100 times and plot the results in a histogram. I will also compute and display the mean and variance of the results.

Okay, so I took the basic simulation code, including the in_circle variable, and put it in a function.

def mcs_pi(n_smp, p_lw, p_hgh, mx_rd):
  in_circle = 0
  for _ in range(n_smp):
    x = rand.uniform(p_lw, p_hgh)
    y = rand.uniform(p_lw, p_hgh)
    p_len = x**2 + y**2
    if p_len <= mx_rd:
      in_circle += 1
  return (in_circle / n_smp) * 4

Then I ran the simulation (function) N_SIM times, saving the results in a list. Then I plotted a histogram of the results, adding text giving the mean and variance for the generated dataset (sim_rslt).

sim_rslt = []

for i in range(N_SIM):
  e_pi = mcs_pi(N_SMP, P_LOW, P_HGH, RADIUS)
  sim_rslt.append(e_pi)

pi_mean = sum(sim_rslt) / len(sim_rslt)
pi_var = stats.variance(sim_rslt, pi_mean)

n, bins, patches = plt.hist(sim_rslt, N_BIN)

plt.xlabel('Estimated Pi')
plt.ylabel('Bin Count')
plt.title(f'Monte Carlo Estimation of Pi\n({N_SIM} simulations of {N_SMP} samples each)')
mu_var = f"mean = {pi_mean:0.4f}\n\nvar = {pi_var:0.4f}"
# guessing at location for text
plt.text(3.10, 12, mu_var, c='k')
plt.grid(True)
plt.show()

Which for one execution produced the following.

Histogram produced from running the 10,000-sample Monte Carlo simulation 100 times

As you can see, my guess for a suitable location for the added text was not so good. Let’s see if we can fix that.

Locating Text

I thought about things a bit, then remembered that we are getting a fair bit of info back from matplotlib when executing the histogram function.

n is an array of the values of the histogram bins; in our case, the count of samples within each bin. bins is an array of the edges of the bins. Its length is the number of bins + 1 (the left edge of every bin plus the right edge of the last bin).
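To make those shapes concrete without opening a plot window, numpy.histogram (which plt.hist uses under the hood) returns the same two arrays. A quick illustrative check with made-up data, assuming NumPy is installed (matplotlib requires it anyway):

```python
import numpy as np

# tiny made-up dataset, just to inspect the returned shapes
data = [3.12, 3.13, 3.14, 3.14, 3.15, 3.16]
counts, edges = np.histogram(data, bins=4)

print(len(counts))    # 4: one count per bin
print(len(edges))     # 5: number of bins + 1 edges
print(counts.sum())   # 6: every sample falls in exactly one bin
```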

I decided I would use the middle of the first bin for the x value, and the count of the second largest bin for the y value. I figured the largest bin would likely sit near the middle of the histogram’s x axis. So I take the middle 10 or so bins, sort them, and select the second or third largest. The third largest covers a situation I ran into where the largest two bins had the same sample count: if the largest two bins are the same size, I select the third. I figured the odds of three bins being the same size were pretty slim. But…

# sort 10 or so middle bin values, will use to get value of y for text
b_m_n = N_BIN // 2
mid_bins = n[b_m_n-5:b_m_n+5]
mid_bins.sort()

txt_x = bins[0] + ((bins[1]-bins[0])/2)
if mid_bins[-1] != mid_bins[-2]:
  txt_y = mid_bins[-2]
else:
  txt_y = mid_bins[-3]

... unchanged code ...

plt.text(txt_x, txt_y, mu_var, c='k')
Histogram produced from running the 10,000-sample Monte Carlo simulation 100 times, with the text in the wrong place

And, just like that, I am proven wrong: three bins of the same size. So, a wee refactor, including accounting for another infrequent issue.

c_b = -2
while mid_bins[c_b] == mid_bins[c_b+1]:
  c_b -= 1
# sometimes end up with a y value that is too high, so reduce slightly
txt_y = mid_bins[c_b] - 1
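To convince myself the loop handles ties, here is the same logic run against a hand-built (hypothetical) set of sorted bin counts with a three-way tie at the top:

```python
# Hypothetical sorted bin counts with the three largest tied.
mid_bins = sorted([5, 9, 12, 12, 12])

c_b = -2
while mid_bins[c_b] == mid_bins[c_b + 1]:
  c_b -= 1
txt_y = mid_bins[c_b] - 1

print(txt_y)  # 8: the loop walks past both ties and lands on the 9
```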

And that seems to work reasonably well. At least for the few runs I have made.
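Another option, which I did not use here but which sidesteps data coordinates entirely, is matplotlib’s transform=ax.transAxes: the text coordinates then run from (0, 0) at the bottom left of the axes to (1, 1) at the top right, regardless of the data range. A minimal sketch (the mean/var string is a placeholder):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist([3.12, 3.14, 3.14, 3.15, 3.17], bins=4)

# (0.05, 0.95) means 5% in from the left and 95% up from the bottom
# of the axes, independent of the histogram's data range
txt = ax.text(0.05, 0.95, "mean = ...\nvar = ...",
              transform=ax.transAxes, va="top")
```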

Histogram produced from running the 10,000-sample Monte Carlo simulation 100 times, with the text in a decent location

Timing Execution

The module runs fairly quickly, but I decided to see how long the repeated simulations took to execute.

ts = time.perf_counter()
for i in range(N_SIM):
  e_pi = mcs_pi(N_SMP, P_LOW, P_HGH, RADIUS)
  sim_rslt.append(e_pi)
te = time.perf_counter()
print(f"{N_SIM} simulations of {N_SMP} samples took {te - ts} seconds")

And:

(fin-3.11) PS R:\learn\py_play\misc> python pi_monte_carlo.py
100 simulations of 10000 samples took 2.85143949999474 seconds

So I decided to run 1000 simulations; I can live with waiting 30 seconds or so each time. But I figured I had best also increase the number of bins; I settled on 50.

N_SIM = 1000
N_BIN = 50

And, here’s one of the histograms and timing output.

Histogram produced from running the 10,000-sample Monte Carlo simulation 1000 times
(fin-3.11) PS R:\learn\py_play\misc> python pi_monte_carlo.py
1000 simulations of 10000 samples took 27.55749069992453 seconds
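If that 27-odd seconds ever became a bother, a vectorized NumPy version of the sampling would likely cut it substantially. A sketch of the idea (my own variant, not code from the module above; assumes NumPy is available):

```python
import numpy as np

def mcs_pi_np(n_smp, rng=None):
  # generate all the points at once instead of one per loop iteration
  rng = rng if rng is not None else np.random.default_rng()
  pts = rng.uniform(-1.0, 1.0, size=(n_smp, 2))
  # count points whose squared distance from the origin is within the circle
  in_circle = ((pts ** 2).sum(axis=1) <= 1.0).sum()
  return (in_circle / n_smp) * 4

print(mcs_pi_np(10_000))
```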

Done

I think that’s it for this one. A little Monte Carlo simulation, a little geometry, a little Python along with a little matplotlib learning. Good all round fun.

Until next time, may your coding prove as much fun.

Resources