Animated Histogram

As mentioned in the last post, Estimate World Average Age: Part 7, I thought I’d look at animating the plotting of the histogram for one or more of our repeated samplings. I’d seen mention of animating matplotlib charts in an article, and thought it might be fun to do.

New Conda Environment

I was eventually going to try and save the animated chart(s) to an mp4 or a gif so that I could embed/include it/them in this post. That could require adding some packages to my conda environment. In case this experiment didn’t go any further, I decided not to mess with my existing environment, base-3.8. So, I cloned a new one, ani-3.8, from it.

(base-3.8) PS R:\learn\py_play> conda info --envs
# conda environments:
#
base                     E:\appDev\Miniconda3
base-3.8              *  E:\appDev\Miniconda3\envs\base-3.8
py30days                 E:\appDev\Miniconda3\envs\py30days

(base-3.8) PS R:\learn\py_play> conda create --name ani-3.8 --clone base-3.8
Source:      E:\appDev\Miniconda3\envs\base-3.8
Destination: E:\appDev\Miniconda3\envs\ani-3.8
Packages: 64
Files: 2
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate ani-3.8
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base-3.8) PS R:\learn\py_play> conda info --envs
# conda environments:
#
base                     E:\appDev\Miniconda3
ani-3.8                  E:\appDev\Miniconda3\envs\ani-3.8
base-3.8              *  E:\appDev\Miniconda3\envs\base-3.8
py30days                 E:\appDev\Miniconda3\envs\py30days

(base-3.8) PS R:\learn\py_play> conda activate ani-3.8
(ani-3.8) PS R:\learn\py_play>

Back to Animating Charts

Figured animating a histogram might take a bit of work, but the initial bit went rather quickly. Essentially, I plot the histogram for every slice of data in the complete data set. That is, for a sampling with a total of 100 repetitions, I will repeatedly plot the histogram for each of data[:1], data[:2], …, data[:99], data[:100]. Note that the last one could be written as data[:], but given how things work the former is more straightforward.
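As a quick check of the slicing idea, here is a minimal sketch using a made-up five-element list (the name `sample` and its values are placeholders, not the real data). It shows the growing slices and that the last one matches `sample[:]`.

```python
# Hypothetical stand-in for the real list of sample means.
sample = [30.8, 27.7, 26.5, 28.7, 25.9]

# One slice per frame: sample[:1], sample[:2], ..., sample[:len(sample)]
slices = [sample[:n] for n in range(1, len(sample) + 1)]

for s in slices:
    print(len(s), s)

# The final slice holds the same values as the full-list slice.
print(slices[-1] == sample[:])
```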

To make this work I needed to use matplotlib’s animation module, matplotlib.animation. And, more specifically, I will be using matplotlib.animation.FuncAnimation. Among other things, FuncAnimation() takes a callback function that is executed for each frame of the animation. The callback is passed the current frame number on each iteration. This number starts at 0 when no frames argument is given (per the documentation, that case is equivalent to passing itertools.count).
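To see what the callback receives, the sketch below imitates the no-`frames` case without opening a window. It leans on the matplotlib documentation's statement that omitting `frames` is equivalent to passing `itertools.count`. The names `fake_callback` and `received` are made up, and `islice` just stops the otherwise endless counter after five frames.

```python
from itertools import count, islice

received = []

def fake_callback(frame):
    # Stand-in for animate(cnt): just record what was passed in.
    received.append(frame)

# count() yields 0, 1, 2, ... forever; take only the first five here.
for frame in islice(count(), 5):
    fake_callback(frame)

print(received)  # [0, 1, 2, 3, 4]
```

Passing something like `frames=range(1, len(data) + 1)` together with `repeat=False` to FuncAnimation should let the frame value double as the slice length and end the animation without a manual stop, though that variation is not the one used in this post.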

Initial Attempt

I started fairly simply. I just create the full histogram each time the callback is called, making sure to clear the axes first. Down the road I may look at finding a way to only add changes, assuming that doesn’t take too much effort and/or time.

I also decided not to hard code all the necessary variables for generating the chart in each frame within the callback. So, I created a number of them in the global namespace and used the global keyword to include them in the callback. Perhaps should have used a closure. But, small steps…
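For comparison, here is a rough sketch of what a closure version might look like. Everything about it is an assumption rather than the code used in this post: the factory name `make_animate`, its parameters, and passing in an axes object `ax` instead of going through `plt`. The settings live in the enclosing function, so the callback needs no `global` statement.

```python
def make_animate(ax, data, bins, axis, s_size):
    """Return a FuncAnimation-style callback that remembers its settings."""
    title = f"Estimated World Average Age for Repeated Samples of Size {s_size}"

    def animate(cnt):
        # data, bins, axis and title all come from the enclosing scope.
        ax.cla()
        ax.axis(axis)
        ax.hist(data[:cnt], bins=bins)
        ax.figure.suptitle(title)
        ax.set_title(f"Current count: {cnt} Samples")
        ax.set_ylabel('Frequency')
        ax.set_xlabel('Age')

    return animate


# Possible usage (assumes matplotlib is imported and data/bins/axis exist):
#   fig, ax = plt.subplots()
#   ani = animation.FuncAnimation(fig, make_animate(ax, data, bins, axis, 30),
#                                 interval=100)
#   plt.show()
```

The trade-off is one more level of nesting in exchange for a callback that can be reused with a different data set just by calling the factory again.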

Okay, so in the animation callback function:

  • if at end of data, terminate animation loop event
  • clear the figure/chart
  • get the appropriate set of data
  • generate a new histogram
  • add titles, axis labels, etc.

And, in my prior setup:

  • import required modules/packages
  • add data (list: data)
  • specify sample size (int: s_size)
  • specify maximum number of frames (int: max_lp)
  • use numpy to generate bins
  • specify boundaries for x and y axes

And, here’s my initial attempt for the animation callback and the code to run it.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

# eventually would like to get a lot of the meta-data and actual sample data from a file
data = [30.845716086321577, 27.714993270689398, 26.539019466198756, 28.652412751981608, 25.91251105018476, 28.64765858770585, 28.43607425900043, 29.155768024294613, 28.208172328617486, 26.595183258570575, 28.865488929001568, 30.686593914044185, 28.949639621950695, 25.62845914668899, 29.19201774897019, 28.047594512491685, 28.22093542183883, 28.374921165089173, 30.690075634758678, 27.271132349654113, 29.740443263446412, 27.64449438259917, 28.952287681557547, 29.052409562050418, 29.115468450905333, 27.783257243947148, 27.548704530269532, 30.62276598022423, 28.94742189084511, 29.413534278654236, 29.38974836049136, 29.380930696708813, 28.377958686298296, 25.73136328589792, 26.57861731239424, 29.237794824735797, 26.808873073718367, 28.6745133768866, 29.080921150558893, 28.754975906219766, 28.72415124866063, 26.338287455602394, 30.080391362432472, 25.897604085649508, 28.972778954289655, 29.260116185271198, 28.86340953522474, 27.28060642825256, 30.805984223480607, 27.76411635202121, 31.021686055148223, 25.992921118938632, 30.12321928731231, 27.212553791272896, 30.905215921641773, 28.74129849170663, 27.63761576464364, 29.628767187005725, 28.594416579332297, 28.749408044766366, 27.86536302044773, 26.620086966452188, 27.238471545041484, 28.207836094665545, 28.67693576548127, 29.592031794726243, 28.62629156366716, 25.820627686102654, 29.207039853171004, 27.90139482030897, 29.93671731742414, 26.678888773631222, 28.494604917195666, 28.895453164116464, 28.600138218644503, 30.0692463710327, 28.591871496870056, 28.51242379871638, 28.948606671590202, 27.33776348504623, 29.650754154481263, 28.717365896940095, 27.569791814239803, 28.575197523811678, 27.529728084158055, 27.352196701851557, 31.08869246328819, 27.544809201406295, 28.729690047607775, 30.389801507642776, 27.51183455821836, 28.67564013270134, 30.127738834331428, 28.625162276233024, 30.29438580862724, 28.95852097757391, 26.077290603992097, 28.705586201250167, 32.546639208883455, 26.225844583789687]
s_size = 30
max_lp = 100
bins = np.arange(15, 40)
axis = [15, 40, 0, 40]
ani = None

def animate(cnt):
  global s_size, max_lp, bins, axis, fig, ani

  if cnt == max_lp:
    ani.event_source.stop()

  plt.cla()

  plt.axis(axis)
  plt.hist(data[:cnt], bins=bins)
  title = f"Estimated World Average Age for Repeated Samples of Size {s_size}"
  fig.suptitle(title)
  p_hdg = f"Current count: {cnt} Samples"
  plt.gca().set_title(p_hdg)
  plt.gca().set_ylabel('Frequency')
  plt.gca().set_xlabel('Age')


fig = plt.figure()
ani = animation.FuncAnimation(fig, 
                            animate, 
                            interval=100)
plt.show()

And, “bless my soul”, it pretty much worked.

Iteration #2

In this initial attempt, I hard coded the max_y value in the axis variable, i.e. axis = [15, 40, 0, 40]. But I realized that if I used different data sets, the hard coded approach was possibly unsustainable. So, I decided to work out the maximum y-axis value in code. np.histogram helped me sort things out. Using that numpy function, I got the bin edges and the count for each bin. Then I “roughly” rounded the largest count up to the next multiple of 5. Or at least I think I am doing so.

n, bins = np.histogram(data, bins)
max_y = int(round((max(n) + 3) / 5) * 5)
axis = [15, 40, 0, max_y]
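To check what the `max_y` expression does, here is a small sketch that tries it on a range of bin counts. For whole-number counts it appears to give the next multiple of 5 strictly above the count, so a count of 20 becomes 25 rather than staying at 20. The names `max_y_for` and `next_multiple_of_5` are made up for this comparison.

```python
def max_y_for(top):
    # Same arithmetic as the max_y line, with top standing in for max(n).
    return int(round((top + 3) / 5) * 5)

def next_multiple_of_5(top):
    # Smallest multiple of 5 that is strictly greater than top.
    return (top // 5 + 1) * 5

for top in (14, 17, 19, 20, 21):
    print(top, max_y_for(top), next_multiple_of_5(top))

# The two agree for every non-negative integer count tried here.
print(all(max_y_for(t) == next_multiple_of_5(t) for t in range(0, 1000)))
```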

I also decided to rename the max_lp variable and determine its value from the data array. There are also, of course, attendant changes in the code for the callback function; those come later.

- max_lp = 100
+ max_rpts = len(data)

And, I figured why not add a line for the mean at each of the 4 repetition sizes. The sizes are currently hardcoded. I added a couple more global variables to help with this.

do_mean = [30, 50, 70, 100]
mean_plotted = 0

And modified the callback accordingly.

def animate(cnt):
  global s_size, max_rpts, bins, axis, fig, ani, do_mean, mean_plotted

  if cnt == max_rpts:
    ani.event_source.stop()

  plt.cla()

  p_data = data[:cnt]

  plt.axis(axis)
  plt.hist(p_data, bins=bins)
  title = f"Estimated World Average Age for Repeated Samples of Size {s_size}"
  fig.suptitle(title)
  p_hdg = f"Current count: {cnt} Samples"
  plt.gca().set_title(p_hdg)
  plt.gca().set_ylabel('Frequency')
  plt.gca().set_xlabel('Age')

  # if approp nbr repetitions, generate new mean
  if cnt in do_mean:
    mean = sum(p_data) / len(p_data)
    mean_plotted = mean
  # if there is a mean value, plot it
  if mean_plotted:
    plt.axvline(mean_plotted, 0, 1, color='r', label=f'Est World Avg Age: {mean_plotted:.2f}')
    plt.gca().legend()  

That also worked quite nicely.

Iteration #3

But, I didn’t like the frame/repetition count in the chart title starting at 0 and ending at 99. Also, the final update to show the mean for all 100 repeated samples was not happening. So I made some more changes. And, I figured while I was at it, why not sort the axis and bins completely based on the sample data rather than hard coding things.

For the repetition count, I added a new variable to the callback function and used it in the appropriate spots.

  # I was just using the cnt, but it starts at 0
  # so need to adjust ndx for slicing data, note never want to exceed max_rpts
  rpts = min(cnt + 1, max_rpts)
  p_data = data[:rpts]
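A tiny sketch of how that clamp behaves, assuming the frame number keeps counting past the end of the data. The value `max_rpts = 5` is made up to keep the output short.

```python
max_rpts = 5  # hypothetical; the post uses len(data)

for cnt in range(0, 7):
    rpts = min(cnt + 1, max_rpts)
    print(f"frame {cnt} -> data[:{rpts}]")

clamped = [min(cnt + 1, max_rpts) for cnt in range(0, 7)]
print(clamped)  # [1, 2, 3, 4, 5, 5, 5]
```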

For the axis and bins values, since I am passing those items as global variables within the callback, I do these calculations outside the callback. And before it is called. I played around a fair bit, but settled on:

min_x = int(round((min(data) - 5) / 5) * 5)
max_x = int(round((max(data) + 4) / 5) * 5)
bins = np.arange(min_x, max_x)
n, bins = np.histogram(data, bins)
max_y = int(round((max(n) + 3) / 5) * 5)
axis = [min_x, max_x, 0, max_y]
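Working those expressions through by hand helps. The sketch below plugs in what look to be the smallest and largest values in the data list shown earlier (treat `lo` and `hi` as assumptions if your data differs) and skips the numpy calls so that it runs on its own.

```python
lo = 25.62845914668899   # assumed to be min(data) for the list above
hi = 32.546639208883455  # assumed to be max(data) for the list above

min_x = int(round((lo - 5) / 5) * 5)  # 20.63 / 5 = 4.13 -> 4 -> 20
max_x = int(round((hi + 4) / 5) * 5)  # 36.55 / 5 = 7.31 -> 7 -> 35

print(min_x, max_x)  # 20 35

# np.arange(min_x, max_x) would give the same integers as range():
edges = list(range(min_x, max_x))
print(edges[0], edges[-1], len(edges))  # 20 34 15
```

Note that the last bin edge is 34, one short of max_x. That is fine for this data, since the largest value is about 32.5.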

And, the whole module looks like the following:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

data = [30.845716086321577, 27.714993270689398, 26.539019466198756, 28.652412751981608, 25.91251105018476, .... ]
s_size = 30

max_rpts = len(data)
min_x = int(round((min(data) - 5) / 5) * 5)
max_x = int(round((max(data) + 4) / 5) * 5)
bins = np.arange(min_x, max_x)
n, bins = np.histogram(data, bins)
max_y = int(round((max(n) + 3) / 5) * 5)
axis = [min_x, max_x, 0, max_y]
do_mean = [30, 50, 70, 100]
mean_plotted = 0
ani = None


def animate(cnt):
  global s_size, max_rpts, bins, axis, fig, ani, do_mean, mean_plotted

  if cnt == max_rpts:
    ani.event_source.stop()

  plt.cla()

  # I was just using the cnt, but it starts at 0
  # so need to adjust ndx for slicing data, note never want to exceed max_rpts
  rpts = min(cnt + 1, max_rpts)
  p_data = data[:rpts]

  plt.axis(axis)
  plt.hist(p_data, bins=bins)
  title = f"Estimated World Average Age for Repeated Samples of Size {s_size}"
  fig.suptitle(title)
  p_hdg = f"Sample Repetitions: {rpts}"
  plt.gca().set_title(p_hdg)
  plt.gca().set_ylabel('Frequency')
  plt.gca().set_xlabel('Age')

  # if approp nbr repetitions, generate new mean
  # (test rpts rather than cnt: p_data holds rpts values, cnt is one behind)
  if rpts in do_mean:
    mean = sum(p_data) / len(p_data)
    mean_plotted = mean
  # if there is a mean value, plot it
  if mean_plotted:
    plt.axvline(mean_plotted, 0, 1, color='r', label=f'Est World Avg Age: {mean_plotted:.2f}')
    plt.gca().legend()  


fig = plt.figure()
ani = animation.FuncAnimation(fig, 
                            animate, 
                            interval=100)
plt.show()

And here’s an mp4 of the current animation.

Saving the Animated Chart to a File

Thought I should cover this before calling it a day.

Turns out to be pretty easy. Didn’t even have to load any additional modules/packages. So, doesn’t look like I needed to create a new conda environment. But, I haven’t yet looked at saving the animations as animated gifs. So, that may change. But for now I am happy with a plain old mp4.

I also didn’t look into any of the options (framerate, dpi, bitrate, blitting, etc.). Just went as simple as possible and that works for now. First thing was to initialize and set up the Writer for the animation .save() function. I did this to remind me there were other options I could investigate should I so choose. (I could have skipped this step and just specified the writer I wanted in the call to .save().)

Then just call the save function. I did this before calling the .show() function. And, I could have skipped the .show() as I had saved the animation to an mp4 and could just have watched that. I had also added a variable to control the save process. I did this because I was also investigating using .to_html5_video(). In the end I settled on only saving as an mp4.

# in the upper portion of the module

# "" = do not save, "html" = save as video tag with embedded src, "mp4" = save as mp4
save_ani = "mp4"
save_fl = "hist_1"

...

# and just before the end of the module

if save_ani == 'html':
  with open(f"{save_fl}.html", "w") as f:
    print(ani.to_html5_video(), file=f)

if save_ani == 'mp4':
  # Set up formatting for the movie files
  Writer = animation.writers['ffmpeg']
  writer = Writer(metadata=dict(artist='koach'))
  ani.save(f"{save_fl}.mp4", writer=writer)

Done, m’thinks

Well that’s it for this one. In the next one, I think I will look a bit at playing with the bin size. Not sure what else. Maybe look at the animations for a few of the other samples we created previously. Will probably look at some kind of data files specified via the command line. JSON perhaps. Until then…

Did you notice that the estimated world average was more-or-less in the middle of the bin representing the distribution’s mode? And was there pretty much from start to finish.

Resources