I am still a touch afraid of tackling the second style of PHP data files. They are considerably less strictly formatted than the first set. So figuring out how to parse those will wait a little longer.
As mentioned in closing last post, I was thinking of adding barcharts to the dashboard to provide the detail the gauge does not. But, it has crossed my mind that I could also use a variation of a heatmap to do something similar. So I think for this post I am going to compare the two approaches and see which I like better. One may work better for the monthly to-date dashboard scenario and one for the yearly to-date dashboard scenario.
But until we play with the two of them, we shall never know. (My apologies for the first-person plural.)
Yearly Chart
I am going to start with the year to-date dashboard scenario. Less data per column, so either option, stacked bar chart or heatmap, may work equally well. Though I expect one may be easier to generate in a meaningful way than the other. By meaningful I am thinking that in a stacked bar chart a value of zero will not show a bar element. Whereas in a heatmap it will show an element of some specific colour. So you know that month had little or no rainfall. Expect that will not be so clear in a stacked bar chart without some extra work.
But let’s find out. I am going to start with the heatmap chart approach. Of course I only have a few years to work with, so I will use the last five years in the rainfall table for the experimentation.
Generate Data for Heatmap/Stacked Barchart
We need an array of data for the Plotly heatmap chart method. And, in our case, that will need to have 12 rows with a suitable number of columns. One of the edge cases we will need to consider is a year without any data for a specific month. At this point, I don’t know what SQLite will output for a query in that situation.
And, I also don’t know if we are missing data for any months in those last five years. But I do know we are for the first year, 2014. So let’s include that year in our query to see what happens. Well, with a query not yet sorted out. But we know what we want to get.
Base Query
Let’s get to work on the query to get the monthly totals for all the months in the database.
I have added boolean variables to control which type of chart will be generated—I know they will eventually be needed.
Once again a nested query. The nested portion selects the maximum value of the datetime
column and monthly to-date rainfall for all rows grouped on the “yyyy.mm” portion of the datetime
column. This will return a temporary table with the maximum date for each month and the month to-date total for that date. Which should be the total monthly rainfall for that month. I am excluding the current month from the query if there is data for it in the table. The current month will likely not have the total rainfall for the month in many cases.
The outer query, just selects the year, month and monthly rainfall for each row in the temporary table.
... ...
if do_g_func:
do_mn_2dt = False
do_yr_2dt = False
do_yr_bar = True
# get current year and month as yyyy.mm
c_ym = time.strftime("%Y.%m", time.localtime())
... ...
if do_yr_bar:
# let's look at plotting a stacked bar chart and/or heatmap to display
# yearly rainfall for some number of years. Each bar will have a section
# for each momth, though that may be 0 for some years and momths
do_hmap = False
do_sbar = False
# get data from rainfall table, excluding current year and month
q_yrly = f"""SELECT SUBSTR(dttm, 1, 4) as yr, SUBSTR(dttm, 6, 2) as mon, ROUND(m_tot, 2)
FROM
(SELECT MAX(datetime) as dttm, monthly as m_tot
FROM {rfall.tnms['rf_tnm']}
WHERE SUBSTR(datetime, 1, 7) < '{c_ym}'
GROUP BY SUBSTR(datetime, 1, 7));"""
rslt = rfall.qry_exec(q_yrly)
for rf in rf_yr:
print(rf)
And, the output in the terminal was as follows (severly truncated):
(dbd-3.13) PS R:\learn\dashboard\utils> python db_charts.py
('2014', '03', 174.25)
('2014', '04', 116.25)
('2014', '05', 143.0)
('2014', '06', 41.5)
('2014', '07', 26.0)
... ...
('2021', '08', 50.2)
('2021', '09', 195.2)
('2021', '10', 225.18)
('2021', '11', 541.59)
('2021', '12', 174.25)
If I manually specified the current year and month as “2021.12”, the last row above was not returned.
Generate 2D Array
But for our purposes, heatmap, we need a 2D array of data. For my current view, rows are the months and columns are the years.
I was planning to use a dataframe to provide the data for the charts. As that generally works with matplotlib. Though that may not have been the best idea with Plotly.
I will start by generating a nested dictionary: outer dictionary will be keyed on the year and the inner keyed on month with the value being the monthly rainfall. I won’t bother printing the dictionary. Pandas will happily accept that dictionary and create the 2D dataframe. Because of how Pandas handled the missing values, I needed to sort the dataframe to get things in the order I wanted. (If you don’t know what Pandas did, print the dataframe without sorting it.)
# need 2D array, so a bit of work to do
# going to create a dictionary, then a dataframe
rf_yr_mn = {}
for rw in rslt:
if rw[0] not in rf_yr_mn:
rf_yr_mn[rw[0]] = {}
rf_yr_mn[rw[0]][rw[1]] = rw[2]
rf_yr = pd.DataFrame(rf_yr_mn)
rf_yr.sort_index(inplace=True)
print("\n", rf_yr)
That output the following to the terminal.
(dbd-3.13) PS R:\learn\dashboard\utils> python db_charts.py
2014 2015 2016 2017 2018 2019 2020 2021
01 NaN 253.00 215.25 99.9 319.1 202.5 460.45 308.90
02 NaN 129.50 202.50 111.0 162.4 110.1 144.80 144.90
03 174.25 225.00 203.25 387.5 145.4 52.5 103.30 107.90
04 116.25 59.75 35.50 210.0 199.5 151.6 38.20 38.40
05 143.00 21.50 57.33 142.3 11.2 53.2 117.30 69.30
06 41.50 5.50 73.50 54.5 81.9 24.5 73.80 59.60
07 26.00 21.50 35.50 2.8 15.6 58.9 37.20 0.00
08 15.00 78.50 22.50 7.1 11.7 34.6 45.30 50.20
09 115.50 54.00 84.00 58.0 189.6 191.2 115.30 195.20
10 201.75 142.00 260.50 198.7 121.1 155.0 145.30 225.18
11 277.25 302.50 350.00 273.6 329.1 121.1 261.60 541.59
12 187.50 278.00 170.90 215.5 339.0 302.6 408.10 174.25
That looks like what I am after.
Generate Heatmap
Okay, let’s generate a heatmap of the above data.
As the month labels in the above data are strings of two digits, I decided to create a list of the full month names. This is outside the if
blocks for the two charts as they will both need that information.
Generating the heatmap was pretty straightforward. Except for getting the title to display. I messed around, searched, etc. for quite some time before I realized what the problem was. I had set the margins around the plot area to \(0\). So there was nowhere for Plotly to put the title. Adding a top margin resolved my problem. Very slow witted me, a lot of wasted time and effort.
I also saved the HTML for the heatmap to a file so that I could display it below (embed in an iframe
).
Also, note the hoverongaps
parameter in the call to Heatmap
. Without it, the missing months would be shown in the plot background colour and when you hovered over them, Plotly would advise that they were not a number. Not horribly useful. I think that not showing them at all is more meaningful.
And, I set the plot background color to white to match the paper colour. That way I felt the missing data would be more obvious.
# create list of month names using calendar module
l_mons = list(calendar.month_name)[1:]
if do_hmap:
fig = go.Figure(data=go.Heatmap(z=rf_yr, x=rf_yr.columns, y=l_mons, colorbar=dict(title='Rainfall (mm)'), hoverongaps=False))
# want some padding betwen tick labels and plot, so also need to adjust plot bg colour
fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",
margin=dict(l=0, r=0, b=0, t=40, pad=5),
title_text="Historical Annual Rainfall", title_font_size=25, title_x=0.5, title_y=0.98,
xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Mont")))
# fig.layout.plot_bgcolor = "#ffffff"
fig.write_html(d_fig/"yr_hmap.html")
fig.show()
And, here’s the result. If you hover over one of the blocks in the heatmap you will get the details for that month. Most Plotly charts provide this kind of interactivity by default. The gauge did not seem to do so, so I only included pngs in the previous post.
My apologies, but the HTML file for the plot is rather large. Don’t know if there is any way to reduce the size. Will eventually need to do some research.
An interesting look. And the dark purple colours make it clear which months had small amounts of rainfall. And the oranges and yellows which had larger amounts of rainfall.
Generate Stacked Barchart
On to the other format, a stacked barchart. In this case I couldn’t just pass the dataframe to the bar chart method. So, I added each row as an individual trace
.
New traces can be added to a graph object figure using the add_trace() method. This method accepts a graph object trace (an instance of go.Scatter, go.Bar, etc.) and adds it to the figure. This allows you to start with an empty figure, and add traces to it sequentially.
Creating and Updating Figures in Python
And because this creates a clustered barchart, I needed to set the barmode
to get what I wanted. It’s in a separate call because I didn’t modify the rest of the layout until after I got the basics working. Once again saving the HTML to include below.
if do_sbar:
fig = go.Figure()
# can't just pass dataframe, so going one row at a tiem
for ndx, cmn in enumerate(l_mons):
# convert enumerates index to a suitable index for the dataframe
s_ndx = f"0{ndx + 1}"[-2:]
fig.add_trace(go.Bar(
x=rf_yr.columns,
y=list(rf_yr.loc[s_ndx, :]),
name=cmn
))
# the above generates a group bar chart, so convert to stacked
fig.update_layout(barmode="stack")
fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",
margin=dict(l=0, r=0, b=0, t=40, pad=5),
title_text="Historical Annual Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Mont")),
)
fig.write_html(d_fig/"yr_sbar.html")
fig.show()
And, the resulting chart is provided below.
And as much as the heatmap is an interesting concept, I do believe the bar chart is more understandable for most people. But I must admit I rather like the idea that the colour of each block in the heatmap gives a decent idea of how much rain fell that month without having to do any arithmetic or hover over the appropriate block.
Do you see that there is no July block in the the stacked bar for 2021.
Refactor Classes and Modules
Okay, let’s look at getting the developmental code above into the proper classes and/or modules. Then refactor the developmental/test code to use the new methods/functions.
New Method in Weather_db Class
First, a new method, get_rf_monthly
, in the Weather_db
class to get the monthly rainfall data from the database. I am going to add a parameter to determine whether the method returns the array of raw data tuples or a dataframe. Thought I would always return the former, but…
... ...
import time
... ...
def get_rf_monthly(self, do_df=True):
"""Get total monthly rainfall for all months in rainfall table
excluding the current month, as it may not be complete.
params:
do_df: if true convert query output to dataframe and retun df
else return raw query output (list of tuples)
returns: see params
"""
# get current year and month as yyyy.mm
c_ym = time.strftime("%Y.%m", time.localtime())
# current table name
c_tnm = self.tnms['rf_tnm']
# get data from rainfall table
q_yrly = f"""SELECT SUBSTR(dttm, 1, 4) as yr, SUBSTR(dttm, 6, 2) as mon, ROUND(m_tot, 2)
FROM
(SELECT MAX(datetime) as dttm, monthly as m_tot
FROM {c_tnm}
WHERE SUBSTR(datetime, 1, 7) < '{c_ym}'
GROUP BY SUBSTR(datetime, 1, 7));"""
rslt = self.qry_exec(q_yrly)
if do_df:
rf_yr_mn = {}
for rw in rslt:
if rw[0] not in rf_yr_mn:
rf_yr_mn[rw[0]] = {}
rf_yr_mn[rw[0]][rw[1]] = rw[2]
rf_yr = pd.DataFrame(rf_yr_mn)
rf_yr.sort_index(inplace=True)
return rf_yr
else:
return rslt
Two New Charting Functions db_charts Module
Let’s add a couple new functions to db_charts.py
. One for each of the two charts. For now they will pretty much just duplicate the developmental code. I am not going to make any attempt, at this time, to make them more generally applicable.
... ...
import calendar
import pandas as pd
import plotly.graph_objects as go
if __name__ == "__main__":
import time
from pathlib import Path
import numpy as np
from weather_db import Weather_db
... ...
# module variables
# create list of month names using calendar module, used in multiple charting functions
l_mons = list(calendar.month_name)[1:]
... ...
def rf_mon_hmap(rf_yr, sv_pth=""):
fig = go.Figure(data=go.Heatmap(z=rf_yr, x=rf_yr.columns, y=l_mons, colorbar=dict(title='Rainfall (mm)'), hoverongaps=False))
# want some padding betwen tick labels and plot, so also need to adjust plot bg colour
fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",
margin=dict(l=0, r=0, b=0, t=40, pad=5),
title_text="Historical Annual Rainfall", title_font_size=25, title_x=0.5, title_y=0.98,
xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Month")))
if sv_pth:
fig.write_html(sv_pth)
return fig
def rf_mon_sbar(rf_yr, sv_pth=""):
fig = go.Figure()
# can't just pass dataframe, so going one row at a tiem
for ndx, c_mn in enumerate(l_mons):
# convert enumerates index to a suitable index for the dataframe
s_ndx = f"0{ndx + 1}"[-2:]
# print(rf_yr.loc[s_ndx, :])
# y_vl = list(rf_yr.loc[s_ndx, :])
fig.add_trace(go.Bar(
x=rf_yr.columns,
y=list(rf_yr.loc[s_ndx, :]),
name=c_mn
))
# the above generates a group bar chart, so convert to stacked
fig.update_layout(barmode="stack")
fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",
margin=dict(l=0, r=0, b=0, t=40, pad=5),
title_text="Historical Annual Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Month")),
)
if sv_pth:
fig.write_html(sv_pth)
return fig
Refactored Test Code
... ...
if do_yr_bar:
# let's look at plotting a stacked bar chart and/or heatmap to display
# yearly rainfall for some number of years. Each bar will have a section
# for each momth, though that may be 0 for some years and momths
do_hmap = False
do_sbar = True
# get data from rainfall table, excluding current year and month
rf_yr = rfall.get_rf_monthly()
if do_hmap:
fl_pth = d_fig/"yr_hmap.html"
fig = rf_mon_hmap(rf_yr)
fig.show()
if do_sbar:
fl_pth = d_fig/"yr_sbar.html"
fig = rf_mon_sbar(rf_yr, sv_pth=fl_pth)
fig.show()
Well, that is certainly short and sweet.
And, no sense showing you the charts again. But, I can assure you, the above code replicates the output of the prior developmental code. And if sv_pth
is provided, it does indeed save the chart’s HTML to the specified file path. Though I don’t check that the path is valid (my bad?).
Done
I am calling this post fini. I believe it has covered the intended subject reasonably well. And, I am pleased to have generated the charts. Even if it took me much too long to get a title displayed.
Until next time, may your learning experiences take you less time than mine seem to need.
Resources
- plotly.graph_objects.Figure.add_traces
- plotly.graph_objects.Heatmap
- plotly.io.write_image
- Creating and Updating Figures in Python
- Heatmaps in Python
- Python Figure Reference: Single-Page
- Plotly Update Layout: Customize Figure Appearance