As mentioned, I expect I will also want to be able to display some historical data for a given month. Something like the month’s total rainfall along with the minimum and maximum daily rainfall for some number of years. For the historical yearly data I was pretty sure I’d be using either a heatmap or stacked barchart. For this case I am not too sure what the best choice might be.
Also, if I were expecting to be running the dashboard on some server on a continuous basis, I would likely generate the data for the charts as needed. However, I expect I will be shutting the dashboard down each night and starting it up each morning. So, instead of calculating the data and generating the chart once a month, I will be doing it more or less daily. (Unless, though unlikely, I get it up and running in the cloud.)
So I am thinking another database table might be in order. Time will tell.
Let’s get to some developmental code. Once again in the db_charts.py
module. There will, I am sure, eventually be a new method in the Weather_db class and a couple new functions in the charts module.
Historical Monthly Data for a Specific Month
Pretty clearly we need the data we want before we can plot anything. For now, I want the monthly total, daily minimum and daily maximum for a specific month for 10 years. At least at the moment I am thinking 10 years. Of course, currently don’t have that many.
Query
Let’s see if we can sort out the query to get the data we are after. For now, will look at getting data for March 2014-2021.
... ...
if do_g_func:
do_mn_2dt = False
do_yr_2dt = False
do_yr_bar = False
do_mn_bar = True
... ...
if do_mn_bar:
do_gbar = True
do_line = False
do_sctr = False
# Get the necessary data for some specific month
c_mon = '03'
# do not get any data for current month
nx_q = f"""SELECT substr(datetime, 1, 4) as yr, substr(datetime, 6, 2) as mon, MIN(daily) as min, MAX(daily) as max, MAX(monthly) as m_tot
FROM rainfall
WHERE substr(datetime, 1, 7) < '{c_ym}' and mon = '{c_mon}'
GROUP BY yr;"""
s_tm = time.time()
mn_rf = rfall.qry_pd(nx_q)
e_tm = time.time()
print(f"query took {(e_tm-s_tm):.4f} sec\n")
print(mn_rf)
A relatively simple query. And in the terminal I go the following.
(dbd-3.13) PS R:\learn\dashboard\utils> python db_charts.py
query took 0.0061 sec
yr mon min max m_tot
0 2014 03 3.0 54.0 174.25
1 2015 03 2.0 32.0 225.00
2 2016 03 0.5 34.0 203.25
3 2017 03 1.5 63.5 387.50
4 2018 03 0.3 36.3 145.40
5 2019 03 0.5 15.7 52.50
6 2020 03 0.8 21.6 103.30
7 2021 03 0.3 21.6 107.90
And, I know, for most if not all of those years the minimum daily rainfall would likely be \(0\). But, that’s pretty boring information.
I am thinking that query runs pretty quickly. At least with the current file/data size. Will check again when I get the rest of the historical data parsed and added to the database tables. But for now, I don’t think a new database table is called for. But before moving on to check out some plotting options, let’s add a new method to the Weather_db class to get the historical monthly data; as done above.
Bit of a naming hassle. Have method called get_history
. Didn’t feel likely renaming it.
def get_mon_hist(self, g_mn, do_df=True):
"""Get the total month's rainfall and minimum and maximum daily rainfall
for the specified month for all years in the rainfall table except the
current year (may be incomplete month of data).
params:
g_mn: month, string 'mm', for which to get data
do_df: return dataframe if True, list of tuples otherwise
returns:
see params above
"""
# get current year and month as yyyy.mm
c_ym = time.strftime("%Y.%m", time.localtime())
# current table name
c_tnm = self.tnms['rf_tnm']
nx_q = f"""SELECT substr(datetime, 1, 4) as yr, substr(datetime, 6, 2) as mon, MIN(daily) as min, MAX(daily) as max, MAX(monthly) as m_tot
FROM {c_tnm}
WHERE substr(datetime, 1, 7) < '{c_ym}' and mon = '{g_mn}'
GROUP BY yr;"""
if do_df:
mn_rf = self.qry_pd(nx_q)
else:
mn_rf = self.exec(nx_q)
return mn_rf
A bit of a refactor and quick test to make sure we are still getting the same data as before.
# Get the necessary data for some specific month
c_mon = '03'
s_tm = time.time()
mn_rf = rfall.get_mon_hist(c_mon)
e_tm = time.time()
print(f"query took {(e_tm-s_tm):.4f} sec\n")
print(mn_rf)
And, I assure you the dataframe output by the refactored code is identical to that shown above.
Charts
I was thinking about having a look at a stacked barchart, a basic line chart and a scatter plot. But, as the total month will, in most cases, be much larger than either the daily minimum or maximum, I have decided to look at a grouped barchart instead. Let’s start with the barchart.
Grouped Barchart
The code will be virtually identical to that of the stacked barchart in the previous post. Less one layout update, plus a title change or two.
if do_gbar:
# let's see what a grouped and/or stacked bar chart using that data looks like
fl_pth = d_fig/f"mon_{c_mon}_gbar.html"
fig = go.Figure()
# can't just pass dataframe, so going one row at a time
# chart labels for data columns
c_nms = ["Daily Min", "Daily Max", "Month Tot"]
d_cols = list(mn_rf)[-3:]
# add data to chart for each column we are interested in
for c_ndx, c_col in enumerate(["min", "max", "m_tot"]):
fig.add_trace(go.Bar(
x=mn_rf.loc[:, 'yr'],
y=list(mn_rf.loc[:, c_col]),
name=c_nms[c_ndx]
))
# the above generates a group bar chart, so convert to stacked
# fig.update_layout(barmode="stack")
fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",
margin=dict(l=0, r=0, b=0, t=40, pad=5),
title_text=f"Historical {l_mons[int(c_mon)-1]} Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Rainfall")),
)
fig.write_html(fl_pth)
# fig = rf_mon_sbar(rf_yr, sv_pth=fl_pth)
fig.show()
And here’s the actual chart (interactive). If you hover over a bar the \(x\) and \(y\) values and the label will be displayed. Also, if you click on an item in the legend it will be removed from the chart. Click again to display it. By removing the monthly total bar, we can actually see the daily minumum more clearly. Remove the daily maximum bar and the daily minimum bars fill the chart. Quite neat that.
Pretty much impossible to get the the cursor over the smallest of the minimum daily values with all bars displayed. But, that’s to be expected I guess. That said, it is a fairly reasonable chart given the intractive option to remove bars.
Line Chart
Okay, let’t look at plotting lines for the three data values. A lot of repetitive code. But I did create some new if
block control variables. I will need them for each chart type. So, made sense to do so. Will eventually refactor the previous grouped bar chart code.
... ...
if do_mn_bar:
do_gbar = False
do_line = True
do_sctr = False
... ...
c_nms = ["Daily Min", "Daily Max", "Month Tot"]
d_cols = list(mn_rf)[-3:]
x_pts = mn_rf.loc[:, 'yr']
... ...
if do_line:
# let's look at plotting a line across the years for the three values
fl_pth = d_fig/f"mon_{c_mon}_line.html"
fig = go.Figure()
for c_ndx, c_col in enumerate(["min", "max", "m_tot"]):
# convert enumerates index to a suitable index for the dataframe
fig.add_trace(go.Scatter(
x=x_pts,
y=list(mn_rf.loc[:, c_col]),
mode='lines+markers',
name=c_nms[c_ndx]
))
# the above generates a group bar chart, so convert to stacked
# fig.update_layout(barmode="stack")
fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",
margin=dict(l=0, r=0, b=0, t=40, pad=5),
title_text=f"Historical {l_mons[int(c_mon)-1]} Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Rainfall")),
)
fig.write_html(fl_pth)
# fig = rf_mon_sbar(rf_yr, sv_pth=fl_pth)
fig.show()
And, in the browser I got the following.
Certainly easier to hover over the various values. Including those really small daily minimums. But lets experiment a little more before making a decision. I am thinking a grouped scatter chart. Thought about a bubble chart, but not really going to work for this situation, not enough dimensions of data.
Grouped Scatter Plot
Again a lot duplicated code.
... ...
if do_mn_bar:
do_gbar = False
do_line = False
do_sctr = True
... ...
if do_sctr:
# let's look at plotting a grouped scatter plot across the years for the three values
fl_pth = d_fig/f"mon_{c_mon}_sctr.html"
fig = go.Figure()
for c_ndx, c_col in enumerate(["min", "max", "m_tot"]):
# convert enumerates index to a suitable index for the dataframe
fig.add_trace(go.Scatter(
x=x_pts,
y=list(mn_rf.loc[:, c_col]),
mode='markers',
name=c_nms[c_ndx]
))
# the above generates a group bar chart, so convert to stacked
fig.update_layout(scattermode="group")
fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",
margin=dict(l=0, r=0, b=0, t=40, pad=5),
title_text=f"Historical {l_mons[int(c_mon)-1]} Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Rainfall")),
)
fig.write_html(fl_pth)
fig.show()
And in a browser tab I got the following.
Not to sure which chart wins. And, there is enough similarity in the code that perhaps I could code a single function to do any of them based on a parameter?
Grouped Scatter Plot #2
Well, as I was searching for some information, I saw an example of a grouped scatter plot in a horizontal rather than vertical orientation. Thought it might be worth a look. Basically pretty much duplication of the code for the vertical orientation; but with a number of changes. (I got carried away with modifying markers, legends and such.)
That said I was running into a problem with the display. The minimum daily value circles were not displaying fully. So I decided to manually set the bounds for the x-axis. Unfortunately I could not set the lower bound by itself. So, I added a quickly coded function to return a value for the upper bound based on the maximum value to be plotted on the axis in question. Not well thought out, but here it is.
... ...
def get_u_bnd(mx_val):
"""Get upper bound for chart axis given mx_val for that access.
Max bound will be a muliple of 50
params:
mx_val: maximum value for the axis, float
returns:
upper bound, float
"""
if mx_val % 100 > 50:
mx_x = ((mx_val // 100) + 1) * 100
else:
mx_x = ((mx_val // 100) * 100) + 50
return mx_x
And the slightly refactored earlier scatter plot code follows. Still an issue with the x-axis bounds, so a wee fix.
I spent a lot of time playing with marker shapes, colours, and sizes. Then with legend orientation and position. Then grid lines. Too much time in fact. Any way, here’s the current code for this scatter plot style.
... ...
if do_mn_bar:
do_gbar = False
do_line = False
do_sctr_v = False
do_sctr_h = True
... ...
c_nms = ["Daily Minimum", "Daily Maximum", "Month Total"]
d_cols = list(mn_rf)[-3:]
x_pts = mn_rf.loc[:, 'yr']
yr_vals = mn_rf.loc[:, 'yr']
mrkr_shp = ["arrow-bar-up", "arrow-bar-down", "star"]
mrkr_sz = [10, 12, 16]
mrkr_clr = ["dodgerblue", "royalblue", "darkblue"]
mrkr_ln_clr = "DarkSlateGrey"
mrkr_ln_wd = 2
grid_clr = "lightblue"
grid_wd = 1
... ...
if do_sctr_v:
# let's look at plotting a line across the years for the three values
fl_pth = d_fig/f"mon_{c_mon}_sctr_v.html"
... ...
if do_sctr_h:
# let's look at a horizontal version of the grouped scatter plot
fl_pth = d_fig/f"mon_{c_mon}_sctr_h.html"
fig = go.Figure()
mx_mtot = mn_rf["m_tot"].max()
mx_x = get_u_bnd(mx_mtot) + 10 # still have issues, needs work
print(mx_mtot, mx_x)
for c_ndx, c_col in enumerate(["min", "max", "m_tot"]):
fig.add_trace(go.Scatter(
x=list(mn_rf.loc[:, c_col]),
y=yr_vals,
marker=dict(size=mrkr_sz[c_ndx], symbol=f"{mrkr_shp[c_ndx]}", color=f"{mrkr_clr[c_ndx]}",
line=dict(width=mrkr_ln_wd, color=f"{mrkr_ln_clr}")),
mode='markers',
name=c_nms[c_ndx]
))
# the above generates a group bar chart, so convert to stacked
fig.update_layout(scattermode="group")
# need to provide some extra room for the markers for the smallest and highest values
fig.update_xaxes(range=[-10, mx_x])
fig.update_yaxes(
showgrid=True, # Set to True to display grid lines
gridwidth=grid_wd, # Set the width of the grid lines
gridcolor=grid_clr # Set the color of the grid lines
)
# horizontal legend at bottom of chart
fig.update_layout(legend={'itemsizing': 'constant', 'orientation': 'h', 'y': -0.15, 'xanchor': 'right', 'x': 1})
fig.update_layout(autosize=False, width=640, height=480, plot_bgcolor = "#ffffff",
margin=dict(l=0, r=2, b=0, t=40, pad=5),
title_text=f"Historical {l_mons[int(c_mon)-1]} Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
xaxis=dict(title=dict(text="Rainfall")), yaxis=dict(title=dict(text="Year")),
)
fig.write_html(fl_pth)
fig.show()
And in a browser tab I got the following chart.
Finished for Now
Okay, think I am calling this post done. I like most of the charts, though I think the horizontal grouped scatter plot is the most effective for visualizing the data. However the grouped bar chart with the ability to remove bars also has some merit. And, I didn’t yet attempt to use larger or other symbols for the vertical grouped scatter plot. So, I am going to play around a little more before deciding what the new plotting function(s) will look like.
Until next time, may you enjoy playing with the many options available to you.
Resources
- Plotly Axes in Python
- Plotly colours list
- Plotly Discrete Colors in Python
- Plotly Legends in Python
- Plotly Scatter Plots in Python
- Plotly Styling Markers in Python
- Scatter Plots in Plotly