As mentioned, I expect I will also want to be able to display some historical data for a given month. Something like the month’s total rainfall along with the minimum and maximum daily rainfall for some number of years. For the historical yearly data I was pretty sure I’d be using either a heatmap or stacked barchart. For this case I am not too sure what the best choice might be.

Also, if I were expecting to be running the dashboard on some server on a continuous basis, I would likely generate the data for the charts as needed. However, I expect I will be shutting the dashboard down each night and starting it up each morning. So, instead of calculating the data and generating the chart once a month, I will be doing it more or less daily. (Unless, though unlikely, I get it up and running in the cloud.)

So I am thinking another database table might be in order. Time will tell.

Let’s get to some developmental code. Once again in the db_charts.py module. There will, I am sure, eventually be a new method in the Weather_db class and a couple new functions in the charts module.

Historical Monthly Data for a Specific Month

Pretty clearly we need the data we want before we can plot anything. For now, I want the monthly total, daily minimum and daily maximum for a specific month for 10 years. At least at the moment I am thinking 10 years. Of course, currently don’t have that many.

Query

Let’s see if we can sort out the query to get the data we are after. For now, will look at getting data for March 2014-2021.

... ...
  if do_g_func:
    do_mn_2dt = False
    do_yr_2dt = False
    do_yr_bar = False
    do_mn_bar = True
... ...
    if do_mn_bar:

      do_gbar = True
      do_line = False
      do_sctr = False

      # Get the necessary data for some specific month
      c_mon = '03'

      # do not get any data for current month
      nx_q = f"""SELECT substr(datetime, 1, 4) as yr, substr(datetime, 6, 2) as mon, MIN(daily) as min, MAX(daily) as max, MAX(monthly) as m_tot
        FROM rainfall
        WHERE substr(datetime, 1, 7) < '{c_ym}' and mon = '{c_mon}'
        GROUP BY yr;"""
      s_tm = time.time()
      mn_rf = rfall.qry_pd(nx_q)
      e_tm = time.time()
      print(f"query took {(e_tm-s_tm):.4f} sec\n")
      print(mn_rf)

A relatively simple query. And in the terminal I go the following.

(dbd-3.13) PS R:\learn\dashboard\utils> python db_charts.py
query took 0.0061 sec

     yr mon  min   max  m_tot
0  2014  03  3.0  54.0  174.25
1  2015  03  2.0  32.0  225.00
2  2016  03  0.5  34.0  203.25
3  2017  03  1.5  63.5  387.50
4  2018  03  0.3  36.3  145.40
5  2019  03  0.5  15.7   52.50
6  2020  03  0.8  21.6  103.30
7  2021  03  0.3  21.6  107.90

And, I know, for most if not all of those years the minimum daily rainfall would likely be \(0\). But, that’s pretty boring information.

I am thinking that query runs pretty quickly. At least with the current file/data size. Will check again when I get the rest of the historical data parsed and added to the database tables. But for now, I don’t think a new database table is called for. But before moving on to check out some plotting options, let’s add a new method to the Weather_db class to get the historical monthly data; as done above.

Bit of a naming hassle. Have method called get_history. Didn’t feel likely renaming it.

  def get_mon_hist(self, g_mn, do_df=True):
    """Get the total month's rainfall and minimum and maximum daily rainfall
       for the specified month for all years in the rainfall table except the
       current year (may be incomplete month of data).

      params:
        g_mn: month, string 'mm', for which to get data
        do_df: return dataframe if True, list of tuples otherwise

      returns:
        see params above
    """
    # get current year and month as yyyy.mm
    c_ym = time.strftime("%Y.%m", time.localtime())
    # current table name
    c_tnm = self.tnms['rf_tnm']
    nx_q = f"""SELECT substr(datetime, 1, 4) as yr, substr(datetime, 6, 2) as mon, MIN(daily) as min, MAX(daily) as max, MAX(monthly) as m_tot
      FROM {c_tnm}
      WHERE substr(datetime, 1, 7) < '{c_ym}' and mon = '{g_mn}'
      GROUP BY yr;"""
    if do_df:
      mn_rf = self.qry_pd(nx_q)
    else:
      mn_rf = self.exec(nx_q)
    return mn_rf

A bit of a refactor and quick test to make sure we are still getting the same data as before.

      # Get the necessary data for some specific month
      c_mon = '03'
      s_tm = time.time()
      mn_rf = rfall.get_mon_hist(c_mon)
      e_tm = time.time()
      print(f"query took {(e_tm-s_tm):.4f} sec\n")
      print(mn_rf)

And, I assure you the dataframe output by the refactored code is identical to that shown above.

Charts

I was thinking about having a look at a stacked barchart, a basic line chart and a scatter plot. But, as the total month will, in most cases, be much larger than either the daily minimum or maximum, I have decided to look at a grouped barchart instead. Let’s start with the barchart.

Grouped Barchart

The code will be virtually identical to that of the stacked barchart in the previous post. Less one layout update, plus a title change or two.

      if do_gbar:
        # let's see what a grouped and/or stacked bar chart using that data looks like
        fl_pth = d_fig/f"mon_{c_mon}_gbar.html"
        fig = go.Figure()
        # can't just pass dataframe, so going one row at a time
        # chart labels for data columns
        c_nms = ["Daily Min", "Daily Max", "Month Tot"]
        d_cols = list(mn_rf)[-3:]
        
        # add data to chart for each column we are interested in
        for c_ndx, c_col in enumerate(["min", "max", "m_tot"]):
          fig.add_trace(go.Bar(
            x=mn_rf.loc[:, 'yr'],
            y=list(mn_rf.loc[:, c_col]),
            name=c_nms[c_ndx]
          ))
        # the above generates a group bar chart, so convert to stacked
        # fig.update_layout(barmode="stack")
        fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",                          
          margin=dict(l=0, r=0, b=0, t=40, pad=5),
          title_text=f"Historical {l_mons[int(c_mon)-1]} Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
          xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Rainfall")),
        )
        fig.write_html(fl_pth)
        # fig = rf_mon_sbar(rf_yr, sv_pth=fl_pth)
        fig.show()

And here’s the actual chart (interactive). If you hover over a bar the \(x\) and \(y\) values and the label will be displayed. Also, if you click on an item in the legend it will be removed from the chart. Click again to display it. By removing the monthly total bar, we can actually see the daily minumum more clearly. Remove the daily maximum bar and the daily minimum bars fill the chart. Quite neat that.

Pretty much impossible to get the the cursor over the smallest of the minimum daily values with all bars displayed. But, that’s to be expected I guess. That said, it is a fairly reasonable chart given the intractive option to remove bars.

Line Chart

Okay, let’t look at plotting lines for the three data values. A lot of repetitive code. But I did create some new if block control variables. I will need them for each chart type. So, made sense to do so. Will eventually refactor the previous grouped bar chart code.

... ...
    if do_mn_bar:

      do_gbar = False
      do_line = True
      do_sctr = False
... ...
      c_nms = ["Daily Min", "Daily Max", "Month Tot"]
      d_cols = list(mn_rf)[-3:]
      x_pts = mn_rf.loc[:, 'yr']
... ...
      if do_line:
        # let's look at plotting a line across the years for the three values
        fl_pth = d_fig/f"mon_{c_mon}_line.html"
        fig = go.Figure()
        for c_ndx, c_col in enumerate(["min", "max", "m_tot"]):
          # convert enumerates index to a suitable index for the dataframe
          fig.add_trace(go.Scatter(
            x=x_pts,
            y=list(mn_rf.loc[:, c_col]),
            mode='lines+markers',
            name=c_nms[c_ndx]
          ))
        # the above generates a group bar chart, so convert to stacked
        # fig.update_layout(barmode="stack")
        fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",                          
          margin=dict(l=0, r=0, b=0, t=40, pad=5),
          title_text=f"Historical {l_mons[int(c_mon)-1]} Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
          xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Rainfall")),
        )
        fig.write_html(fl_pth)
        # fig = rf_mon_sbar(rf_yr, sv_pth=fl_pth)
        fig.show()

And, in the browser I got the following.

Certainly easier to hover over the various values. Including those really small daily minimums. But lets experiment a little more before making a decision. I am thinking a grouped scatter chart. Thought about a bubble chart, but not really going to work for this situation, not enough dimensions of data.

Grouped Scatter Plot

Again a lot duplicated code.

... ...
    if do_mn_bar:

      do_gbar = False
      do_line = False
      do_sctr = True
... ...
      if do_sctr:
        # let's look at plotting a grouped scatter plot across the years for the three values
        fl_pth = d_fig/f"mon_{c_mon}_sctr.html"
        fig = go.Figure()
        for c_ndx, c_col in enumerate(["min", "max", "m_tot"]):
          # convert enumerates index to a suitable index for the dataframe
          fig.add_trace(go.Scatter(
            x=x_pts,
            y=list(mn_rf.loc[:, c_col]),
            mode='markers',
            name=c_nms[c_ndx]
          ))
        # the above generates a group bar chart, so convert to stacked
        fig.update_layout(scattermode="group")
        fig.update_layout(autosize=False, width=600, height=640, plot_bgcolor = "#ffffff",                          
          margin=dict(l=0, r=0, b=0, t=40, pad=5),
          title_text=f"Historical {l_mons[int(c_mon)-1]} Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
          xaxis=dict(title=dict(text="Year")), yaxis=dict(title=dict(text="Rainfall")),
        )
        fig.write_html(fl_pth)
        fig.show()

And in a browser tab I got the following.

Not to sure which chart wins. And, there is enough similarity in the code that perhaps I could code a single function to do any of them based on a parameter?

Grouped Scatter Plot #2

Well, as I was searching for some information, I saw an example of a grouped scatter plot in a horizontal rather than vertical orientation. Thought it might be worth a look. Basically pretty much duplication of the code for the vertical orientation; but with a number of changes. (I got carried away with modifying markers, legends and such.)

That said I was running into a problem with the display. The minimum daily value circles were not displaying fully. So I decided to manually set the bounds for the x-axis. Unfortunately I could not set the lower bound by itself. So, I added a quickly coded function to return a value for the upper bound based on the maximum value to be plotted on the axis in question. Not well thought out, but here it is.

... ...
def get_u_bnd(mx_val):
  """Get upper bound for chart axis given mx_val for that access.
     Max bound will be a muliple of 50
  
    params:
      mx_val: maximum value for the axis, float
    
    returns:
      upper bound, float
  """
  if mx_val % 100 > 50:
    mx_x = ((mx_val // 100) + 1) * 100
  else:
    mx_x = ((mx_val // 100) * 100) + 50
  return mx_x

And the slightly refactored earlier scatter plot code follows. Still an issue with the x-axis bounds, so a wee fix.

I spent a lot of time playing with marker shapes, colours, and sizes. Then with legend orientation and position. Then grid lines. Too much time in fact. Any way, here’s the current code for this scatter plot style.

... ...
    if do_mn_bar:

      do_gbar = False
      do_line = False
      do_sctr_v = False
      do_sctr_h = True
... ...
      c_nms = ["Daily Minimum", "Daily Maximum", "Month Total"]
      d_cols = list(mn_rf)[-3:]
      x_pts = mn_rf.loc[:, 'yr']
      yr_vals = mn_rf.loc[:, 'yr']
      mrkr_shp = ["arrow-bar-up", "arrow-bar-down", "star"]
      mrkr_sz = [10, 12, 16]
      mrkr_clr = ["dodgerblue", "royalblue", "darkblue"]
      mrkr_ln_clr = "DarkSlateGrey"
      mrkr_ln_wd = 2
      grid_clr = "lightblue"
      grid_wd = 1
... ...
      if do_sctr_v:
        # let's look at plotting a line across the years for the three values
        fl_pth = d_fig/f"mon_{c_mon}_sctr_v.html"
... ...
      if do_sctr_h:
        # let's look at a horizontal version of the grouped scatter plot
        fl_pth = d_fig/f"mon_{c_mon}_sctr_h.html"
        fig = go.Figure()
        mx_mtot = mn_rf["m_tot"].max()
        mx_x = get_u_bnd(mx_mtot) + 10 # still have issues, needs work
        print(mx_mtot, mx_x)
        for c_ndx, c_col in enumerate(["min", "max", "m_tot"]):
          fig.add_trace(go.Scatter(
            x=list(mn_rf.loc[:, c_col]),
            y=yr_vals,
            marker=dict(size=mrkr_sz[c_ndx], symbol=f"{mrkr_shp[c_ndx]}", color=f"{mrkr_clr[c_ndx]}",
                     line=dict(width=mrkr_ln_wd, color=f"{mrkr_ln_clr}")),
            mode='markers',
            name=c_nms[c_ndx]
          ))
        # the above generates a group bar chart, so convert to stacked
        fig.update_layout(scattermode="group")
        # need to provide some extra room for the markers for the smallest and highest values
        fig.update_xaxes(range=[-10, mx_x])
        fig.update_yaxes(
          showgrid=True,  # Set to True to display grid lines
          gridwidth=grid_wd,    # Set the width of the grid lines
          gridcolor=grid_clr # Set the color of the grid lines
        )
        # horizontal legend at bottom of chart
        fig.update_layout(legend={'itemsizing': 'constant', 'orientation': 'h', 'y': -0.15, 'xanchor': 'right', 'x': 1})
        fig.update_layout(autosize=False, width=640, height=480, plot_bgcolor = "#ffffff",                          
          margin=dict(l=0, r=2, b=0, t=40, pad=5),
          title_text=f"Historical {l_mons[int(c_mon)-1]} Rainfall", title_font_size=25, title_x=0.5, title_y=0.97,
          xaxis=dict(title=dict(text="Rainfall")), yaxis=dict(title=dict(text="Year")),
        )
        fig.write_html(fl_pth)
        fig.show()

And in a browser tab I got the following chart.

Finished for Now

Okay, think I am calling this post done. I like most of the charts, though I think the horizontal grouped scatter plot is the most effective for visualizing the data. However the grouped bar chart with the ability to remove bars also has some merit. And, I didn’t yet attempt to use larger or other symbols for the vertical grouped scatter plot. So, I am going to play around a little more before deciding what the new plotting function(s) will look like.

Until next time, may you enjoy playing with the many options available to you.

Resources