Extracting Useful Data from the Table

As I expect I am going to want to draw a gauge showing today’s temperature against the average, low and high from previous years, let’s start with that. But which temperature: the morning temperature or the most recent if there are other entries present for the day. And, what is the average temperature if there are multiple entries per day. I am for now going to only use the morning temperature. For years without low and high temperatures in the table, the morning temperature will be the low. If there are additional entries on any given days, those will become my highs.

See, no code and getting messy already. I figure this is all going to take multiple queries and a bit of logic and arithmetic to sort out. And, once I get this sorted I think I will be adding a new table so the work only gets done once. Well once after I have finished developing and testing the code and queries.

Some Stats

Count of Entries

Let’s start by getting a count of the entries in the the four numeric columns for each year and month. Not sure why, but I think it may eventually matter when deciding which temperature to use for any given day. And, whatever, it’s a stat.

... ...
t_nm = rfall.tnms["tp_tnm"]
... ...
# get count of entries for each year and month in three temperature columns
q_cnt = f"""SELECT substr(datetime, 1, 7) as c_month, COUNT(temperature) as temps,
COUNT(dmin) as mins, COUNT(dmax) as maxs, COUNT(humidity) as humidity
FROM {t_nm}
GROUP BY c_month;"""

rw_cnt = rfall.qry_pd(q_cnt)
print(rw_cnt.head(16))
print("\n", rw_cnt.tail(16))

The terminal output was as follows.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
    c_month  temps  mins  maxs  humidity
0   2015.01     29    25    22         0
1   2015.02     27    20    17         0
2   2015.03     25    22    17         0
3   2015.04     27    24    21         0
4   2015.05     24    22    19         0
5   2015.06     27    19    17         0
6   2015.07     20    20    15         0
7   2015.08     21    21    17         0
8   2015.09     19    19    16         0
9   2015.10     21    21    17         0
10  2015.11     25    24    19         0
11  2015.12     24    24    21         0
12  2016.01     26    26    18         0
13  2016.02     26    26    23         0
14  2016.03     25    25    23         0
15  2016.04     21    21    19         0

     c_month  temps  mins  maxs  humidity
113  2024.06     31     0     0        29
114  2024.07     54     0     0        54
115  2024.08     45     0     0        45
116  2024.09     33     0     0        33
117  2024.10     31     0     0        31
118  2024.11     29     0     0        29
119  2024.12     31     0     0        31
120  2025.01     33     0     0        31
121  2025.02     29     0     0        29
122  2025.03     31     0     0        31
123  2025.04     30     0     0        30
124  2025.05     31     0     0        31
125  2025.06     54     0     0        53
126  2025.07    143     0     0       143
127  2025.08    152     0     0       152
128  2025.09     40     0     0        40

As you can see, there are currently 129 months of data in the table. Though 2025.09 is not yet complete. Clearly, I was not consistent recording daily morning, minimum and maximum temperatures in the early years. With the odd hiccup between daily morning temperature and humidity in recent years. Though the latter may be a failing of my parsing code rather than my recording habits. And, in the last few months I have pretty clearly been recording significantly more temperature values per day. (Somewhat ridiculously many in some months.)

I don’t really want to show all the 129 returned rows. But, I would like to have a look at all the months for 2021 and 2022.

q_cnt = f"""SELECT substr(datetime, 1, 7) as c_month, COUNT(temperature) as temps,
COUNT(dmin) as mins, COUNT(dmax) as maxs, COUNT(humidity) as humidity
FROM {t_nm}
WHERE substr(datetime, 1, 4)>='2021' AND substr(datetime, 1, 4)<='2022'
GROUP BY c_month;"""

rw_cnt = rfall.qry_pd(q_cnt)
print(rw_cnt)

And…

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
    c_month  temps  mins  maxs  humidity
0   2021.01     63    30    55         0
1   2021.02     59    26    54         0
2   2021.03     63    30    59         0
3   2021.04     61    29    59         0
4   2021.05     61    29    58         0
5   2021.06     75    22    49         0
6   2021.07     76    24    58         0
7   2021.08     88    17    58         1
8   2021.09     55    15    28         8
9   2021.10     70     0     0         0
10  2021.11     44     0     0         0
11  2021.12     15     1     0         0
12  2022.01     29     0     0         1
13  2022.02     25     0     0         0
14  2022.03     27     0     0        12
15  2022.04     30     0     0        28
16  2022.05     26     0     0        26
17  2022.06     29     0     0        29
18  2022.07     39     0     0        32
19  2022.08     37     0     0        31
20  2022.09     28     0     0        28
21  2022.10     30     0     0        30
22  2022.11     33     0     0        29
23  2022.12     44     0     0        36

Sadly, I have no idea what that actually tells us. Time to move on.

Queries

Get Minimum Daily Temperature

What I am trying to get is the minimum daily temperature for a given month over all the years in the table. Similar to what we did for the historical monthly average temperature chart.

Let’s start with a single month and year. We have two minimums to look for: the daily morning temperature and the daily minimum temperature. Though one or the other may not exist (i.e. NULL in the database table). My plan is for each day to get the minimum of those two temperatures if they exist. Otherwise, just whatever temperature is available.

I will start with the daily morning temperature.

For display purposes, I am only getting one month’s worth of data. We will eventually cover all the dates in the table. I am using .to_string() so that the whole dataframe gets printed to the terminal. Otherwise Pandas defaults to displaying the first and last five rows.

# get min temp for each day in a month
t_yr = "2017"
t_mn = "08"
q_t_min = f"""SELECT SUBSTR(datetime, 1, 10) as date, SUBSTR(datetime, 12, 5) as time, MIN(temperature) as d_min
  FROM {t_nm}
  WHERE SUBSTR(datetime, 1, 7)='{t_yr}.{t_mn}'
  GROUP BY date"""
d_min = rfall.qry_pd(q_t_min, i_col="date")
print(f"Daily minimum temperatures for {t_yr}.{t_mn}")
print(d_min.to_string())

In the terminal…

Daily minimum morning temperatures for 2017.08
             time  d_min
date
2017.08.01  14:18   26.8
2017.08.02  08:59   20.4
2017.08.03  08:19   19.3
2017.08.04  08:29   19.3
2017.08.05  09:32   19.5
2017.08.06  10:14   20.9
2017.08.07  08:40   18.8
2017.08.09  09:49   21.4
2017.08.10  12:44   26.6
2017.08.11  09:51   21.1
2017.08.12  09:24   17.8
2017.08.13  09:32   17.3
2017.08.14  07:49   14.5
2017.08.15  08:44    5.6
2017.08.16  07:47   15.5
2017.08.17  09:11   17.5
2017.08.18  09:20   15.9
2017.08.19  09:38   17.0
2017.08.20  09:42   16.9
2017.08.21  08:38   16.9
2017.08.22  09:09   19.0
2017.08.23  08:41   17.5
2017.08.24  08:52   16.4
2017.08.25  08:36   15.2
2017.08.26  08:31   15.9
2017.08.27  09:21   20.0
2017.08.28  07:54   18.4
2017.08.29  09:58   23.0
2017.08.30  09:12   17.5
2017.08.31  13:47   22.6

Now, as mentioned, for some years there will also be daily minimums available. Something like 2015-2021. Though many days may not have that temperature. Certainly not those from 2022 onwards. Expect the query will be very similar. Let’s give it a go.

q_m_min = f"""SELECT SUBSTR(datetime, 1, 10) as date, SUBSTR(datetime, 12, 5) as time, MIN(dmin) as m_min
  FROM {t_nm}
  WHERE SUBSTR(datetime, 1, 7)='{t_yr}.{t_mn}' AND dmin IS NOT NULL
  GROUP BY date"""
print(f"Daily minimum temperatures for {t_yr}.{t_mn}")
m_min = rfall.qry_pd(q_m_min, i_col="date")
print(m_min.to_string())

And, in the terminal…

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Daily minimum temperatures for 2017.08
             time  m_min
date
2017.08.01  14:18   17.4
2017.08.02  08:59   18.1
2017.08.03  08:19   18.1
2017.08.04  08:29   18.2
2017.08.05  09:32   18.0
2017.08.06  10:14   16.4
2017.08.07  08:40   17.1
2017.08.09  09:49   17.5
2017.08.10  12:44   18.4
2017.08.11  09:51   18.3
2017.08.12  09:24   16.4
2017.08.13  09:32   16.3
2017.08.14  07:49   14.4
2017.08.15  08:44   13.7
2017.08.16  07:47   15.2
2017.08.17  09:11   15.4
2017.08.18  09:20   15.3
2017.08.19  09:38   15.6
2017.08.20  09:42   14.8
2017.08.21  08:38   15.0
2017.08.22  09:09   16.5
2017.08.23  08:41   16.6
2017.08.24  08:52   16.0
2017.08.25  08:36   13.7
2017.08.26  08:31   14.0
2017.08.27  09:21   15.9
2017.08.28  07:54   17.8
2017.08.29  09:58   18.4
2017.08.30  09:12   16.5
2017.08.31  13:47   16.9

And those two outputs certainly don’t match.

What I am planning to do next is create a UNION of those two queries. That should give me a table with the date, daily morning minimum and the daily absolute minimum. Let’s see what that takes to get done. But I now plan to do it for all available dates. So those preceding queries will need to change a bit. I will leave it to you to sort that out.

And it was relatively easy. To avoid the truncated Pandas dataframe output, I am using head and tail to get a bit more data at both ends of the resulting dataframe.

q_xj_mn = f"""SELECT q1.date, q1.d_min, q2.m_min
  FROM ({q_t_min}) AS q1
  LEFT JOIN ({q_m_min}) as q2
  ON q1.date = q2.date;"""
dly_min = rfall.qry_pd(q_xj_mn, i_col="date")
print("\n", dly_min.head(10))
print("\n", dly_min.tail(10))

And, the truncated output in the terminal was as follows.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Daily minimum temperatures

            d_min  m_min
date
2015.08.01   24.1   16.4
2015.08.02   23.8   17.5
2015.08.04   19.3   16.4
2015.08.05   15.4   15.0
2015.08.07   17.2   13.6
2015.08.08   18.7   15.7
2015.08.09   19.4   16.9
2015.08.10   18.0   16.0
2015.08.11   28.4   15.9
2015.08.12   21.9   17.4

            d_min  m_min
date
2025.08.22   13.5    NaN
2025.08.23   16.3    NaN
2025.08.24   16.5    NaN
2025.08.25   16.4    NaN
2025.08.26   16.1    NaN
2025.08.27   16.4    NaN
2025.08.28   18.3    NaN
2025.08.29   17.5    NaN
2025.08.30   15.6    NaN
2025.08.31   15.0    NaN

As expected no true daily minimum for the later years.

But, I want the minimum of the two values if both are present, or the single value otherwise. Let’s see if we can get the query to do that for us.

The problem I ran into was that the scalar MIN function returns NULL if it is present. Not exactly what I wanted. I found another scalar function that solved that problem, specifically coalesce.

The revised JOIN query now looks like the following.

q_xj_mn = f"""SELECT q1.date, q1.d_min, q2.m_min, 
  MIN(COALESCE(q1.d_min, 9223372036854775807),
      COALESCE(q2.m_min, 9223372036854775807)) AS a_min
  FROM ({q_t_min}) AS q1
  LEFT JOIN ({q_m_min}) as q2
  ON q1.date = q2.date;"""

After making the above change in the code higher up, the terminal output is now as follows.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Daily minimum temperatures

             d_min  m_min  a_min
date
2015.08.01   24.1   16.4   16.4
2015.08.02   23.8   17.5   17.5
2015.08.04   19.3   16.4   16.4
2015.08.05   15.4   15.0   15.0
2015.08.07   17.2   13.6   13.6
2015.08.08   18.7   15.7   15.7
2015.08.09   19.4   16.9   16.9
2015.08.10   18.0   16.0   16.0
2015.08.11   28.4   15.9   15.9
2015.08.12   21.9   17.4   17.4

             d_min  m_min  a_min
date
2025.08.22   13.5    NaN   13.5
2025.08.23   16.3    NaN   16.3
2025.08.24   16.5    NaN   16.5
2025.08.25   16.4    NaN   16.4
2025.08.26   16.1    NaN   16.1
2025.08.27   16.4    NaN   16.4
2025.08.28   18.3    NaN   18.3
2025.08.29   17.5    NaN   17.5
2025.08.30   15.6    NaN   15.6
2025.08.31   15.0    NaN   15.0

And that looks like what we are after. Well we don’t really need the first two columns any more.

Let’s move on to the other end of the scale.

Get Maximum Daily Temperature

I am going to pretty much just copy the refactored code for the minimum and modify it to work on maximum values for the applicable cases.

... ...
q_t_max = f"""SELECT SUBSTR(datetime, 1, 10) as date, MAX(temperature) as d_max
  FROM {t_nm}
  WHERE SUBSTR(datetime, 6, 2)='{t_mn}'
  GROUP BY date"""
q_m_max = f"""SELECT SUBSTR(datetime, 1, 10) as date, MAX(dmax) as m_max
  FROM {t_nm}
  WHERE SUBSTR(datetime, 6, 2)='{t_mn}' AND dmax IS NOT NULL
  GROUP BY date"""
q_xj_mx = f"""SELECT q1.date, q1.d_max, q2.m_max,
  MAX(COALESCE(q1.d_max, -9223372036854775807),
      COALESCE(q2.m_max, -9223372036854775807)) AS a_max
  FROM ({q_t_max}) AS q1
  LEFT JOIN ({q_m_max}) as q2
  ON q1.date = q2.date;"""
dly_max = rfall.qry_pd(q_xj_mx, i_col="date")
print("Daily maximum temperature")
print("\n", dly_max.head(10))
print("\n", dly_max.tail(10))

In the terminal:

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Daily maximum temperature

             d_max  m_max  a_max
date
2015.08.01   24.1   30.6   30.6
2015.08.02   23.8   29.8   29.8
2015.08.04   19.3   25.6   25.6
2015.08.05   15.4   20.2   20.2
2015.08.07   17.2   26.1   26.1
2015.08.08   18.7   22.2   22.2
2015.08.09   19.4   25.4   25.4
2015.08.10   18.0    NaN   18.0
2015.08.11   28.4   30.0   30.0
2015.08.12   21.9   29.0   29.0

             d_max  m_max  a_max
date
2025.08.22   28.0    NaN   28.0
2025.08.23   30.0    NaN   30.0
2025.08.24   30.0    NaN   30.0
2025.08.25   29.0    NaN   29.0
2025.08.26   29.0    NaN   29.0
2025.08.27   27.0    NaN   27.0
2025.08.28   24.0    NaN   24.0
2025.08.29   27.0    NaN   27.0
2025.08.30   24.0    NaN   24.0
2025.08.31   24.0    NaN   24.0

And, that also seems to work.

To be safe I checked that there were no really bad values in either dataframe. I.E.

... ...
min_bad = dly_min[dly_min["a_min"] > 500]
print(f"\n{min_bad}")
... ...
max_bad = dly_max[dly_max["a_max"] < -500]
print(f"\n{max_bad}")

In both cases, I got an empty dataframe. So a little more prove the queries seem to work as desired.

Get the Min and Max for Each Month

Okay, my original intent here was to come up with minimum and maximum temperature for each month for all the years in the table. Once again, as done for the monthly temperatures earlier in this series of posts.

In earlier playing around, I got those values from the dataframe returned by the above queries. But, I have now decided to try and get the queries to do that work. And, pretty easy it was.

... ...
 print(f"Min and max temps for {t_mn}")
... ...
q_xj_mn = f"""SELECT SUBSTR(date, 1, 4) AS year, MIN(a_min) AS min_temp
FROM
(SELECT q1.date, q1.d_min, q2.m_min, 
  MIN(COALESCE(q1.d_min, 9223372036854775807),
      COALESCE(q2.m_min, 9223372036854775807)) AS a_min
  FROM ({q_t_min}) AS q1
  LEFT JOIN ({q_m_min}) as q2
  ON q1.date = q2.date)
GROUP BY year;"""
yr_min = rfall.qry_pd(q_xj_mn, i_col="year")
... ...
q_xj_mx = f"""SELECT SUBSTR(date, 1, 4) AS year, MAX(a_max) AS max_temp
  FROM
  (SELECT q1.date, q1.d_max, q2.m_max,
  MAX(COALESCE(q1.d_max, -9223372036854775807),
      COALESCE(q2.m_max, -9223372036854775807)) AS a_max
  FROM ({q_t_max}) AS q1
  LEFT JOIN ({q_m_max}) as q2
  ON q1.date = q2.date)
  GROUP BY year;"""
yr_max = rfall.qry_pd(q_xj_mx, i_col="year")

yr_mn_mx = pd.concat([yr_min, yr_max], axis=1, ignore_index=False)
print("\n", yr_mn_mx)

And in the terminal I got the following.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Min and max temps for 08

       min_temp  max_temp
year
2015      12.0      30.6
2016      12.2      32.7
2017       5.6      30.4
2018      11.6      31.6
2019      13.0      29.1
2020      12.0      31.5
2021      11.6      33.2
2022      14.2      31.0
2023      13.0      30.0
2024      11.0      27.0
2025      13.4      30.0

I have made no attempt to verify the above values. But they do look entirely plausible.

Just for fun let’s try another month, say January.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Min and max temps for 01

       min_temp  max_temp
year
2015      -3.6      15.1
2016      -3.3      11.7
2017      -6.7      10.0
2018      -2.2      10.4
2019      -0.5      10.8
2020     -11.0      13.6
2021      -1.9      12.1
2022      -5.0       9.6
2023      -3.0       9.0
2024     -13.0      11.0
2025      -2.0       6.0

Again, no verification, but still looks plausible.

Now, given the data I have, I really don’t know what the average monthly temperature is or how to calculate it in a statistically meaningful way. So, I am just going to, for each month, use the average of the minimum and maximum temperatures to estimate the average daily temperature for that month and year.

... ...
yr_mn_mx["avg_temp"] = yr_mn_mx[["min_temp", "max_temp"]].mean(axis=1)
print("\n", yr_mn_mx)

And…

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Min and max temps for 01

       min_temp  max_temp  avg_temp
year
2015      -3.6      15.1      5.75
2016      -3.3      11.7      4.20
2017      -6.7      10.0      1.65
2018      -2.2      10.4      4.10
2019      -0.5      10.8      5.15
2020     -11.0      13.6      1.30
2021      -1.9      12.1      5.10
2022      -5.0       9.6      2.30
2023      -3.0       9.0      3.00
2024     -13.0      11.0     -1.00
2025      -2.0       6.0      2.00

Note: I forgot to have the queries exclude the current month’s data if necessary. The current month’s data would likely not be complete and shouldn’t be included in a comparison to itself. I’ll leave the changes to the queries for you sort yourselves.

Done M’thinks

Well, lots of messing around, a fair bit of code (incremental development), and a probably successful result. Adding a new method yet to be done. Also considering saving the data to a table so don’t need to calculate it every time the dashboard is started.

And, still trying to figure out what other temperature or weather related data I might want to display on the dashboard. Looking at a way to use the weather condition data, but having some serious difficulty with that statistic. For example, what would the average condition be for a given day or month? Not like I can take the mean of some number of condition values (strings).

Until next time, may you get to where you wanted to go or at least close as is resonably possible.

Resources