Extracting Useful Data from the Table
As I expect I am going to want to draw a gauge showing today’s temperature against the average, low and high from previous years, let’s start with that. But which temperature: the morning temperature or the most recent if there are other entries present for the day. And, what is the average temperature if there are multiple entries per day. I am for now going to only use the morning temperature. For years without low and high temperatures in the table, the morning temperature will be the low. If there are additional entries on any given days, those will become my highs.
See, no code and getting messy already. I figure this is all going to take multiple queries and a bit of logic and arithmetic to sort out. And, once I get this sorted I think I will be adding a new table so the work only gets done once. Well once after I have finished developing and testing the code and queries.
Some Stats
Count of Entries
Let’s start by getting a count of the entries in the the four numeric columns for each year and month. Not sure why, but I think it may eventually matter when deciding which temperature to use for any given day. And, whatever, it’s a stat.
... ...
t_nm = rfall.tnms["tp_tnm"]
... ...
# get count of entries for each year and month in three temperature columns
q_cnt = f"""SELECT substr(datetime, 1, 7) as c_month, COUNT(temperature) as temps,
COUNT(dmin) as mins, COUNT(dmax) as maxs, COUNT(humidity) as humidity
FROM {t_nm}
GROUP BY c_month;"""
rw_cnt = rfall.qry_pd(q_cnt)
print(rw_cnt.head(16))
print("\n", rw_cnt.tail(16))
The terminal output was as follows.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
c_month temps mins maxs humidity
0 2015.01 29 25 22 0
1 2015.02 27 20 17 0
2 2015.03 25 22 17 0
3 2015.04 27 24 21 0
4 2015.05 24 22 19 0
5 2015.06 27 19 17 0
6 2015.07 20 20 15 0
7 2015.08 21 21 17 0
8 2015.09 19 19 16 0
9 2015.10 21 21 17 0
10 2015.11 25 24 19 0
11 2015.12 24 24 21 0
12 2016.01 26 26 18 0
13 2016.02 26 26 23 0
14 2016.03 25 25 23 0
15 2016.04 21 21 19 0
c_month temps mins maxs humidity
113 2024.06 31 0 0 29
114 2024.07 54 0 0 54
115 2024.08 45 0 0 45
116 2024.09 33 0 0 33
117 2024.10 31 0 0 31
118 2024.11 29 0 0 29
119 2024.12 31 0 0 31
120 2025.01 33 0 0 31
121 2025.02 29 0 0 29
122 2025.03 31 0 0 31
123 2025.04 30 0 0 30
124 2025.05 31 0 0 31
125 2025.06 54 0 0 53
126 2025.07 143 0 0 143
127 2025.08 152 0 0 152
128 2025.09 40 0 0 40
As you can see, there are currently 129 months of data in the table. Though 2025.09 is not yet complete. Clearly, I was not consistent recording daily morning, minimum and maximum temperatures in the early years. With the odd hiccup between daily morning temperature and humidity in recent years. Though the latter may be a failing of my parsing code rather than my recording habits. And, in the last few months I have pretty clearly been recording significantly more temperature values per day. (Somewhat ridiculously many in some months.)
I don’t really want to show all the 129 returned rows. But, I would like to have a look at all the months for 2021 and 2022.
q_cnt = f"""SELECT substr(datetime, 1, 7) as c_month, COUNT(temperature) as temps,
COUNT(dmin) as mins, COUNT(dmax) as maxs, COUNT(humidity) as humidity
FROM {t_nm}
WHERE substr(datetime, 1, 4)>='2021' AND substr(datetime, 1, 4)<='2022'
GROUP BY c_month;"""
rw_cnt = rfall.qry_pd(q_cnt)
print(rw_cnt)
And…
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
c_month temps mins maxs humidity
0 2021.01 63 30 55 0
1 2021.02 59 26 54 0
2 2021.03 63 30 59 0
3 2021.04 61 29 59 0
4 2021.05 61 29 58 0
5 2021.06 75 22 49 0
6 2021.07 76 24 58 0
7 2021.08 88 17 58 1
8 2021.09 55 15 28 8
9 2021.10 70 0 0 0
10 2021.11 44 0 0 0
11 2021.12 15 1 0 0
12 2022.01 29 0 0 1
13 2022.02 25 0 0 0
14 2022.03 27 0 0 12
15 2022.04 30 0 0 28
16 2022.05 26 0 0 26
17 2022.06 29 0 0 29
18 2022.07 39 0 0 32
19 2022.08 37 0 0 31
20 2022.09 28 0 0 28
21 2022.10 30 0 0 30
22 2022.11 33 0 0 29
23 2022.12 44 0 0 36
Sadly, I have no idea what that actually tells us. Time to move on.
Queries
Get Minimum Daily Temperature
What I am trying to get is the minimum daily temperature for a given month over all the years in the table. Similar to what we did for the historical monthly average temperature chart.
Let’s start with a single month and year. We have two minimums to look for: the daily morning temperature and the daily minimum temperature. Though one or the other may not exist (i.e. NULL in the database table). My plan is for each day to get the minimum of those two temperatures if they exist. Otherwise, just whatever temperature is available.
I will start with the daily morning temperature.
For display purposes, I am only getting one month’s worth of data. We will eventually cover all the dates in the table. I am using .to_string() so that the whole dataframe gets printed to the terminal. Otherwise Pandas defaults to displaying the first and last five rows.
# get min temp for each day in a month
t_yr = "2017"
t_mn = "08"
q_t_min = f"""SELECT SUBSTR(datetime, 1, 10) as date, SUBSTR(datetime, 12, 5) as time, MIN(temperature) as d_min
FROM {t_nm}
WHERE SUBSTR(datetime, 1, 7)='{t_yr}.{t_mn}'
GROUP BY date"""
d_min = rfall.qry_pd(q_t_min, i_col="date")
print(f"Daily minimum temperatures for {t_yr}.{t_mn}")
print(d_min.to_string())
In the terminal…
Daily minimum morning temperatures for 2017.08
time d_min
date
2017.08.01 14:18 26.8
2017.08.02 08:59 20.4
2017.08.03 08:19 19.3
2017.08.04 08:29 19.3
2017.08.05 09:32 19.5
2017.08.06 10:14 20.9
2017.08.07 08:40 18.8
2017.08.09 09:49 21.4
2017.08.10 12:44 26.6
2017.08.11 09:51 21.1
2017.08.12 09:24 17.8
2017.08.13 09:32 17.3
2017.08.14 07:49 14.5
2017.08.15 08:44 5.6
2017.08.16 07:47 15.5
2017.08.17 09:11 17.5
2017.08.18 09:20 15.9
2017.08.19 09:38 17.0
2017.08.20 09:42 16.9
2017.08.21 08:38 16.9
2017.08.22 09:09 19.0
2017.08.23 08:41 17.5
2017.08.24 08:52 16.4
2017.08.25 08:36 15.2
2017.08.26 08:31 15.9
2017.08.27 09:21 20.0
2017.08.28 07:54 18.4
2017.08.29 09:58 23.0
2017.08.30 09:12 17.5
2017.08.31 13:47 22.6
Now, as mentioned, for some years there will also be daily minimums available. Something like 2015-2021. Though many days may not have that temperature. Certainly not those from 2022 onwards. Expect the query will be very similar. Let’s give it a go.
q_m_min = f"""SELECT SUBSTR(datetime, 1, 10) as date, SUBSTR(datetime, 12, 5) as time, MIN(dmin) as m_min
FROM {t_nm}
WHERE SUBSTR(datetime, 1, 7)='{t_yr}.{t_mn}' AND dmin IS NOT NULL
GROUP BY date"""
print(f"Daily minimum temperatures for {t_yr}.{t_mn}")
m_min = rfall.qry_pd(q_m_min, i_col="date")
print(m_min.to_string())
And, in the terminal…
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Daily minimum temperatures for 2017.08
time m_min
date
2017.08.01 14:18 17.4
2017.08.02 08:59 18.1
2017.08.03 08:19 18.1
2017.08.04 08:29 18.2
2017.08.05 09:32 18.0
2017.08.06 10:14 16.4
2017.08.07 08:40 17.1
2017.08.09 09:49 17.5
2017.08.10 12:44 18.4
2017.08.11 09:51 18.3
2017.08.12 09:24 16.4
2017.08.13 09:32 16.3
2017.08.14 07:49 14.4
2017.08.15 08:44 13.7
2017.08.16 07:47 15.2
2017.08.17 09:11 15.4
2017.08.18 09:20 15.3
2017.08.19 09:38 15.6
2017.08.20 09:42 14.8
2017.08.21 08:38 15.0
2017.08.22 09:09 16.5
2017.08.23 08:41 16.6
2017.08.24 08:52 16.0
2017.08.25 08:36 13.7
2017.08.26 08:31 14.0
2017.08.27 09:21 15.9
2017.08.28 07:54 17.8
2017.08.29 09:58 18.4
2017.08.30 09:12 16.5
2017.08.31 13:47 16.9
And those two outputs certainly don’t match.
What I am planning to do next is create a UNION of those two queries. That should give me a table with the date, daily morning minimum and the daily absolute minimum. Let’s see what that takes to get done. But I now plan to do it for all available dates. So those preceding queries will need to change a bit. I will leave it to you to sort that out.
And it was relatively easy. To avoid the truncated Pandas dataframe output, I am using head and tail to get a bit more data at both ends of the resulting dataframe.
q_xj_mn = f"""SELECT q1.date, q1.d_min, q2.m_min
FROM ({q_t_min}) AS q1
LEFT JOIN ({q_m_min}) as q2
ON q1.date = q2.date;"""
dly_min = rfall.qry_pd(q_xj_mn, i_col="date")
print("\n", dly_min.head(10))
print("\n", dly_min.tail(10))
And, the truncated output in the terminal was as follows.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Daily minimum temperatures
d_min m_min
date
2015.08.01 24.1 16.4
2015.08.02 23.8 17.5
2015.08.04 19.3 16.4
2015.08.05 15.4 15.0
2015.08.07 17.2 13.6
2015.08.08 18.7 15.7
2015.08.09 19.4 16.9
2015.08.10 18.0 16.0
2015.08.11 28.4 15.9
2015.08.12 21.9 17.4
d_min m_min
date
2025.08.22 13.5 NaN
2025.08.23 16.3 NaN
2025.08.24 16.5 NaN
2025.08.25 16.4 NaN
2025.08.26 16.1 NaN
2025.08.27 16.4 NaN
2025.08.28 18.3 NaN
2025.08.29 17.5 NaN
2025.08.30 15.6 NaN
2025.08.31 15.0 NaN
As expected no true daily minimum for the later years.
But, I want the minimum of the two values if both are present, or the single value otherwise. Let’s see if we can get the query to do that for us.
The problem I ran into was that the scalar MIN function returns NULL if it is present. Not exactly what I wanted. I found another scalar function that solved that problem, specifically coalesce.
The revised JOIN query now looks like the following.
q_xj_mn = f"""SELECT q1.date, q1.d_min, q2.m_min,
MIN(COALESCE(q1.d_min, 9223372036854775807),
COALESCE(q2.m_min, 9223372036854775807)) AS a_min
FROM ({q_t_min}) AS q1
LEFT JOIN ({q_m_min}) as q2
ON q1.date = q2.date;"""
After making the above change in the code higher up, the terminal output is now as follows.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Daily minimum temperatures
d_min m_min a_min
date
2015.08.01 24.1 16.4 16.4
2015.08.02 23.8 17.5 17.5
2015.08.04 19.3 16.4 16.4
2015.08.05 15.4 15.0 15.0
2015.08.07 17.2 13.6 13.6
2015.08.08 18.7 15.7 15.7
2015.08.09 19.4 16.9 16.9
2015.08.10 18.0 16.0 16.0
2015.08.11 28.4 15.9 15.9
2015.08.12 21.9 17.4 17.4
d_min m_min a_min
date
2025.08.22 13.5 NaN 13.5
2025.08.23 16.3 NaN 16.3
2025.08.24 16.5 NaN 16.5
2025.08.25 16.4 NaN 16.4
2025.08.26 16.1 NaN 16.1
2025.08.27 16.4 NaN 16.4
2025.08.28 18.3 NaN 18.3
2025.08.29 17.5 NaN 17.5
2025.08.30 15.6 NaN 15.6
2025.08.31 15.0 NaN 15.0
And that looks like what we are after. Well we don’t really need the first two columns any more.
Let’s move on to the other end of the scale.
Get Maximum Daily Temperature
I am going to pretty much just copy the refactored code for the minimum and modify it to work on maximum values for the applicable cases.
... ...
q_t_max = f"""SELECT SUBSTR(datetime, 1, 10) as date, MAX(temperature) as d_max
FROM {t_nm}
WHERE SUBSTR(datetime, 6, 2)='{t_mn}'
GROUP BY date"""
q_m_max = f"""SELECT SUBSTR(datetime, 1, 10) as date, MAX(dmax) as m_max
FROM {t_nm}
WHERE SUBSTR(datetime, 6, 2)='{t_mn}' AND dmax IS NOT NULL
GROUP BY date"""
q_xj_mx = f"""SELECT q1.date, q1.d_max, q2.m_max,
MAX(COALESCE(q1.d_max, -9223372036854775807),
COALESCE(q2.m_max, -9223372036854775807)) AS a_max
FROM ({q_t_max}) AS q1
LEFT JOIN ({q_m_max}) as q2
ON q1.date = q2.date;"""
dly_max = rfall.qry_pd(q_xj_mx, i_col="date")
print("Daily maximum temperature")
print("\n", dly_max.head(10))
print("\n", dly_max.tail(10))
In the terminal:
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Daily maximum temperature
d_max m_max a_max
date
2015.08.01 24.1 30.6 30.6
2015.08.02 23.8 29.8 29.8
2015.08.04 19.3 25.6 25.6
2015.08.05 15.4 20.2 20.2
2015.08.07 17.2 26.1 26.1
2015.08.08 18.7 22.2 22.2
2015.08.09 19.4 25.4 25.4
2015.08.10 18.0 NaN 18.0
2015.08.11 28.4 30.0 30.0
2015.08.12 21.9 29.0 29.0
d_max m_max a_max
date
2025.08.22 28.0 NaN 28.0
2025.08.23 30.0 NaN 30.0
2025.08.24 30.0 NaN 30.0
2025.08.25 29.0 NaN 29.0
2025.08.26 29.0 NaN 29.0
2025.08.27 27.0 NaN 27.0
2025.08.28 24.0 NaN 24.0
2025.08.29 27.0 NaN 27.0
2025.08.30 24.0 NaN 24.0
2025.08.31 24.0 NaN 24.0
And, that also seems to work.
To be safe I checked that there were no really bad values in either dataframe. I.E.
... ...
min_bad = dly_min[dly_min["a_min"] > 500]
print(f"\n{min_bad}")
... ...
max_bad = dly_max[dly_max["a_max"] < -500]
print(f"\n{max_bad}")
In both cases, I got an empty dataframe. So a little more prove the queries seem to work as desired.
Get the Min and Max for Each Month
Okay, my original intent here was to come up with minimum and maximum temperature for each month for all the years in the table. Once again, as done for the monthly temperatures earlier in this series of posts.
In earlier playing around, I got those values from the dataframe returned by the above queries. But, I have now decided to try and get the queries to do that work. And, pretty easy it was.
... ...
print(f"Min and max temps for {t_mn}")
... ...
q_xj_mn = f"""SELECT SUBSTR(date, 1, 4) AS year, MIN(a_min) AS min_temp
FROM
(SELECT q1.date, q1.d_min, q2.m_min,
MIN(COALESCE(q1.d_min, 9223372036854775807),
COALESCE(q2.m_min, 9223372036854775807)) AS a_min
FROM ({q_t_min}) AS q1
LEFT JOIN ({q_m_min}) as q2
ON q1.date = q2.date)
GROUP BY year;"""
yr_min = rfall.qry_pd(q_xj_mn, i_col="year")
... ...
q_xj_mx = f"""SELECT SUBSTR(date, 1, 4) AS year, MAX(a_max) AS max_temp
FROM
(SELECT q1.date, q1.d_max, q2.m_max,
MAX(COALESCE(q1.d_max, -9223372036854775807),
COALESCE(q2.m_max, -9223372036854775807)) AS a_max
FROM ({q_t_max}) AS q1
LEFT JOIN ({q_m_max}) as q2
ON q1.date = q2.date)
GROUP BY year;"""
yr_max = rfall.qry_pd(q_xj_mx, i_col="year")
yr_mn_mx = pd.concat([yr_min, yr_max], axis=1, ignore_index=False)
print("\n", yr_mn_mx)
And in the terminal I got the following.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Min and max temps for 08
min_temp max_temp
year
2015 12.0 30.6
2016 12.2 32.7
2017 5.6 30.4
2018 11.6 31.6
2019 13.0 29.1
2020 12.0 31.5
2021 11.6 33.2
2022 14.2 31.0
2023 13.0 30.0
2024 11.0 27.0
2025 13.4 30.0
I have made no attempt to verify the above values. But they do look entirely plausible.
Just for fun let’s try another month, say January.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Min and max temps for 01
min_temp max_temp
year
2015 -3.6 15.1
2016 -3.3 11.7
2017 -6.7 10.0
2018 -2.2 10.4
2019 -0.5 10.8
2020 -11.0 13.6
2021 -1.9 12.1
2022 -5.0 9.6
2023 -3.0 9.0
2024 -13.0 11.0
2025 -2.0 6.0
Again, no verification, but still looks plausible.
Now, given the data I have, I really don’t know what the average monthly temperature is or how to calculate it in a statistically meaningful way. So, I am just going to, for each month, use the average of the minimum and maximum temperatures to estimate the average daily temperature for that month and year.
... ...
yr_mn_mx["avg_temp"] = yr_mn_mx[["min_temp", "max_temp"]].mean(axis=1)
print("\n", yr_mn_mx)
And…
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
Min and max temps for 01
min_temp max_temp avg_temp
year
2015 -3.6 15.1 5.75
2016 -3.3 11.7 4.20
2017 -6.7 10.0 1.65
2018 -2.2 10.4 4.10
2019 -0.5 10.8 5.15
2020 -11.0 13.6 1.30
2021 -1.9 12.1 5.10
2022 -5.0 9.6 2.30
2023 -3.0 9.0 3.00
2024 -13.0 11.0 -1.00
2025 -2.0 6.0 2.00
Note: I forgot to have the queries exclude the current month’s data if necessary. The current month’s data would likely not be complete and shouldn’t be included in a comparison to itself. I’ll leave the changes to the queries for you sort yourselves.
Done M’thinks
Well, lots of messing around, a fair bit of code (incremental development), and a probably successful result. Adding a new method yet to be done. Also considering saving the data to a table so don’t need to calculate it every time the dashboard is started.
And, still trying to figure out what other temperature or weather related data I might want to display on the dashboard. Looking at a way to use the weather condition data, but having some serious difficulty with that statistic. For example, what would the average condition be for a given day or month? Not like I can take the mean of some number of condition values (strings).
Until next time, may you get to where you wanted to go or at least close as is resonably possible.
Resources
- SQLite Tutorial » SQLite Join
- SQLite Built-In Scalar SQL Functions: coalesce
- How to create new columns derived from existing columns