As I mentioned last post, I wanted to code an utility module to allow me to add individual temperature/weather data to the database as it became available. Similar to the one I wrote for adding rainfall data. Thought about including the temperature option in the rainfall utility module. But figured that would get a little messy. So going with a separate module.

After that I am going to play around with extracting data I may be interested in. Think there are going to be some curve balls when it comes to querying the table. E.G. monthly average, minimum, and maximum temperatures. Last few years don’t have anything in the low and high columns. And in earlier years not every data row had one or both.

Adding Temperature Data on Daily Basis

Well, in fact, could be more often than daily as I sometimes record additional data as the day progresses. Depends on what I am doing and my mood. But I do try to record the temperature/weather in my daily note file at least once per day.

For now I have coded the insert query in the module. Last time I created a new method in the Weather_db class. Maybe later for the temperature table data. Not much else to say. Again data passed via command line parameters.

# get_daily_tw.py: utility module to obtain daily temperature/weather data
#     and write it to the database
# ver: 0.1.0, 2025.09.15, rek, init version
#       will initially use argparse to get data as command line parameters
#       may eventually modify to also allow ui like interface
#       Had thought about modifying get_daily_rf.py to handle this data as well,
#       but decided not to at this time. Though down the road who knows.


import argparse, time
from pathlib import Path
import pandas as pd

if __name__ == "__main__":
  from weather_db import Weather_db


def get_tw_parser():
  """Create and return parser for rainfall data.
  """
  # determine default date for optional parameter
  def_dt = time.strftime("%Y.%m.%d", time.gmtime())
  # instantiate and set up command paramter parser
  parser = argparse.ArgumentParser()
  parser.add_argument("-d", "--date", help="Supply date rainfall recorded, yyyy.mm.dd",
                      default=def_dt)
  parser.add_argument("-t", "--time", help="Supply time rainfall recorded (24 hour format), hh:mm",
                      default="06:00")
  # parser.add_argument("-u", "--history", help="Update history table as well",
  #                     action="store_true")
  parser.add_argument("-lt", "--low", help="Supply the low temperature for the day", type=float)
  parser.add_argument("-ht", "--high", help="Supply the high temperature for the day", type=float)
  parser.add_argument("-hu", "--humidity", help="Supply the humidity at time temperature recorded, float", type=int)
  parser.add_argument("-w", "--weather", help="Supply the weather condition at the time the temperature was recorded")
  parser.add_argument("temperature", help="Supply amount of rainfall recorded (mm), 2 decimals maximum",
                      type=float)
  return(parser)


if __name__ == "__main__":
  # instantiate database class
  cwd = Path(__file__).cwd()
  fl_pth = cwd/"../data"
  fl_nm = "weather.db"
  db_pth = fl_pth/fl_nm
  wdb = Weather_db(db_pth)

  tw_dt, tw_tm, tw_t, tw_l, tw_h, tw_u, tw_w = "", "", 0.0, "", "", "", ""

  # get parser and provided command line args
  tw_parse = get_tw_parser()
  tw_args = tw_parse.parse_args()

  print(f"args: {tw_args}")

  # get data from parser
  tw_t = round(tw_args.temperature, 2)
  tw_dt = tw_args.date
  tw_tm = tw_args.time
  tw_l = tw_args.low
  tw_h = tw_args.high
  tw_u = tw_args.humidity
  tw_w = tw_args.weather

  d_data = (f"{tw_dt} {tw_tm}", tw_t, tw_l, tw_h, tw_w, tw_u)
  print(d_data)

  in_tmpr_tbl = f"""INSERT INTO {wdb.tnms["tp_tnm"]} VALUES (NULL, ?, ?, ?, ?, ?, ?);"""
  wdb.qry_nsrt(in_tmpr_tbl, d_data)

  # quick check
  d_qry = f"""SELECT * FROM {wdb.tnms["tp_tnm"]} WHERE datetime LIKE '{tw_dt[:7]}%';"""
  d_rslt = wdb.qry_pd(d_qry)
  print("\n", d_rslt)

I then entered all the data for the current month. Up to the morning of 2025.09.16. Will do an afternoon update later today (this is a draft post being written long before it is set to be published).

I typically only have the temperature data for any additional daily entries, not weather condition.

Here’s the output for the last entry; somewhat truncated.

(dbd-3.13) PS R:\learn\dashboard\utils> python get_daily_tw.py 13 -d 2025.09.16 -t 07:31 -hu 85 -w "mainly clear"
args: Namespace(date='2025.09.16', time='07:31', low=None, high=None, humidity=85, weather='mainly clear', temperature=13.0)
('2025.09.16 07:31', 13.0, None, None, 'mainly clear', 85)

     row_id          datetime  temperature  dmin  dmax      condition  humidity
0     5200  2025.09.01 06:20         17.0  None  None  mostly cloudy      82.0
1     5201  2025.09.01 14:40         22.0  None  None           None      70.0
2     5202  2025.09.02 07:30         16.0  None  None  mostly cloudy      85.0
3     5203  2025.09.02 17:04         27.0  None  None           None      50.0
4     5204  2025.09.03 06:47         17.0  None  None   mainly clear      86.0
5     5205  2025.09.03 15:08         27.0  None  None           None      64.0
... ...
22    5222  2025.09.12 06:31         15.0  None  None  mostly cloudy      82.0
23    5223  2025.09.12 14:25         21.0  None  None           None      71.0
24    5224  2025.09.13 08:04         15.0  None  None  partly cloudy      86.0
25    5225  2025.09.13 16:58         22.0  None  None           None      60.0
26    5226  2025.09.14 06:42         16.0  None  None  mostly cloudy      82.0
27    5227  2025.09.15 08:06         14.0  None  None  mostly cloudy      86.0
28    5228  2025.09.15 15:01         20.0  None  None           None      62.0
29    5229  2025.09.16 07:31         13.0  None  None   mainly clear      85.0

Big Fix

When I started working on the queries to investigate the temperature table, I started getting counts I didn’t expect. Turns out I was getting strings, blank or otherwise, in places I wasn’t expecting them. Should have read the SQLite docs. Python SQLite3 is more than happy to put an empty string into a float column. I assumed an empty string would be treated as a NULL in that situation. No such luck. So, I am going to rework all my previous code and rebuild the temperature table.

Here’s some example output of a bit of code I wrote to check things out. And, I didn’t expect the % in the humidity value for 2025.01.03.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
'4699': <class 'int'>
'2025.01.03 06:00': <class 'str'>
'2.0': <class 'float'>
'': <class 'str'>
'': <class 'str'>
'cloudy': <class 'str'>
'92%': <class 'str'>
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
'3': <class 'int'>
'2015.01.03 07:00': <class 'str'>
'1.3': <class 'float'>
'': <class 'str'>
'2.7': <class 'float'>
'snow': <class 'str'>
'': <class 'str'>

Live and learn! Or, so some say. I expect the refactor will get some what messy. And I wil likely only show a small amount of the refactored code. But, that’s a bit of time away. (Calling it a day.)

Okay, I am back.

In the end I decided to modify the array of values just before they are appended to the monthly list of data. As that happens in a few places, I wrote a function to get the job done.

    def mk_null(a_dt):
      """Go through array of temperature/weather data, convert empty strings
         in numeric columns to None. So that when inserted into database,
         column value will be NULL.

         Since argument is array, changes will be made to current variable.
         No need to return.
      """
      num_cols = [2, 3, 5]
      for i in num_cols:
        if a_dt[i] == "":
          a_dt[i] = None

And, an example of one of the refactored append code blocks.

            if p_mn and mon_end:
              if pt_data[1] and pt_data[0][5:7] == p_mn:
                mk_null(pt_data)
                mn_data.append(tuple(pt_data))
              if ct_data[1] and ct_data[0][5:7] == p_mn:
                mk_null(ct_data)
                mn_data.append(tuple(ct_data))
              print(f"Writing month to database: '{p_mn}'")
              in_tmpr_tbl = f"""INSERT INTO {rfall.tnms["tp_tnm"]} VALUES (NULL, ?, ?, ?, ?, ?, ?);"""
              rfall.qry_nsrt(in_tmpr_tbl, mn_data, is_many=True)

I also refactored the note parsing function. It was returning a string ending in a % sign for the humidity value. That was making that column a string value. I want it to be of type float. Won’t bother showing the change (just moved the % sign outside the grouping brackets in the regex).

And the terminal output after processing one year’s notes file.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
7 -> F:\BaRKqgs\gdn\bark_gdn_2022.php
Writing month to database: '01'
Writing month to database: '02'
Writing month to database: '03'
Writing month to database: '04'
Writing month to database: '05'
Writing month to database: '06'
Writing month to database: '07'
Writing month to database: '08'
Writing month to database: '09'
Writing month to database: '10'
Writing month to database: '11'
Writing month to database: '12'

      row_id          datetime  temperature  dmin  dmax      condition  humidity
0      3487  2022.01.01 06:00         -5.0  None  None         cloudy       NaN
1      3488  2022.01.01 16:21         -1.5  None  None                      NaN
2      3489  2022.01.02 06:00          2.1  None  None         cloudy       NaN
3      3490  2022.01.02 14:06          2.5  None  None                      NaN
4      3491  2022.01.03 06:00          3.3  None  None         cloudy       NaN
..      ...               ...          ...   ...   ...            ...       ...
373    3860  2022.12.27 06:00          7.0  None  None         cloudy      99.0
374    3861  2022.12.28 06:00          6.0  None  None         cloudy      99.0
375    3862  2022.12.29 06:00          4.0  None  None         cloudy      87.0
376    3863  2022.12.30 06:00          4.0  None  None     light rain      99.0
377    3864  2022.12.31 06:00          6.0  None  None  mostly cloudy      99.0

[378 rows x 7 columns]

I have reprocessed all the note files from 2015 to 2024 in total. And, 2025 to the end of August. I will process the available September data once I refactor the utility module code to reflect what I’ve done above.

It turns out that I don’t seem to need to do so. argparse returns None if a value is not present. And, that’s what is being put in the tuple for insertion to the database table. Well, if the particular value is missing.

No Rainfall?~!#

When I tried running the current dashboard code, I got an error. I eventually paid attention to the terminal output in my earlier work and realized the rainfall table was empty. I apparently copied over the wrong backup database file before adding the temperature data (as described above).

I didn’t feel like starting over with a different database file for the temperature data. Nor did I want to rerun the code to add all that rainfall data. So, instead I instatiated the Weather_db class for both the two database files I needed to merge. Got the rainfall data from one and inserted it in the empty table in the other. Like so. Note, I just stuck it in the module I had started when I found all my errors. That if False: was if True: earlier.

if False:
  # going to copy rainfall data into most recent database
  #   which somehow is missing it
  # instantiate database class using approp databases
  cwd = Path(__file__).cwd()
  fl_pth = cwd/"data"
  fl_nm = "weather.db"
  db_pth = fl_pth/fl_nm
  rfall = Weather_db(db_pth)

  fl_pth = cwd/"data"
  fl_nm = "weather_2025_09_18_1.db"
  db_pth = fl_pth/fl_nm
  rfall2 = Weather_db(db_pth)

  # copy rainfall data in rfall into rfall2
  rf_q = f"""SELECT * FROM {rfall.tnms["rf_tnm"]};"""
  rf = rfall.qry_exec(rf_q)
  print(f"len(rf): {len(rf)}\n{rf[:5]}")
  # for td in rf:
  #   td[0] = None
  # print(f"len(rf): {len(rf)}\n{rf[:5]}")

  updt_q = f"""INSERT INTO {rfall2.tnms["rf_tnm"]}(row_id, datetime, daily, monthly)
    VALUES (?, ?, ?, ?);"""
  n_cnt = rfall2.qry_nsrt(updt_q, rf, is_many=True)
  
  rf_q = f"""SELECT * FROM {rfall2.tnms["rf_tnm"]};"""
  rf2 = rfall2.qry_exec(rf_q)
  print(f"len(rf2): {len(rf2)}\n{rf2[:5]}")


  rfall2.updt_hist_tbl()
  print("rainfall history table updated")

  exit(0)

And in the terminal I got the following.

(dbd-3.13) PS R:\learn\dashboard> python chk_db_rek.py
len(rf): 1774
[(1, '2014.03.09 13:30', 37.5, 37.5), (2, '2014.03.17 11:11', 54.0, 91.5), (3, '2014.03.20 16:00', 21.0, 112.5), (4, '2014.03.23 16:00', 3.0, 115.5), (5, '2014.03.26 16:00', 19.75, 135.25)]
len(rf2): 1774
[(1, '2014.03.09 13:30', 37.5, 37.5), (2, '2014.03.17 11:11', 54.0, 91.5), (3, '2014.03.20 16:00', 21.0, 112.5), (4, '2014.03.23 16:00', 3.0, 115.5), (5, '2014.03.26 16:00', 19.75, 135.25)]
rainfall history table updated

I added the two days of rainfall for this month using my utility module. The dashboard now appears to work correctly.

Text in Numeric Colunns

While continuing to work on queries to extract data I would be interested in, I had some issues with bad/inaccurate data in the temperature column of the temperature table.

For example, I had a temperature of 195°C for one date (2021.08.11 08:01 195.0 None NaN None). I used a query to get all the rows with a temperature greater than 30°C.

  nx_q = f"""SELECT * FROM {rfall.tnms["tp_tnm"]}
    WHERE substr(datetime, 1, 7)='{tst_yr}.{q_mn}' AND (dmax>30.0 OR temperature>30.0);

I manually used update or delete queries to fix all the entries I found. I did something similar for temperatures less than -30°C.

But I figured I should check to see how many rows had text in the numeric columns. So, new bit of code in a play around module.

for t_col in ["temperature", "dmin", "dmax", "humidity"]:
  vt_q = f"""SELECT * FROM {rfall.tnms["tp_tnm"]}
    WHERE TYPEOF({t_col})='text';"""
  print("\n", vt_q)
  mn_data = rfall.qry_pd(vt_q)
  print(mn_data)

And in the terminal, I got the following.

(dbd-3.13) PS R:\learn\dashboard\utils> python chk_db_rek.py

 SELECT * FROM temperature
    WHERE TYPEOF(temperature)='text';
Empty DataFrame
Columns: [row_id, datetime, temperature, dmin, dmax, condition, humidity]
Index: []

 SELECT * FROM temperature
    WHERE TYPEOF(dmin)='text';
   row_id          datetime  temperature   dmin  dmax condition humidity
0    2273  2020.05.04 08:33          8.6  6.9.0  16.1               None

 SELECT * FROM temperature
    WHERE TYPEOF(dmax)='text';
Empty DataFrame
Columns: [row_id, datetime, temperature, dmin, dmax, condition, humidity]
Index: []

 SELECT * FROM temperature
    WHERE TYPEOF(humidity)='text';
   row_id          datetime  temperature  dmin  dmax      condition humidity
0    3295  2021.08.30 05:10         15.6   NaN  24.5  mostly cloudy      30%
1    3336  2021.09.17 07:10         12.0  17.3   NaN                     77%
2    3337  2021.09.18 19:39         16.0   NaN   NaN                     62%
3    3338  2021.09.19 07:29         12.0   NaN   NaN                     83%
4    3339  2021.09.19 11:39         14.0   NaN   NaN                     82%
5    3340  2021.09.19 18:24         16.0   NaN   NaN                     73%
6    3341  2021.09.20 06:14         12.0   NaN   NaN                     90%
7    3354  2021.09.29 07:53         11.1   NaN   NaN                     83%
8    3355  2021.09.29 19:11         11.0   NaN   NaN                     84%

So I am going to manually fix those entries as appropriate. Will take me a bit of time, but, may as well have data as good as possible.

q_up = f"""UPDATE {rfall.tnms["tp_tnm"]}
    SET dmin=6.9
    WHERE datetime='2020.05.04 08:33';"""
up_t = rfall.qry_dlt(q_up)

hu_fix = {"2021.08.30 05:10": 30,
          "2021.08.30 05:10": 77,
          "2021.09.18 19:39": 62,
          "2021.09.19 07:29": 83,
          "2021.09.19 11:39": 82,
          "2021.09.19 18:24": 73,
          "2021.09.20 06:14": 90,
          "2021.09.29 07:53": 83,
          "2021.09.29 19:11": 84
}

for dttm, hmd in hu_fix.items():
  q_up = f"""UPDATE {rfall.tnms["tp_tnm"]}
    SET humidity={hmd}
    WHERE datetime='{dttm}';"""
  up_t = rfall.qry_dlt(q_up)

for t_col in ["temperature", "dmin", "dmax", "humidity"]:
  vt_q = f"""SELECT * FROM {rfall.tnms["tp_tnm"]}
    WHERE TYPEOF({t_col})='text';"""
  print("\n", vt_q)
  mn_data = rfall.qry_pd(vt_q)
  print(mn_data)

And in the terminal, I got the following.

(dbd-3.13) PS R:\learn\dashboard\utils> python chk_db_rek.py
 SELECT * FROM temperature
    WHERE TYPEOF(temperature)='text';
Empty DataFrame
Columns: [row_id, datetime, temperature, dmin, dmax, condition, humidity]
Index: []

 SELECT * FROM temperature
    WHERE TYPEOF(dmin)='text';
Empty DataFrame
Columns: [row_id, datetime, temperature, dmin, dmax, condition, humidity]
Index: []

 SELECT * FROM temperature
    WHERE TYPEOF(dmax)='text';
Empty DataFrame
Columns: [row_id, datetime, temperature, dmin, dmax, condition, humidity]
Index: []

 SELECT * FROM temperature
    WHERE TYPEOF(humidity)='text';
Empty DataFrame
Columns: [row_id, datetime, temperature, dmin, dmax, condition, humidity]
Index: []

So, other than blatantly wrong values, things in the temperature table now look reasonable. I also did a check of the rainfall table and none of the numeric columns had any string values.

C’est Fini

I think this post has gotten plenty long enough for me. But some questionable database values corrected. Though perhaps many more I have not yet identified. Not to mention a bunch of code I never expected I’d have to write. But writing code is where all the fun is, you think?

Until next time, may your projects be better thought out and researched that mine tend to be. That said, I am really only playing at this. You might in fact be working at it. If so, I expect there is a design and development plan in place. Not so much here.