I have, for better or worse, been doing a bit of thinking since my last post.

First, I have decided to add another method to the Dates_done class. It will allow me to add a complete row to the dictionary rather than one date at a time. That is, the year, month and all the days for that combination. I was thinking rather than updating for each date it might be more efficient to only do so when I had the parsing for a given year and month completed.

Second, I think I will just look at recording the rainfall data from those three files directly to a database table. Expect that would also likely be more efficient. Any object to store the full CSV data in memory would be rather large. Though I could likely avoid that by writing directly to the file as I read each row from the data files. But I would likely need to load the CSV file into memory to write it to the appropriate database file. Unfortunately I haven’t really researched that situation. However, the few examples I looked at used Pandas to create a dataframe from the CSV file. Then using to_sql() on that dataframe to write it to the database.

New `Dates_done` Method

Decided it might be prudent to allow adding a complete month in one shot.

This is really quite simple, as the method expects a dictionary with the data properly formatted. I also decided to allow the user to specify whether or not an existing key should have its value replaced. As written, it will, in fact, accept a number of months at one time. Though I should probably add code to check that the key(s) are valid combinations.

  def add_month(self, n_mon, rplc=False):
    """add new month to self.dts dict
      params:
        n_mon: month of data to add, format {"year month": "d1,...,dN"}
        rplc: replace data if key already in dictionary
      returns:
        nothing
      raise:
        KeyError if year/month already in dictionary and rplc=False
    """
    for n_ky, days in n_mon.items():
      if n_ky in self.dts and not rplc:
        raise(KeyError("duplicate date"))
      self.dts[n_ky] = set({days})
      self.c_new[n_ky] = len(self.dts[n_ky])

Quick test. With a bit of refactoring to previous code.

... ..
  do_add_dt = False
  do_add_mon = True
... ...
  if do_add_mon:
    # let's try the new add_month method
    nw_data = {"2014 06": "12,13,17,20,24,30"}
    print(f"\nadding year/month to dates done dict: {nw_data}")
    dts_dn.add_month(nw_data)

  if do_add_dt or do_add_mon:
    # let's update the csv file on disk
    dts_dn.updt_csv_file()
    # let's check that worked as expected
    dts_dn = Dates_done(dd_pth)
    print(f"\ncnt dts: {dts_dn.c_new}")
    print(f"{dts_dn.dts}")

And, in the terminal I got the following.

adding year/month to dates done dict: {'2014 06': '12,13,17,20,24,30'}

cnt dts: {'2014 03': 7, '2014 04': 9, '2014 05': 2, '2014 06': 6}
{'2014 03': {'20', '26', '17', '29', '23', '09', '30'}, '2014 04': {'06', '20', '07', '18', '25', '29', '23', '04', '09'}, '2014 05': {'05', '11'}, '2014 06': {'12', '17', '24', '20', '13', '30'}}

Without rigorous testing, that appears to work.

Database

SQLite3 does not have a date or datetime storage class, so I am going to use an abbreviated text version of the ISO 8601 format (YYYY-MM-DD HH:MM:SS.SSS). I won’t be bothering with the seconds. And hope that the SQLite3 built-in STRFTIME() will allow me to group on any of a dates components (e.g. month and/or year). I expect I will want to be able to display monthly rainfall averages in the dashboard. Perhaps even monthly average minimums and maximums. I don’t think I want to mess with Julian or Unix epoch values; especially if I want to visually check the data being stored in the database.

I had originally thought I would have separate fields for year, month, day and time. I am hoping the above will simplify things in terms of the table definition, if not the code.

I also plan to put all the functionality into a new class in a new module to be imported into my main development module, data2db. Let’s get started.

Class Weather_db

Okay, in my utils sub-directory, new module weather_db.py that will contain the code and interface for the class Weather_db. And in the data sub-directory an empty file named weather.db—the database file for the weather related data I plan to add. Starting with rainfall data (as reviewed in the previous post). I am hardcoding the database and table names in the class and module code.

Open Database File and Create Rainfall Table if Necessary

The class __init__ will create some class variables, open the SQLite database (currently an empty file) and if the rainfall table is not found in the database it will call the method to create it. Figured the table creation code (for whatever tables get created) should not be in the class initialization method.

And, because I had the code to check if the table already existed in a number of locations in the project’s modules, I decided to add a specific method to do that as well.

I am creating the table as a rowid table.

In SQLite, table rows normally have a 64-bit signed integer ROWID which is unique among all rows in the same table. (WITHOUT ROWID tables are the exception.)
… …
If a table contains a column of type INTEGER PRIMARY KEY, then that column becomes an alias for the ROWID.
… …
If no ROWID is specified on the insert, or if the specified ROWID has a value of NULL, then an appropriate ROWID is created automatically.

SQLite Autoincrement

# weather_db.py: module for class Weather_db
#  ver: 0.1.0: 2025.07.17, rek, init version

# Going to go at this slowly, testing things as I proceed,
# refactoring as needed or suggested by any issues

from pathlib import Path
import sqlite3

class Weather_db():
  """This class will provide access to the sqlite database I plan to
  use as the data source for my 'weather' dashboard project.

  Initially it will only cover one or two tables for recording past
  rainfall amounts, daily and monthly totals.
  """
  def __init__(self, db_pth):
    self.db_pth = db_pth        # save path to database file
    self.rf_tnm = "rainfall"    # rainfall table name

    conn = sqlite3.connect(db_pth)
    curs = conn.cursor()

    nms, tbl_ok = self.is_tbl_in_db(curs, self.rf_tnm)
    print(f"__init__(): table names -> {nms}")

    # only create the table if it does not already exist
    if not tbl_ok:
      tbl_ok = self.mk_rain_tbl(curs)
    print(f"mk_rain_tbl: {tbl_ok}")

    curs.close()
    conn.close()


  def is_tbl_in_db(self, curs, t_nm):
    """Check if the given table is in the current database.

    params:
      curs: cursor to open database
      t_nm: table name to look for
    
    returns:
      list of table names,
      True if table in database, False otherwise
    """
    rslt = curs.execute(f"SELECT name FROM sqlite_master")
    nms = rslt.fetchall()
    return nms, (t_nm,) in nms


  def mk_rain_tbl(self, curs):
    """Create the rainfall table. The name is hard-coded as a class variable.
    It has, at least initially, 3 columns: datetime, daily, monthly.
    daily is the amount of rainfall for the 24 hours ending datetime.
    monthly is the month to date total. For now the last row for a
    given month will have the final total for the month.

    params:
      curs: cursor to open database

    returns:
      True if table created or already existed, False otherwise
    """

    nms, tbl_ok = self.is_tbl_in_db(curs, self.rf_tnm)
    if tbl_ok:
      return True

    rf_tbl = f"""CREATE TABLE IF NOT EXISTS {self.rf_tnm} (
      row_id INTEGER PRIMARY KEY,
      datetime TEXT NOT NULL,
      daily REAL NOT NULL,
      monthly REAL NOT NULL
      );"""

    rslt = curs.execute(rf_tbl)

    nms, tbl_ok = self.is_tbl_in_db(curs, self.rf_tnm)
    print(f"in mk_rain_tbl: {nms}")

    return tbl_ok

Another quick test.

if __name__ == "__main__":
  cwd = Path(__file__).cwd()
  fl_pth = cwd/"../data"
  fl_nm = "weather.db"
  db_pth = fl_pth/fl_nm
  rfall = Weather_db(db_pth)

And in the terminal, executing the module twice in succession. Database file was empty before the first execution.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> []
in mk_rain_tbl: [('rainfall',)]

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> [('rainfall',)]
mk_rain_tbl: True

One can see that the first time the table was not found in the database, so it was created. The second time it was found and no attempt was made to create it.

Test Data

I have decided that for further development of this class I will use some hardcoded test data rather than extract it from the three PHP rainfall files. I want, during development, to add some data then test things like getting the average, minimum and maximum for each of the available months in the table.

So, I originally modified tst_3 in data2db to give me a list of tuples for the first 16 months in the first PHP file. Each tuple has three strings for the date, time and the rainfall reading for that combination of date and time. But on second thought, I put the data in a module, data\tst_rf_data.py, which I will import to my class module. I am using three strings in the tuple, because the data parsing function in the data2db module returns strings (i.e. regex output).

And this is what that file looks like.

tst_rain = [
  ("2014.03.09", "13:30", "37.5"),
  ("2014.03.17", "11:11", "54"),
  ("2014.03.20", "pm", "21"),
  ("2014.03.23", "pm", "3"),
  ("2014.03.26", "pm", "19.75"),
  ("2014.03.29", "11:14", "33"),
  ("2014.03.30", "16:00", "6"),
  ("2014.04.04", "15:11", "14"),
  ("2014.04.06", "16:00", "12.5"),
  ("2014.04.07", "15:45", "4.75"),
  ("2014.04.09", "13:30", "1.5"),
  ("2014.04.18", "17:00", "53"),
  ("2014.04.20", "09:00", "3.5"),
  ("2014.04.23", "14:30", "3"),
  ("2014.04.25", "14:30", "17"),
  ("2014.04.29", "13:00", "7"),
  ... ...
  ("2015.04.04", "08:00", "2.0"),
  ("2015.04.11", "08:00", "2.0"),
  ("2015.04.13", "08:00", "1.0"),
  ("2015.04.14", "08:00", "11.0"),
  ("2015.04.16", "08:44", "1.0"),
  ("2015.04.22", "08:00", "4.25"),
  ("2015.04.24", "16:00", "26.5"),
  ("2015.04.25", "14:45", "3.0"),
  ("2015.04.27", "15:00", "9.0"),
  ("2015.05.05", "08:00", "11.0"),
  ("2015.05.06", "08:00", "7.5"),
  ("2015.05.26", "08:00", "3.0"),
  ("2015.06.03", "07:00", "4.0"),
  ("2015.06.19", "14:00", "1.5"),
]

Methods to Add Data

Okay, let’s look at writing a function to add data to the rainfall table. We are getting the recorded rainfall for the appropriate days in each month. So, I am just going to use a variable to keep track of the month-to-date total as I process the extracted data for each relevant date and time.

I am planning while extracting the rainfall data from the files to add one month at a time to the database table. Once things are going (in production?), I will add each new day’s data individually. Let’s start with the method for a month at a time (it will likely work for any amount of properly formatted data).

Month at a Time

  def add_mon_rf(self, rf_rws):
    """Add a month's worth of data to the rainfall table in one go

    params:
      rf_rws: array containing tuples of data to add to table
        each tuple: (datetime, daily rainfall, month to-date rainfall)
    
    returns: count of row added to table
    """
    if not rf_rws:
      return 0
    
    conn = sqlite3.connect(self.db_pth)
    curs = conn.cursor()

    q_ins = f"INSERT INTO {self.rf_tnm} VALUES(NULL, ?, ?, ?)"
    curs.executemany(q_ins, rf_rws)
    conn.commit()
    
    # get count of matching rows in table, assume one month for now
    c_ym = rf_rws[0][0][:7]
    q_cnt = f"SELECT COUNT(datetime) FROM {self.rf_tnm} WHERE datetime LIKE '%{c_ym}%';"
    curs.execute(q_cnt)
    rw_cnt = curs.fetchall()
    
    curs.close()
    conn.close()

    return rw_cnt

For an initial test I am going to generate an array of data for the months of 2014.03, 2014.04. Then add each list of data to the table separately. And confirm by getting a count of the rows in the table and the minimum and maximum daily rainfalls by month. Bit of code to get that done, but want to be sure things are more or less working correctly.

if __name__ == "__main__":
  tst_i_mon = True
  chk_i_rf = True
... ...
  if tst_i_mon:
    p_mn, c_mn, e_mn = "", "", "2014.05"
    rf_rws, t_mn = [], 0
    for rf_dt in tst_rfall:
      if not c_mn:
        c_mn = rf_dt[0][:7]
        print(f"init c_mn: {c_mn}")
      if rf_dt[0][:7] != c_mn:
        print(f"\nrainfall data for the month ({len(rf_rws)} days): {rf_rws}")
        num_rws = rfall.add_mon_rf(rf_rws)
        print(f"\nrows added to rainfall table: {num_rws}")
        rf_rws, t_mn = [], 0
        p_mn, c_mn = c_mn, rf_dt[0][:7]
        print(f"current: p_mn {p_mn}, c_mn {c_mn}")
      if rf_dt[0][:7] == e_mn:
        break

      match(rf_dt[1]):
        case "pm":
          dttm = f"{rf_dt[0]} 16:00"
        case "am":
          dttm = f"{rf_dt[0]} 08:00"
        case _:
          dttm = f"{rf_dt[0]} {rf_dt[1]}"
      try:
        drf = float(rf_dt[2])
      except ValueError:
        raise(ValueError(f"could not convert string to float: {rf_dt[2]}"))
      t_mn += drf
      rf_rws.append((dttm, drf, t_mn))

  if chk_i_rf:
    rw_q = f"SELECT COUNT(row_id) FROM {rfall.rf_tnm};"
    nx_q = f"""SELECT substr(datetime, 1, 7) as c_month, MIN(daily) as minMonth, MAX(daily) as maxMonth
      FROM {rfall.rf_tnm}
      GROUP BY c_month;"""
    tbl_q = f"SELECT * FROM {rfall.rf_tnm};"

    conn = sqlite3.connect(db_pth)
    curs = conn.cursor()

    curs.execute(rw_q)
    rslt = curs.fetchall()
    print(f"\nrows in table: {rslt}")

    mn_data = pd.read_sql_query(nx_q, conn)

    curs.close()
    conn.close()

    print("\n", mn_data)

And, in the terminal I got the following.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
init c_mn: 2014.03

rainfall data for the month (7 days): [('2014.03.09 13:30', 37.5, 37.5), ('2014.03.17 11:11', 54.0, 91.5), ('2014.03.20 16:00', 21.0, 112.5), ('2014.03.23 16:00', 3.0, 115.5), ('2014.03.26 16:00', 19.75, 135.25), ('2014.03.29 11:14', 33.0, 168.25), ('2014.03.30 16:00', 6.0, 174.25)]

rows added to rainfall table: [(7,)]
current: p_mn 2014.03, c_mn 2014.04

rainfall data for the month (9 days): [('2014.04.04 15:11', 14.0, 14.0), ('2014.04.06 16:00', 12.5, 26.5), ('2014.04.07 15:45', 4.75, 31.25), ('2014.04.09 13:30', 1.5, 32.75), ('2014.04.18 17:00', 53.0, 85.75), ('2014.04.20 09:00', 3.5, 89.25), ('2014.04.23 14:30', 3.0, 92.25), ('2014.04.25 14:30', 17.0, 109.25), ('2014.04.29 13:00', 7.0, 116.25)]

rows added to rainfall table: [(9,)]
current: p_mn 2014.04, c_mn 2014.05

rows in table: [(16,)]

    c_month  minMonth  maxMonth
0  2014.03       3.0      54.0
1  2014.04       1.5      53.0

And a visual inspection of the file says those values are correct.

Day at a Time

Before calling this post done, let’s add that method to insert rainfall data one day at a time. Probably not much different than the one we just finished. Likely only a different cursor method call. I.E. execute versus executemany.

  def add_day_rf(self, rf_dy):
    """Add a day's worth of data to the rainfall table

    params:
      rf_dy: a tuple of data to add to table
        each tuple: (datetime, daily rainfall, month to-date rainfall)
    
    returns: [(1,)] if row added to table, [(0,)] otherwise
    """
    if not rf_dy:
      return 0
    
    conn = sqlite3.connect(self.db_pth)
    curs = conn.cursor()

    q_ins = f"INSERT INTO {self.rf_tnm} VALUES(NULL, ?, ?, ?)"
    curs.execute(q_ins, rf_dy)
    conn.commit()
    
    # get count of matching rows in table, assume one month for now
    c_dt = rf_dy[0][:10]
    q_cnt = f"SELECT COUNT(datetime) FROM {self.rf_tnm} WHERE datetime LIKE '%{c_dt}%';"
    curs.execute(q_cnt)
    rw_cnt = curs.fetchall()
    
    curs.close()
    conn.close()

    return rw_cnt

Have a feeling, given the duplication, that I should somehow combine the two into a single method. But for now, let’s test what we have.

I will add the first 3 days for 2014.05 and run my simple check. I inserted the following if block between the test block adding 2 months of data at a time and the block checking the data in the table. Left out a lot of validation that would be done in production (similar to the add month test).

... ...
  tst_i_mon = False
  tst_i_day = True
  chk_i_rf = True
... ...
  if tst_i_mon:
... ...
  if tst_i_day:
    t_mn = 0
    # let's add first 3 days of 2014.05
    for drw in range(16, 19):
      rf_d, rf_t, rf_f = tst_rfall[drw]
      dttm = f"{rf_d} {rf_t}"
      try:
        drf = float(rf_f)
      except ValueError:
        raise(ValueError(f"could not convert string to float: {rf_f}"))
      t_mn += drf
      num_rws = rfall.add_day_rf((dttm, drf, t_mn))
      
  if chk_i_rf:
... ...

And, in the terminal:

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> [('rainfall',)]

rows in table: [(19,)]

    c_month  minMonth  maxMonth
0  2014.03       3.0      54.0
1  2014.04       1.5      53.0
2  2014.05       5.0      54.0

And, since we had 16 rows previously, adding 3 more should give us 19. And I visually confirmed the min and max for the three days of May added to the table.

C’est Fini

Well, not what I’d consider significant progress, but enough, I believe, for this post.

Until next time enjoy your time coding, learning and playing around.

Resources

SQLite Documentation
SQLite CREATE TABLE
SQLite Autoincrement
SQLite LIKE
SQLite Data Types
SQLite Date & Time
How to Use and Store Dates in SQLite
SQL COUNT
SQLite3 cursor.executemany
SQLite3 connect
SQLite3 cursor
SQLite3 connection.commit
SQLite3 connection.close

Too Old To Code

Dashboard II

New `Dates_done` Method

Database

Class Weather_db

Open Database File and Create Rainfall Table if Necessary

Test Data

Methods to Add Data

Month at a Time

Day at a Time

C’est Fini

Resources

New Dates_done Method

Database

Class Weather_db

Open Database File and Create Rainfall Table if Necessary

Test Data

Methods to Add Data

Month at a Time

Day at a Time

C’est Fini

Resources

New `Dates_done` Method