Been doing a little (extra) thinking while writing the draft of the previous post and afterwards. Couple of things in particular.

Firstly, I don’t like all that duplicated code in the __init__, add_mon_rf and add_day_rf methods. There must be a reasonable way to eliminate that duplication.

Secondly, I am pretty sure I am going to want the average, minimum and maximum rainfall amounts for at least one past month while building the dashboard. Strikes me that always querying the database for those values for all prior months would be a lot of unnecessary overhead. So I am thinking there should be another table that stores the historical average, maximum and minimum for each of the twelve months, updated as necessary: probably once when all the files have been parsed, and then whenever the “current” month has been completed.

So, I think a refactoring of my Weather_db class and database are in order. I will leave the historical data table until after I have refactored and tested the existing class code.

Refactor Weather_db Class

This has turned into a rather major refactoring. New methods and all the remaining old methods refactored. Done on the fly without incremental modifications and testing. So may need more refactoring once I start testing.

SQL Execution Methods

I added two methods specifically for executing queries. So far, one for insert queries and one for all the others I have used to this point in development. Again, some code duplication, but for now I didn’t want to try to merge them into a single method. What I wanted was to remove the opening and closing of a database connection from methods that don’t actually need to do so themselves, including __init__.

  def qry_nsrt(self, qry, q_data, is_many=False):
    """Execute an INSERT query. if is_many is True, use executemany to execute query.

    params:
      qry: string with the appropriate query
      q_data: data expected for query, list of data sets if is_many is True
      is_many: use executemany rather than execute
    
    returns: nothing
    """
    conn = sqlite3.connect(self.db_pth)
    curs = conn.cursor()

    if is_many:
      curs.executemany(qry, q_data)
    else:
      curs.execute(qry, q_data)
    conn.commit()

    curs.close()
    conn.close()


  def qry_exec(self, qry):
    """Execute a supplied query.

    params:
      qry: the query to run (string)

    returns:
      the output of cursor.fetchall() following query execution
    """
    conn = sqlite3.connect(self.db_pth)
    curs = conn.cursor()

    curs.execute(qry)
    rslt = curs.fetchall()

    curs.close()
    conn.close()

    return rslt

Utility Methods

That meant I needed to create methods to handle some of the things I was doing that required me to open a connection and instantiate a cursor. Things like checking whether a specific table (i.e. table name) is in the database, or getting a count of rows in a given table.

  def get_tbl_rw_cnt(self, t_nm):
    """Get count of rows in the specified table.

    params:
      t_nm: table name
    
    returns: count of rows in the specified table
    """
    q_cnt = f"SELECT COUNT(*) FROM {t_nm};"
    rw_cnt = self.qry_exec(q_cnt)
    return rw_cnt[0][0]


  def is_tbl_in_db(self, t_nm):
    """Check if the given table is in the current database.

    params:
      t_nm: table name to look for
    
    returns:
      a tuple: the list of table names in the database, and
      True if the table is in the database, False otherwise
    """
    q_tbl = "SELECT name FROM sqlite_master;"
    nms = self.qry_exec(q_tbl)
    return nms, (t_nm,) in nms

Add Rainfall Data Method

And, I wrote a new method to take care of both ways of adding rainfall data to the rainfall table. Not sure it is the best approach, as it only deals with a single table rather than generalized data insertion. But it does seem to obey the single-responsibility principle.

  def add_rainfall(self, rf_data):
    """Add rainfall data to rainfall table

    params:
      rf_data: a tuple or a list of tuples of data to add to table
        each tuple: (datetime, daily rainfall, month to-date rainfall)
    
    returns: count of rows added to the table
    """
    if not rf_data:
      return 0
    
    do_many = not isinstance(rf_data, tuple) 
    p_rws = self.get_tbl_rw_cnt(self.rf_tnm)

    q_ins = f"INSERT INTO {self.rf_tnm} VALUES(NULL, ?, ?, ?)"
    self.qry_nsrt(q_ins, rf_data, do_many)

    c_rws = self.get_tbl_rw_cnt(self.rf_tnm)

    return (c_rws - p_rws)

__init__ and Make Table Method

And, finally, I refactored the class’ __init__() to use the new query execution and utility methods. Ditto for mk_rain_tbl().

class Weather_db():
  """This class will provide access to the sqlite database I plan to
  use as the data source for my 'weather' dashboard project.

  Initially it will only cover one or two tables for recording past
  rainfall amounts, daily and monthly totals.
  """
  def __init__(self, db_pth):
    self.db_pth = db_pth        # save path to database file
    self.rf_tnm = "rainfall"    # rainfall table name

    nms, tbl_ok = self.is_tbl_in_db(self.rf_tnm)
    print(f"__init__(): table names -> {nms}")

    # only create the table if it does not already exist
    if not tbl_ok:
      tbl_ok = self.mk_rain_tbl()
... ...
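
The refactored mk_rain_tbl() isn’t shown here. Below is a minimal sketch of what it could look like using the new helpers. The row_id, datetime and daily column names come from queries used elsewhere in this post; mn_to_dt, for the month to-date total, is purely a placeholder guess on my part.

  def mk_rain_tbl(self):
    """Create the rainfall table if it does not already exist.

    Sketch only: the month to-date column name (mn_to_dt) is a guess.

    params: None

    returns:
      True if table created or already existed, False otherwise
    """
    c_tnm = self.rf_tnm
    nms, tbl_ok = self.is_tbl_in_db(c_tnm)
    if tbl_ok:
      return True

    rf_tbl = f"""CREATE TABLE IF NOT EXISTS {c_tnm} (
      row_id INTEGER PRIMARY KEY,
      datetime TEXT NOT NULL,
      daily REAL NOT NULL,
      mn_to_dt REAL NOT NULL
      );"""
    self.qry_exec(rf_tbl)

    nms, tbl_ok = self.is_tbl_in_db(c_tnm)
    print(f"in mk_rain_tbl: {nms}")
    return tbl_ok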

Another Query Execute Method

When I was starting to write some test code, I realized I was using Pandas to get nicer terminal output from some of the queries that I was running. So I added a class method to handle that situation. Well, at least the way I was using it in my test code.

  def qry_pd(self, qry):
    """Execute query using pandas.

    params:
      qry: sql query to execute (string)
    
    returns:
      dataframe containing result of query execution
    """
    conn = sqlite3.connect(self.db_pth)
    mn_data = pd.read_sql_query(qry, conn)
    conn.close()
    return mn_data

Testing All the Refactoring

Okay, time to test all those changes. Let’s start with class instantiation and the test block that checks the rainfall table contents.

Instantiate Class

This also creates the table if it is not present. Which, to start, it is not, since I am working with a new, empty database file. Some duplication of code from the previous post, sorry.

if __name__ == "__main__":
  tst_i_mon = False
  tst_i_day = False
  chk_i_rf = True

  cwd = Path(__file__).cwd()
  fl_pth = cwd/"../data"
  fl_nm = "weather.db"
  db_pth = fl_pth/fl_nm
  rfall = Weather_db(db_pth)
... ...
  if chk_i_rf:
    rw_q = f"SELECT COUNT(row_id) FROM {rfall.rf_tnm};"
    nx_q = f"""SELECT substr(datetime, 1, 7) as c_month, MIN(daily) as minMonth, MAX(daily) as maxMonth
      FROM {rfall.rf_tnm}
      GROUP BY c_month;"""
    tbl_q = f"SELECT * FROM {rfall.rf_tnm};"

    rslt = rfall.get_tbl_rw_cnt(rfall.rf_tnm)
    print(f"\nrows in {rfall.rf_tnm} table: {rslt}")

    mn_data = rfall.qry_pd(nx_q)
    print("\n", mn_data)

And, in the terminal I got the following. Pretty much as expected.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> []
in mk_rain_tbl: [('rainfall',)]

rows in rainfall table: 0

 Empty DataFrame
Columns: [c_month, minMonth, maxMonth]
Index: []

Test Adding Data to the Rainfall Table

Okay, let’s refactor the developmental/test code that added data to the rainfall table in one month chunks (list of tuples). And see if things still work. Not much change in the code, just the call to add the data to the table.

... ...
  tst_i_mon = True
  tst_i_day = False
  chk_i_rf = True
... ...
  if tst_i_mon:
    # let's add the data for 2014.03 & 2014.04, full month at a time
    p_mn, c_mn, e_mn = "", "", "2014.05"
    rf_rws, t_mn = [], 0
    for rf_dt in tst_rfall:
      # if starting new month, update table, set/reset variables accordingly
      if not c_mn:
        c_mn = rf_dt[0][:7]
        print(f"init c_mn: {c_mn}")
      if rf_dt[0][:7] != c_mn:
        print(f"\nrainfall data for the month ({len(rf_rws)} days): {rf_rws}")
        num_rws = rfall.add_rainfall(rf_rws)
        print(f"\nrows added to rainfall table: {num_rws}")
        rf_rws, t_mn = [], 0
        p_mn, c_mn = c_mn, rf_dt[0][:7]
        print(f"current: p_mn {p_mn}, c_mn {c_mn}")
      if rf_dt[0][:7] == e_mn:
        break

      # otherwise process current data and add to list for the month
      match(rf_dt[1]):
        case "pm":
          dttm = f"{rf_dt[0]} 16:00"
        case "am":
          dttm = f"{rf_dt[0]} 08:00"
        case _:
          dttm = f"{rf_dt[0]} {rf_dt[1]}"
      try:
        drf = float(rf_dt[2])
      except ValueError:
        raise(ValueError(f"could not convert string to float: {rf_dt[2]}"))
      t_mn += drf
      rf_rws.append((dttm, drf, t_mn))

And in the terminal, pretty much what we got in the previous post. About the only real change is that row counts display a little nicer.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> [('rainfall',)]
init c_mn: 2014.03

rainfall data for the month (7 days): [('2014.03.09 13:30', 37.5, 37.5), ('2014.03.17 11:11', 54.0, 91.5), ('2014.03.20 16:00', 21.0, 112.5), ('2014.03.23 16:00', 3.0, 115.5), ('2014.03.26 16:00', 19.75, 135.25), ('2014.03.29 11:14', 33.0, 168.25), ('2014.03.30 16:00', 6.0, 174.25)]

rows added to rainfall table: 7
current: p_mn 2014.03, c_mn 2014.04

rainfall data for the month (9 days): [('2014.04.04 15:11', 14.0, 14.0), ('2014.04.06 16:00', 12.5, 26.5), ('2014.04.07 15:45', 4.75, 31.25), ('2014.04.09 13:30', 1.5, 32.75), ('2014.04.18 17:00', 53.0, 85.75), ('2014.04.20 09:00', 3.5, 89.25), ('2014.04.23 14:30', 3.0, 92.25), ('2014.04.25 14:30', 17.0, 109.25), ('2014.04.29 13:00', 7.0, 116.25)]

rows added to rainfall table: 9
current: p_mn 2014.04, c_mn 2014.05

rows in rainfall table: 16

    c_month  minMonth  maxMonth
0  2014.03       3.0      54.0
1  2014.04       1.5      53.0

Now let’s do the same for the test that added the first 3 days of 2014.05 one day (tuple) at a time. Again little change in the code.

... ...
  tst_i_mon = False
  tst_i_day = True
  chk_i_rf = True
... ...
  if tst_i_day:
    t_mn = 0
    # let's add first 3 days of 2014.05, 1 day (tuple) at a time
    for drw in range(16, 19):
      rf_d, rf_t, rf_f = tst_rfall[drw]
      dttm = f"{rf_d} {rf_t}"
      try:
        drf = float(rf_f)
      except ValueError:
        raise(ValueError(f"could not convert string to float: {rf_f}"))
      t_mn += drf
      num_rws = rfall.add_rainfall((dttm, drf, t_mn))

And, the terminal output was as follows. It matches the output in the previous post.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> [('rainfall',)]

rows in rainfall table: 19

    c_month  minMonth  maxMonth
0  2014.03       3.0      54.0
1  2014.04       1.5      53.0
2  2014.05       5.0      54.0

So at this point, I think this particular refactoring is done. Let’s move on to figuring out how to build and update the historical data table.

New Table

As I mentioned, I currently expect that, for the dashboard, I will want to show the historical average monthly rainfall and the historical minimum and maximum daily values for the current month. Along with that I would also display the current month’s total and perhaps its daily minimum and maximum. If not informative, it might at least be entertaining.

For the table itself, I expect 5 columns: an id column as primary key, a month column (“mm”), and avg, min and max columns for the rainfall data of interest. This is obviously a best guess at this time (I have not sorted out what the dashboard will look like and, as a result, can’t really define what this class needs to provide). I am also having trouble naming the table, but for now I am going to go with rf_mon_history. Longer than I like, but…

Refactor __init__()

I was going to get into writing the method to create the new table, but quickly realized I had some more thinking and likely refactoring to do. I am currently using an instance variable for the rainfall table name. With this second table, and possibly more, I figured I’d use a loop in __init__() to check for each table’s existence, creating it if necessary. Individual variables make that a touch messy, so I figured I’d use a list of table names. But I also want to use variable-like names in other methods, not a list index. So, going with a dictionary. Probably a better way to do this but…

And, so I am going to start with the refactoring of __init__().

As I worked on the refactoring, I also wanted to avoid an ever-growing if/elif block. So I decided to use a dispatch function, mk_tbl(), to control which table creation method is called. At first I was passing the table name to the dispatch function. In the end, I used the keys from the table names dictionary, and coded the table creation method lookup in the dispatch function to use those same keys.

The initialization method now looks like the following.

  def __init__(self, db_pth):
    self.db_pth = db_pth      # save path to database file
    # put all table names in dictionary
    self.tnms = {
      "rf_tnm": "rainfall",
      "mh_tnm": "rf_mon_history",
    }
    
    # check if all tables exist, if not create
    for k_tnm, tnm in self.tnms.items():
      nms, tbl_ok = self.is_tbl_in_db(tnm)
      print(f"__init__(): table names -> {nms}")

      # only create the table if it does not already exist
      if not tbl_ok:
        tbl_ok = self.mk_tbl(k_tnm)

A bit more work before we can test anything. And, because of the change from a table name variable to a dictionary, a bit of refactoring of other methods to use the dictionary rather than the earlier variable(s) to get table names. E.g. self.rf_tnm replaced with self.tnms['rf_tnm']. Well, in fact, I instantiated a variable c_tnm = self.tnms['rf_tnm'] and replaced every instance of self.rf_tnm with c_tnm. Thought that was a touch tidier.
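
In add_rainfall(), for example, I expect the relevant lines end up looking something like this (just a sketch of the pattern, not the actual diff):

    # table name now pulled from the dictionary
    c_tnm = self.tnms["rf_tnm"]
    p_rws = self.get_tbl_rw_cnt(c_tnm)
    q_ins = f"INSERT INTO {c_tnm} VALUES(NULL, ?, ?, ?)"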

Will likely have to test everything from scratch once again.

Create Table

Okay, let’s write a new method to create the table. Should be pretty similar to the one for creating the rainfall table. I will get to the dispatch function after that. So will still be a bit before I can test this code.

  def mk_mhist_tbl(self):
    """Create the monthly history table. The name is hard-coded in a class dictionary.
    It has, at least initially, 5 columns: mh_id, month, avg, min, max.
    The avg (average), min (minimum) and max (maximum) values are computed, for each
    of the 12 months of the year, from all the rainfall data recorded for that month.
    
    params: None

    returns:
      True if table created or already existed, False otherwise
    """

    c_tnm = self.tnms["mh_tnm"]
    nms, tbl_ok = self.is_tbl_in_db(c_tnm)
    if tbl_ok:
      return True

    rf_tbl = f"""CREATE TABLE IF NOT EXISTS {c_tnm} (
      mh_id INTEGER PRIMARY KEY,
      month TEXT NOT NULL,
      avg REAL NOT NULL,
      min REAL NOT NULL,
      max REAL NOT NULL
      );"""

    rslt = self.qry_exec(rf_tbl)

    nms, tbl_ok = self.is_tbl_in_db(c_tnm)
    print(f"in mk_mhist_tbl: {nms}")

    return tbl_ok
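
Actually populating rf_mon_history is for later (I will need more data in the database first, as discussed at the end of this post). But the aggregation itself should be doable with a single query over the rainfall table. Here is a rough sketch of a possible method; the fill_mhist_tbl name, the two-digit mm month format and the assumption that the table starts out empty are all mine, nothing settled yet:

  def fill_mhist_tbl(self):
    """Sketch only: build rf_mon_history from the rainfall table.

    Assumes datetime strings like "2014.03.09 13:30" and that the
    history table is currently empty.
    """
    c_tnm = self.tnms["mh_tnm"]
    # per calendar month: average of the monthly totals across years,
    # plus the smallest and largest single-day amounts seen in that month
    q_agg = f"""WITH mon_tot AS (
        SELECT substr(datetime, 6, 2) AS mm,
          substr(datetime, 1, 7) AS yr_mn,
          SUM(daily) AS m_total,
          MIN(daily) AS m_min,
          MAX(daily) AS m_max
        FROM {self.tnms['rf_tnm']}
        GROUP BY yr_mn)
      SELECT mm, AVG(m_total), MIN(m_min), MAX(m_max)
      FROM mon_tot
      GROUP BY mm;"""
    mn_rws = self.qry_exec(q_agg)

    # one row per calendar month: (NULL for mh_id, mm, avg, min, max)
    q_ins = f"INSERT INTO {c_tnm} VALUES(NULL, ?, ?, ?, ?)"
    self.qry_nsrt(q_ins, mn_rws, is_many=True)
    return len(mn_rws)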

Code the Dispatch Function

I don’t know how many tables I will eventually have. But I expect I will code a different method to create each one. So as discussed above, I am going to use a dispatch function to decide which table creation method to call. Not something I have any experience with, so likely to be a little less than well coded. I will pass the key from the self.tnms dictionary. In the dispatch function I will have a dictionary with the same keys pointing to the appropriate class method.

  def mk_tbl(self, k_tnm):
    """A dispatch function for creating a selection of database tables.
    
      params:
        k_tnm: key into the self.tnms dictionary for the table to create

      returns:
        return value from the invoked method
    """
    call_who = {
      "mh_tnm": self.mk_mhist_tbl,
      "rf_tnm": self.mk_rain_tbl,
    }
    rslt = call_who[k_tnm]()
    return rslt

Quick Test Instantiating Class

Starting with an empty database file. We should see output indicating the creation of each table.

if __name__ == "__main__":
  tst_i_mon = False
  tst_i_day = False
  chk_i_rf = True
... ...
  if chk_i_rf:
    for c_tnm in rfall.tnms.values():
      rslt = rfall.get_tbl_rw_cnt(c_tnm)
      print(f"\nrows in {c_tnm} table: {rslt}")

      tbl_q = f"SELECT * FROM {c_tnm};"

      if c_tnm == "rainfall":
        nx_q = f"""SELECT substr(datetime, 1, 7) as c_month, MIN(daily) as minMonth, MAX(daily) as maxMonth
          FROM {c_tnm}
          GROUP BY c_month;"""
      elif c_tnm == "rf_mon_history":
        nx_q = f"SELECT month, avg FROM {c_tnm}"
  
      mn_data = rfall.qry_pd(nx_q)
      print("\n", mn_data)

And in the terminal I got the following. Which, for an empty database file, looks to be the expected output.

(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> []
in mk_rain_tbl: [('rainfall',)]
__init__(): table names -> [('rainfall',)]
in mk_mhist_tbl: [('rainfall',), ('rf_mon_history',)]

rows in rainfall table: 0

 Empty DataFrame
Columns: [c_month, minMonth, maxMonth]
Index: []

rows in rf_mon_history table: 0

 Empty DataFrame
Columns: [month, avg]
Index: []

Done?

Well, I think that’s it for this one. It appears I need to do even more thinking.

I believe to really test populating the new table, I need more data than I currently have in the database. Matter of fact, I am thinking I need at least 24 months of data, perhaps 36 to really test things out.

So I am considering writing a new module to add all the rainfall data from rainGauge_2019.inc.php to the database. That would include adding the appropriate rows to the dates done CSV file. I will then work on figuring out how to get the data I want into the monthly history table. Though as I have moved the functions I was using to parse the rainfall files in data2db to a new module, rg_data.py, in the utils directory, I may just add the code to it.

Until next time, I hope you enjoy going around in circles as much as I seem to. Lots of coding and refactoring, yet nowhere near starting to work on the dashboard. And, it looks like more of the same for the immediate future.