Been doing a little (extra) thinking while writing the draft of the previous post and afterwards. Couple of things in particular.
Firstly, I don’t like all that duplicated code in the __init__, add_mon_rf and add_day_rf methods. There must be a reasonable way to eliminate that duplication.
Secondly, I am pretty sure I am going to want the average, minimum and maximum rainfall amounts for at least one past month while building the dashboard. Strikes me that always querying the database for those values for all prior months would be a lot of unnecessary overhead. So I am thinking there should be another table that stores the historical average, maximum and minimum for each of the twelve months, updated as necessary: probably once when all the files have been parsed, and then whenever the “current” month has been completed.
So, I think a refactoring of my Weather_db class and database are in order. I will leave the historical data table until after I have refactored and tested the existing class code.
Refactor Weather_db Class
This has turned into a rather major refactoring. New methods and all the remaining old methods refactored. Done on the fly without incremental modifications and testing. So may need more refactoring once I start testing.
SQL Execution Methods
I added two methods specifically for executing queries. So far, one for insert queries and one for all the others I have used to this point in development. Again, some code duplication, but for now I didn’t want to merge them into a single method. What I wanted to do was remove the opening and closing of a database connection from the other methods that don’t actually need to do so themselves, including __init__().
def qry_nsrt(self, qry, q_data, is_many=False):
    """Execute an INSERT query. If is_many is True, use executemany to execute the query.
    params:
        qry: string with the appropriate query
        q_data: data expected for query, list of data sets if is_many is True
        is_many: use executemany rather than execute
    returns: nothing
    """
    conn = sqlite3.connect(self.db_pth)
    curs = conn.cursor()
    if is_many:
        curs.executemany(qry, q_data)
    else:
        curs.execute(qry, q_data)
    conn.commit()
    curs.close()
    conn.close()
def qry_exec(self, qry):
    """Execute a supplied query.
    params:
        qry: the query to run (string)
    returns:
        the output of cursor.fetchall() following query execution
    """
    conn = sqlite3.connect(self.db_pth)
    curs = conn.cursor()
    curs.execute(qry)
    rslt = curs.fetchall()
    curs.close()
    conn.close()
    return rslt
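As an aside, if I do eventually want to trim the remaining connect/close duplication between these two methods, something like contextlib.closing would probably do it. A minimal sketch of a possible variant of qry_exec(), not what the class currently does:

from contextlib import closing
import sqlite3

def qry_exec(self, qry):
    """Run a query; the connection gets closed even if execute() raises."""
    with closing(sqlite3.connect(self.db_pth)) as conn:
        curs = conn.cursor()
        rslt = curs.execute(qry).fetchall()
        curs.close()
        return rslt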
Utility Methods
That meant I needed to create methods to handle some of the things I was doing that required me to open a connection and instantiate a cursor. Things like checking if a specific table (i.e. table name) is in the database or getting a count of rows in a given table.
def get_tbl_rw_cnt(self, t_nm):
    """Get count of rows in specified table.
    params:
        t_nm: table name
    returns: count of rows in the specified table
    """
    q_cnt = f"SELECT COUNT(*) FROM {t_nm};"
    rw_cnt = self.qry_exec(q_cnt)
    return rw_cnt[0][0]
def is_tbl_in_db(self, t_nm):
    """Check if the given table is in the current database.
    params:
        t_nm: table name to look for
    returns:
        list of table names,
        True if table in database, False otherwise
    """
    q_tbl = "SELECT name FROM sqlite_master;"
    nms = self.qry_exec(q_tbl)
    return nms, (t_nm,) in nms
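For illustration, assuming an instance named rfall (as in the test code further down), the two helpers get used along these lines. This snippet isn’t in the module, it is just to show the call pattern:

nms, ok = rfall.is_tbl_in_db("rainfall")
if ok:
    print(f"rainfall table has {rfall.get_tbl_rw_cnt('rainfall')} rows")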
Add Rainfall Data Method
And, I wrote a new method to take care of both ways of adding rainfall data to the rainfall table. Not sure it is the best approach as it only deals with a single table, not generalized data insertion. But it does seem to obey the single-responsibility principle.
def add_rainfall(self, rf_data):
    """Add rainfall data to rainfall table.
    params:
        rf_data: a tuple or a list of tuples of data to add to table
            each tuple: (datetime, daily rainfall, month to-date rainfall)
    returns: count of rows added to the table
    """
    if not rf_data:
        return 0
    do_many = not isinstance(rf_data, tuple)
    p_rws = self.get_tbl_rw_cnt(self.rf_tnm)
    q_ins = f"INSERT INTO {self.rf_tnm} VALUES(NULL, ?, ?, ?)"
    self.qry_nsrt(q_ins, rf_data, do_many)
    c_rws = self.get_tbl_rw_cnt(self.rf_tnm)
    return (c_rws - p_rws)
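A quick example of the two call styles (values borrowed from the test data further down):

# single day: one tuple
rfall.add_rainfall(("2014.03.09 13:30", 37.5, 37.5))
# whole month: a list of tuples, handled via executemany
rfall.add_rainfall([("2014.04.04 15:11", 14.0, 14.0), ("2014.04.06 16:00", 12.5, 26.5)])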
__init__ and Make Table Method
And, finally, I refactored the class’ __init__() to use the new query execution and utility functions. Ditto for mk_rain_tbl().
class Weather_db():
    """This class will provide access to the sqlite database I plan to
    use as the data source for my 'weather' dashboard project.
    Initially it will only cover one or two tables for recording past
    rainfall amounts, daily and monthly totals.
    """

    def __init__(self, db_pth):
        self.db_pth = db_pth      # save path to database file
        self.rf_tnm = "rainfall"  # rainfall table name
        nms, tbl_ok = self.is_tbl_in_db(self.rf_tnm)
        print(f"__init__(): table names -> {nms}")
        # only create the table if it does not already exist
        if not tbl_ok:
            tbl_ok = self.mk_rain_tbl()

    ... ...
Another Query Execute Method
When I was starting to write some test code, I realized I was using Pandas to get nicer terminal output from some of the queries that I was running. So, I added a class method to handle that situation. Well, at least the way I was using it in my test code.
def qry_pd(self, qry):
    """Execute query using pandas (needs 'import pandas as pd' at the top of the module).
    params:
        qry: sql query to execute (string)
    returns:
        dataframe containing result of query execution
    """
    conn = sqlite3.connect(self.db_pth)
    mn_data = pd.read_sql_query(qry, conn)
    conn.close()
    return mn_data
Testing All the Refactoring
Okay, time to test all those changes. Let’s start with class instantiation and the test block that checks the rainfall table contents.
Instantiate Class
This also creates the table if it is not present. Which, to start, it is not: I am using a new, empty database file. Some duplication of code from the previous post, sorry.
if __name__ == "__main__":
    tst_i_mon = False
    tst_i_day = False
    chk_i_rf = True

    cwd = Path(__file__).cwd()
    fl_pth = cwd/"../data"
    fl_nm = "weather.db"
    db_pth = fl_pth/fl_nm
    rfall = Weather_db(db_pth)

    ... ...

    if chk_i_rf:
        rw_q = f"SELECT COUNT(row_id) FROM {rfall.rf_tnm};"
        nx_q = f"""SELECT substr(datetime, 1, 7) as c_month, MIN(daily) as minMonth, MAX(daily) as maxMonth
            FROM {rfall.rf_tnm}
            GROUP BY c_month;"""
        tbl_q = f"SELECT * FROM {rfall.rf_tnm};"

        rslt = rfall.get_tbl_rw_cnt(rfall.rf_tnm)
        print(f"\nrows in {rfall.rf_tnm} table: {rslt}")
        mn_data = rfall.qry_pd(nx_q)
        print("\n", mn_data)
And, in the terminal I got the following. Pretty much as expected.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> []
in mk_rain_tbl: [('rainfall',)]
rows in rainfall table: 0
Empty DataFrame
Columns: [c_month, minMonth, maxMonth]
Index: []
Test Adding Data to the Rainfall Table
Okay, let’s refactor the developmental/test code that added data to the rainfall table in one-month chunks (list of tuples) and see if things still work. Not much change in the code, just the call to add the data to the table.
... ...
tst_i_mon = True
tst_i_day = False
chk_i_rf = True
... ...

if tst_i_mon:
    # let's add the data for 2014.03 & 2014.04, full month at a time
    p_mn, c_mn, e_mn = "", "", "2014.05"
    rf_rws, t_mn = [], 0
    for rf_dt in tst_rfall:
        # if starting new month, update table, set/reset variables accordingly
        if not c_mn:
            c_mn = rf_dt[0][:7]
            print(f"init c_mn: {c_mn}")
        if rf_dt[0][:7] != c_mn:
            print(f"\nrainfall data for the month ({len(rf_rws)} days): {rf_rws}")
            num_rws = rfall.add_rainfall(rf_rws)
            print(f"\nrows added to rainfall table: {num_rws}")
            rf_rws, t_mn = [], 0
            p_mn, c_mn = c_mn, rf_dt[0][:7]
            print(f"current: p_mn {p_mn}, c_mn {c_mn}")
            if rf_dt[0][:7] == e_mn:
                break
        # otherwise process current data and add to list for the month
        match(rf_dt[1]):
            case "pm":
                dttm = f"{rf_dt[0]} 16:00"
            case "am":
                dttm = f"{rf_dt[0]} 08:00"
            case _:
                dttm = f"{rf_dt[0]} {rf_dt[1]}"
        try:
            drf = float(rf_dt[2])
        except ValueError:
            raise ValueError(f"could not convert string to float: {rf_dt[2]}")
        t_mn += drf
        rf_rws.append((dttm, drf, t_mn))
And in the terminal, pretty much what we got in the previous post. About the only real change is that row counts display a little nicer.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> [('rainfall',)]
init c_mn: 2014.03
rainfall data for the month (7 days): [('2014.03.09 13:30', 37.5, 37.5), ('2014.03.17 11:11', 54.0, 91.5), ('2014.03.20 16:00', 21.0, 112.5), ('2014.03.23 16:00', 3.0, 115.5), ('2014.03.26 16:00', 19.75, 135.25), ('2014.03.29 11:14', 33.0, 168.25), ('2014.03.30 16:00', 6.0, 174.25)]
rows added to rainfall table: 7
current: p_mn 2014.03, c_mn 2014.04
rainfall data for the month (9 days): [('2014.04.04 15:11', 14.0, 14.0), ('2014.04.06 16:00', 12.5, 26.5), ('2014.04.07 15:45', 4.75, 31.25), ('2014.04.09 13:30', 1.5, 32.75), ('2014.04.18 17:00', 53.0, 85.75), ('2014.04.20 09:00', 3.5, 89.25), ('2014.04.23 14:30', 3.0, 92.25), ('2014.04.25 14:30', 17.0, 109.25), ('2014.04.29 13:00', 7.0, 116.25)]
rows added to rainfall table: 9
current: p_mn 2014.04, c_mn 2014.05
rows in rainfall table: 16
c_month minMonth maxMonth
0 2014.03 3.0 54.0
1 2014.04 1.5 53.0
Now let’s do the same for the test that added the first 3 days of 2014.05 one day (tuple) at a time. Again little change in the code.
... ...
tst_i_mon = False
tst_i_day = True
chk_i_rf = True
... ...

if tst_i_day:
    t_mn = 0
    # let's add first 3 days of 2014.05, 1 day (tuple) at a time
    for drw in range(16, 19):
        rf_d, rf_t, rf_f = tst_rfall[drw]
        dttm = f"{rf_d} {rf_t}"
        try:
            drf = float(rf_f)
        except ValueError:
            raise ValueError(f"could not convert string to float: {rf_f}")
        t_mn += drf
        num_rws = rfall.add_rainfall((dttm, drf, t_mn))
And, the terminal output was as follows. It matches the output in the previous post.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> [('rainfall',)]
rows in rainfall table: 19
c_month minMonth maxMonth
0 2014.03 3.0 54.0
1 2014.04 1.5 53.0
2 2014.05 5.0 54.0
So at this point, I think this particular refactoring is done. Let’s move on to figuring out how to build and update the historical data table.
New Table
As I mentioned, I currently expect that, for the dashboard, I will want to be able to show, for the current month, the historical average monthly rainfall along with the historical minimum and maximum daily values. Along with that I would also display the current month’s total and perhaps its daily minimum and maximum. If not informative, it might at least be entertaining.
For the table itself, I expect 5 columns. An id column as primary key, a month column (“mm”) and avg, min, max columns for the rainfall data of interest. This is obviously a best guess at this time (as I have not sorted out what the dashboard will look like and can’t, as a result, really define what this class needs to provide). Also having trouble naming the table, but for now I am going to go with rf_mon_history. Longer than I like, but…
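I haven’t yet worked out how the table will actually get populated (that comes later), but a rough sketch of the sort of aggregation I have in mind might look something like the following. The substr() offsets assume datetime strings like “2014.03.09 13:30” (so characters 6–7 are the month), and averaging the monthly totals is just my guess at what “average” should mean here.

# a guess at how the history table might eventually get filled: the average of
# the monthly totals plus the daily min/max for each of the twelve calendar months
q_hist = """INSERT INTO rf_mon_history (month, avg, min, max)
    SELECT mm, AVG(mon_total), MIN(mon_min), MAX(mon_max)
    FROM (SELECT substr(datetime, 6, 2) AS mm,
                 SUM(daily) AS mon_total,
                 MIN(daily) AS mon_min,
                 MAX(daily) AS mon_max
            FROM rainfall
            GROUP BY substr(datetime, 1, 7))
    GROUP BY mm;"""
# qry_nsrt() commits; qry_exec() does not, so an INSERT has to go through qry_nsrt()
rfall.qry_nsrt(q_hist, ())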
Refactor __init__()
I was going to get into writing the method to create the new table. But I quickly realized I had some more thinking, and likely refactoring, to do. I am currently using a class variable for the rainfall table name. With this second table, and possibly more, I figured I’d use a loop in __init__() to check for each table’s existence, creating it if necessary. Individual variables make that a touch messy. So I figured I’d use a list of table names. But I also want to use variable-like names in other methods, not a list index. So, going with a dictionary. Probably a better way to do this, but…
And, so I am going to start with the refactoring of __init__().
As I worked on the refactoring, I also wanted to avoid an ever-growing if/elif block. So I decided to use a dispatch function, mk_tbl(), to control which table creation method is called. At first, I was passing the table name to the dispatch function. In the end, I used the keys from the table names dictionary. I decided to code the table creation method lookup in the dispatch function/method to use those same keys.
The initialization method now looks like the following.
def __init__(self, db_pth):
    self.db_pth = db_pth  # save path to database file
    # put all table names in dictionary
    self.tnms = {
        "rf_tnm": "rainfall",
        "mh_tnm": "rf_mon_history",
    }
    # check if all tables exist, if not create
    for k_tnm, tnm in self.tnms.items():
        nms, tbl_ok = self.is_tbl_in_db(tnm)
        print(f"__init__(): table names -> {nms}")
        # only create the table if it does not already exist
        if not tbl_ok:
            tbl_ok = self.mk_tbl(k_tnm)
A bit more work before we can test anything. And, because of the change from a table name variable to a dictionary, a bit of refactoring of other methods to use the dictionary rather than the earlier variable(s) to get table names. E.g. self.rf_tnm replaced with self.tnms['rf_tnm']. Well, in fact, I instantiated a variable, c_tnm = self.tnms['rf_tnm'], and replaced every instance of self.rf_tnm with c_tnm. Thought that was a touch tidier.
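For example, with the dictionary in place, add_rainfall() ends up looking roughly like this (docstring omitted; only the table name lookup really changes):

def add_rainfall(self, rf_data):
    if not rf_data:
        return 0
    c_tnm = self.tnms["rf_tnm"]
    do_many = not isinstance(rf_data, tuple)
    p_rws = self.get_tbl_rw_cnt(c_tnm)
    q_ins = f"INSERT INTO {c_tnm} VALUES(NULL, ?, ?, ?)"
    self.qry_nsrt(q_ins, rf_data, do_many)
    c_rws = self.get_tbl_rw_cnt(c_tnm)
    return (c_rws - p_rws)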
Will likely have to test everything from scratch once again.
Create Table
Okay, let’s write a new method to create the table. Should be pretty similar to the one for creating the rainfall table. I will get to the dispatch function after that. So it will still be a bit before I can test this code.
def mk_mhist_tbl(self):
    """Create the monthly history table. The name is hard-coded in a class dictionary.
    It has, at least initially, 5 columns: mh_id, month, avg, min, max.
    The values of avg (average), min (minimum), max (maximum) are computed, for each
    of the 12 months of the year, over all the data for that month in the rainfall table.
    params: None
    returns:
        True if table created or already existed, False otherwise
    """
    c_tnm = self.tnms["mh_tnm"]
    nms, tbl_ok = self.is_tbl_in_db(c_tnm)
    if tbl_ok:
        return True

    rf_tbl = f"""CREATE TABLE IF NOT EXISTS {c_tnm} (
        mh_id INTEGER PRIMARY KEY,
        month TEXT NOT NULL,
        avg REAL NOT NULL,
        min REAL NOT NULL,
        max REAL NOT NULL
        );"""
    rslt = self.qry_exec(rf_tbl)

    nms, tbl_ok = self.is_tbl_in_db(c_tnm)
    print(f"in mk_mhist_tbl: {nms}")
    return tbl_ok
Code the Dispatch Function
I don’t know how many tables I will eventually have. But I expect I will code a different method to create each one. So, as discussed above, I am going to use a dispatch function to decide which table creation method to call. Not something I have any experience with, so it is likely to be a little less than well coded. I will pass it the key from the self.tnms dictionary. In the dispatch function I will have a dictionary with the same keys pointing to the appropriate class method.
def mk_tbl(self, k_tnm):
    """A dispatch function for creating a selection of database tables.
    params:
        k_tnm: key into the table names dictionary (self.tnms) for the table to create
    returns:
        return value from invoked method
    """
    call_who = {
        "mh_tnm": self.mk_mhist_tbl,
        "rf_tnm": self.mk_rain_tbl,
    }
    rslt = call_who[k_tnm]()
    return rslt
Quick Test Instantiating Class
Starting with an empty database file. We should see output indicating the creation of each table.
if __name__ == "__main__":
    tst_i_mon = False
    tst_i_day = False
    chk_i_rf = True

    ... ...

    if chk_i_rf:
        for c_tnm in rfall.tnms.values():
            rslt = rfall.get_tbl_rw_cnt(c_tnm)
            print(f"\nrows in {c_tnm} table: {rslt}")
            tbl_q = f"SELECT * FROM {c_tnm};"
            if c_tnm == "rainfall":
                nx_q = f"""SELECT substr(datetime, 1, 7) as c_month, MIN(daily) as minMonth, MAX(daily) as maxMonth
                    FROM {c_tnm}
                    GROUP BY c_month;"""
            elif c_tnm == "rf_mon_history":
                nx_q = f"SELECT month, avg FROM {c_tnm}"
            mn_data = rfall.qry_pd(nx_q)
            print("\n", mn_data)
And in the terminal I got the following. Which, for an empty database file, looks to be the expected output.
(dbd-3.13) PS R:\learn\dashboard\utils> python weather_db.py
__init__(): table names -> []
in mk_rain_tbl: [('rainfall',)]
__init__(): table names -> [('rainfall',)]
in mk_mhist_tbl: [('rainfall',), ('rf_mon_history',)]
rows in rainfall table: 0
Empty DataFrame
Columns: [c_month, minMonth, maxMonth]
Index: []
rows in rf_mon_history table: 0
Empty DataFrame
Columns: [month, avg]
Index: []
Done?
Well, I think that’s it for this one. It appears I need to do even more thinking.
I believe to really test populating the new table, I need more data than I currently have in the database. Matter of fact, I am thinking I need at least 24 months of data, perhaps 36 to really test things out.
So I am considering writing a new module to add all the rainfall data from rainGauge_2019.inc.php to the database. That would include adding the appropriate rows to the dates done CSV file. I will then work on figuring out how to get the data I want into the monthly history table. Though, as I have moved the functions I was using to parse the rainfall files in data2db to a new module, rg_data.py, in the utils directory, I may just add the code to it.
Until next time, I hope you enjoy going around in circles as much as I seem to. Lots of coding and refactoring, yet nowhere near starting to work on the dashboard. And, it looks like more of the same for the immediate future.