Took longer than expected to get out of hospital—complication or two. That said, it has been two months since my release. I started rehab about three weeks ago. Making slow, but decent progress. Figured it was time I got off my butt and started posting. And finish covering this dashboard project.

Okay, I manually processed the tentative CSV file for the 2015 weather conditions. Took me about one and a quarter hours. Really not bad everything considered. There are a number of edits that I doubt I could have coded something to handle. So, manual it will be for the remaining files.

Though I am currently thinking about adding code to reduce the number of notes I need to handle/process. If there is an entry in the database for the given date with a weather condition, no need to include that date in the CSV I will be manually processing.

But, first let’s have a look at how I might handle the content of the processed CSV file.

What Needs to Go Into the Database?

And, how should it be added: UPDATE or INSERT. Will need both depending on the circumstances.

What I am going to do is go through the CSV file line by line. Get any rows in the temperature table for the current data’s date, not datetime. If there is a row returned with a weather condition, I am just going to go to the next line in the CSV file. Though perhaps I should check if the database and CSV file have the same or similar weather condition. If not go with the one in the CSV file, since it has been edited by a human (though some may disagree with that characterization). Will think about it as I cover what I have done so far.

I have for now just been using January 2015 while working on the dev/test code. And, at first I just listed the data in the CSV file and the database for that month.

Compare Data, Initial Attempt

Okay, query the temperature table for the January 2015 data. Convert to a dictionary, I want to be able to access the date in a loop. Then, open 2015 CSV file and read contents for January.

  if cmp_csv_db:
    src_ndx = 0
    f_yr = tmpr_srcs[src_ndx].__str__()[-8:-4]
    n_tbl = rfall.tnms["tp_tnm"]
    e_dt = f"{f_yr}.01.31"
    p_dt = e_dt[:7]

    wc_qry = f"""SELECT SUBSTR(datetime,1,10) as date, SUBSTR(datetime,12,5) as time, condition
      FROM {n_tbl}
      WHERE SUBSTR(datetime,1,7)='2015.01' AND condition!='';"""
    wcs = rfall.qry_exec(wc_qry)
    d_wc = {}
    for t_wc in wcs:
      d_wc[t_wc[0]] = t_wc[1:]
    print(d_wc)

    fl_pth = cwd/"data"
    fl_nm = f"wcond_{f_yr}.csv"
    d_pth = fl_pth/fl_nm
    print("\n", e_dt, p_dt, d_pth, "\n")

    with open(d_pth, "r", newline="") as wfl:
      wc_rdr = csv.reader(wfl, delimiter=',')
      for r_wc in wc_rdr:
        if r_wc[0][:10] > e_dt:
          break
        if r_wc[0][:10] in d_wc:
          print(f"{r_wc}\n\t{d_wc[r_wc[0][:10]]}")
        else:
          print(r_wc)

And in the terminal, I got the following.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
{'2015.01.01': ('07:30', 'clear'), '2015.01.02': ('08:15', 'rain'), '2015.01.03': ('07:00', 'snow'), '2015.01.04': ('09:20', 'snow'), '2015.01.05': ('19:45', 'rain'), '2015.01.07': ('14:17', 'cloudy'), '2015.01.08': ('09:02', 'mostly cloudy'), '2015.01.09': ('08:06', 'cloudy'), '2015.01.12': ('07:22', 'fog'), '2015.01.13': ('07:57', 'fog'), '2015.01.16': ('08:00', 'rain'), '2015.01.22': ('09:06', 'cloudy'), '2015.01.23': ('08:36', 'rain'), '2015.01.24': ('08:32', 'light rain'), '2015.01.27': ('07:55', 'mostly cloudy'), '2015.01.29': ('07:45', 'clear'), '2015.01.31': ('09:49', 'fog')}

 2015.01.31 2015.01 R:\learn\dashboard\data\wcond_2015.csv

['2015.01.01 07:30', 'clear']
        ('07:30', 'clear')
['2015.01.02 08:15', 'light snow']
        ('08:15', 'rain')
['2015.01.04', 'snow']
        ('09:20', 'snow')
['2015.01.06 06:30', 'mostly cloudy']
['2015.01.07 14:17', 'cloudy']
        ('14:17', 'cloudy')
['2015.01.08 09:02', 'fog']
        ('09:02', 'mostly cloudy')
['2015.01.09 08:06', 'cloudy']
        ('08:06', 'cloudy')
['2015.01.10', 'cloudy']
['2015.01.11 06:30', 'rain']
['2015.01.12 07:22', 'fog']
        ('07:22', 'fog')
['2015.01.13 07:57', 'fog']
        ('07:57', 'fog')
['2015.01.14', 'fog']
['2015.01.15', 'cloudy']
['2015.01.17', 'cloudy']
['2015.01.19', 'mainly clear']
['2015.01.20', 'clear']
['2015.01.21', 'mostly cloudy']
['2015.01.22 09:06', 'cloudy']
        ('09:06', 'cloudy')
['2015.01.24 08:32', 'rain']
        ('08:32', 'light rain')
['2015.01.25 06:30', 'fog']
['2015.01.26', 'partly cloudy']
['2015.01.27 07:55', 'mostly cloudy']
        ('07:55', 'mostly cloudy')
['2015.01.28 07:50', 'cloudy']
['2015.01.29 07:45', 'partly cloudy']
        ('07:45', 'clear')
['2015.01.30 07:35', 'partly cloudy']
['2015.01.31 05:30', 'cloudy']
        ('09:49', 'fog')

Clearly a goodly number of entries in the CSV file that are not in the database.

Develop Database Activity

Okay, let’s have a look at what I might do with those extra CSV rows. After working on the above I did realize I should actually look at every date in the temperature table. Not just those with a weather condition. So, let’s modify the above accordingly.

But I also figured I’d want to get database data one date at a time, and only for dates in the CSV file. That would eliminate building the dictionary for each month or the whole year. More time to process the CSV file but perhaps simpler to see what is happening. So, a new function, get_wc_4dt(w_dt). For now in the data2db module; eventually likely in the Weather_db class.

    def get_wc_4dt(w_dt):
      wc_qry = f"""SELECT SUBSTR(datetime,1,10) as date, SUBSTR(datetime,12,5) as time, condition
        FROM {n_tbl}
        WHERE SUBSTR(datetime,1,10)='{w_dt}';"""
      wcs = rfall.qry_exec(wc_qry)
      return wcs

And, refactoring the preceding code to use the new function. Only the pertinent bits. I won’t bother showing the commented out query code from above.

    with open(d_pth, "r", newline="") as wfl:
      wc_rdr = csv.reader(wfl, delimiter=',')
      for r_wc in wc_rdr:
        if r_wc[0][:10] > e_dt:
          break
        chk_wc = get_wc_4dt(r_wc[0][:10])
        print(r_wc)
        if chk_wc:
          print(f"\t{chk_wc}")
        else:
          print(f"\tdate not in table")

In the terminal the above produced the following.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py

 2015.01.31 2015.01 R:\learn\dashboard\data\wcond_2015.csv

['2015.01.01 07:30', 'clear']
        [('2015.01.01', '07:30', 'clear')]
['2015.01.02 08:15', 'light snow']
        [('2015.01.02', '08:15', 'rain')]
['2015.01.04', 'snow']
        [('2015.01.04', '08:40', ''), ('2015.01.04', '09:20', 'snow')]
['2015.01.06 06:30', 'mostly cloudy']
        [('2015.01.06', '08:30', '')]
['2015.01.07 14:17', 'cloudy']
        [('2015.01.07', '14:17', 'cloudy')]
['2015.01.08 09:02', 'fog']
        [('2015.01.08', '09:02', 'mostly cloudy')]
['2015.01.09 08:06', 'cloudy']
        [('2015.01.09', '08:06', 'cloudy')]
['2015.01.10', 'cloudy']
        date not in table
['2015.01.11 06:30', 'rain']
        [('2015.01.11', '06:00', '')]
['2015.01.12 07:22', 'fog']
        [('2015.01.12', '07:22', 'fog')]
['2015.01.13 07:57', 'fog']
        [('2015.01.13', '07:57', 'fog')]
['2015.01.14', 'fog']
        [('2015.01.14', '09:00', '')]
['2015.01.15', 'cloudy']
        date not in table
['2015.01.17', 'cloudy']
        [('2015.01.17', '10:21', '')]
['2015.01.19', 'mainly clear']
        [('2015.01.19', '08:20', '')]
['2015.01.20', 'clear']
        [('2015.01.20', '09:36', '')]
['2015.01.21', 'mostly cloudy']
        [('2015.01.21', '08:38', '')]
['2015.01.22 09:06', 'cloudy']
        [('2015.01.22', '09:06', 'cloudy')]
['2015.01.24 08:32', 'rain']
        [('2015.01.24', '08:32', 'light rain')]
['2015.01.25 06:30', 'fog']
        [('2015.01.25', '09:44', '')]
['2015.01.26', 'partly cloudy']
        [('2015.01.26', '09:39', '')]
['2015.01.27 07:55', 'mostly cloudy']
        [('2015.01.27', '07:55', 'mostly cloudy')]
['2015.01.28 07:50', 'cloudy']
        [('2015.01.28', '07:50', '')]
['2015.01.29 07:45', 'partly cloudy']
        [('2015.01.29', '07:45', 'clear')]
['2015.01.30 07:35', 'partly cloudy']
        [('2015.01.30', '08:05', '')]
['2015.01.31 05:30', 'cloudy']
        [('2015.01.31', '09:49', 'fog')]

A couple of things to note. They will affect how we handle the database updates for any given date. Most dates in the CSV file have at least one row in the database for the same date. But, there can be more than one. Will need to carefully deal with that situation. And, there are dates in the CSV file that have no matching date in the database. Another edge case to deal with. Finally, some of the database rows have a weather condition, some do not. Something which will likely need to be considered when updating the weather conditions in the temperature table.

Once, again, I propose to leave dates with a weather condition in the database untouched. For dates with database rows without a weather condition, I will update that row using the condition in the CSV file. If multiple rows for a date, I will update the first one on the assumption that it is the earliest timed entry for that date. Finally, if there are no rows in the database for a date in the CSV file, I will insert a new row, without temperatures or humidity, in the table.

What I did next was work through a bit of test code to document those steps for a given month in the terminal. Not touching the database just yet.

    with open(d_pth, "r", newline="") as wfl:
      wc_rdr = csv.reader(wfl, delimiter=',')
      for r_wc in wc_rdr:
        if r_wc[0][:10] > e_dt:
          break

        chk_wc = get_wc_4dt(r_wc[0][:10])
        if r_wc[0][:7] == p_dt:
          f_wc = ""
          if len(chk_wc) > 1:
            for t_wc in chk_wc:
              if not f_wc:
                f_wc = t_wc[2]
          elif len(chk_wc) == 1:
            f_wc = chk_wc[0][2]
          if chk_wc and f_wc != "":
            print(f"in table -> {chk_wc}")
          else:
            print(f"not in table -> {r_wc}\n\t-> {chk_wc}")
            if chk_wc:
              print(f"""\t-> UPDATE n_tbl SET condition='{r_wc[1]}' WHERE datetime='{chk_wc[0][0]} {chk_wc[0][1]}'""")
            else:
              if len(r_wc[0]) == 16:
                print(f"""\t-> INSERT INTO {n_tbl} VALUES (NULL, '{r_wc[0]}', NULL, NULL, NULL, '{r_wc[1]}', NULL);""")
              else:
                print(f"""\t-> INSERT INTO {n_tbl} VALUES (NULL, '{r_wc[0]} 06:00', NULL, NULL, NULL, '{r_wc[1]}', NULL);""")

I expect I will not be executing individual update or insert queries as I traverse the CSV file. I will likely, as previously, do so a month at a time. Using SQLite’s executemany method to insert a list of rows in one call to the database. Well, likely two calls, one for inserts and one for updates. But for now, here’s what the above code produced in the terminal.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py

 2015.01.31 2015.01 R:\learn\dashboard\data\wcond_2015.csv

in table -> [('2015.01.01', '07:30', 'clear')]
in table -> [('2015.01.02', '08:15', 'rain')]
in table -> [('2015.01.04', '08:40', ''), ('2015.01.04', '09:20', 'snow')]
not in table -> ['2015.01.06 06:30', 'mostly cloudy']
        -> [('2015.01.06', '08:30', '')]
        -> UPDATE n_tbl SET condition='mostly cloudy' WHERE datetime='2015.01.06 08:30'
in table -> [('2015.01.07', '14:17', 'cloudy')]
in table -> [('2015.01.08', '09:02', 'mostly cloudy')]
in table -> [('2015.01.09', '08:06', 'cloudy')]
not in table -> ['2015.01.10', 'cloudy']
        -> []
        -> INSERT INTO temperature VALUES (NULL, '2015.01.10 06:00', NULL, NULL, NULL, 'cloudy', NULL);
not in table -> ['2015.01.11 06:30', 'rain']
        -> [('2015.01.11', '06:00', '')]
        -> UPDATE n_tbl SET condition='rain' WHERE datetime='2015.01.11 06:00'
in table -> [('2015.01.12', '07:22', 'fog')]
in table -> [('2015.01.13', '07:57', 'fog')]
not in table -> ['2015.01.14', 'fog']
        -> [('2015.01.14', '09:00', '')]
        -> UPDATE n_tbl SET condition='fog' WHERE datetime='2015.01.14 09:00'
not in table -> ['2015.01.15', 'cloudy']
        -> []
        -> INSERT INTO temperature VALUES (NULL, '2015.01.15 06:00', NULL, NULL, NULL, 'cloudy', NULL);
not in table -> ['2015.01.17', 'cloudy']
        -> [('2015.01.17', '10:21', '')]
        -> UPDATE n_tbl SET condition='cloudy' WHERE datetime='2015.01.17 10:21'
not in table -> ['2015.01.19', 'mainly clear']
        -> [('2015.01.19', '08:20', '')]
        -> UPDATE n_tbl SET condition='mainly clear' WHERE datetime='2015.01.19 08:20'
not in table -> ['2015.01.20', 'clear']
        -> [('2015.01.20', '09:36', '')]
        -> UPDATE n_tbl SET condition='clear' WHERE datetime='2015.01.20 09:36'
not in table -> ['2015.01.21', 'mostly cloudy']
        -> [('2015.01.21', '08:38', '')]
        -> UPDATE n_tbl SET condition='mostly cloudy' WHERE datetime='2015.01.21 08:38'
in table -> [('2015.01.22', '09:06', 'cloudy')]
in table -> [('2015.01.24', '08:32', 'light rain')]
not in table -> ['2015.01.25 06:30', 'fog']
        -> [('2015.01.25', '09:44', '')]
        -> UPDATE n_tbl SET condition='fog' WHERE datetime='2015.01.25 09:44'
not in table -> ['2015.01.26', 'partly cloudy']
        -> [('2015.01.26', '09:39', '')]
        -> UPDATE n_tbl SET condition='partly cloudy' WHERE datetime='2015.01.26 09:39'
in table -> [('2015.01.27', '07:55', 'mostly cloudy')]
not in table -> ['2015.01.28 07:50', 'cloudy']
        -> [('2015.01.28', '07:50', '')]
        -> UPDATE n_tbl SET condition='cloudy' WHERE datetime='2015.01.28 07:50'
in table -> [('2015.01.29', '07:45', 'clear')]
not in table -> ['2015.01.30 07:35', 'partly cloudy']
        -> [('2015.01.30', '08:05', '')]
        -> UPDATE n_tbl SET condition='partly cloudy' WHERE datetime='2015.01.30 08:05'
in table -> [('2015.01.31', '09:49', 'fog')]

Actually Updating the Database

Okay, let’s start on some code that will eventually be used to update the database. Starting a new dev/test if block. Lots of code copied from the block above. For now looking at updating database a month at a time. And, only going to show data to do so for the first month of the file.

Two new lists to store the data for inserts and the data for updates. For the latter the data for the where clause has to come after the weather condition. For inserts considerably more data points. That said nothing really new in this code. Have seen most of it in previous database interactions.

    if wc_to_db:
      src_ndx = 0
      f_yr = tmpr_srcs[src_ndx].__str__()[-8:-4]
      n_tbl = rfall.tnms["tp_tnm"]
      e_dt = f"{f_yr}.01.31"
      # p_dt = e_dt[:7]

      fl_pth = cwd/"data"
      fl_nm = f"wcond_{f_yr}.csv"
      d_pth = fl_pth/fl_nm
      print("\n", e_dt, d_pth, "\n")

      d_updt, d_nsrt = [], []
      c_mon, p_mon = "", ""

      with open(d_pth, "r", newline="") as wfl:
        wc_rdr = csv.reader(wfl, delimiter=',')
        for r_wc in wc_rdr:
          if r_wc[0][:10] > e_dt:
            break
          t_mn = r_wc[0][5:7]
          if c_mon != t_mn:
            p_mon, c_mon = c_mon, t_mn
            if d_nsrt:
              print("list of table inserts")
              for wc in d_nsrt:
                print(wc)
            if d_updt:
              print("\nlist of table updates")
              for wc in d_updt:
                print(wc)
            
          chk_wc = get_wc_4dt(r_wc[0][:10])
          f_wc = ""
          if len(chk_wc) > 1:
            for t_wc in chk_wc:
              if not f_wc:
                f_wc = t_wc[2]
          elif len(chk_wc) == 1:
            f_wc = chk_wc[0][2]
          if chk_wc and f_wc != "":
            continue
          else:
            if chk_wc:
              d_updt.append((f"{r_wc[1]}", f"{chk_wc[0][0]} {chk_wc[0][1]}"))
              # print(f"""\t-> UPDATE n_tbl SET condition='{r_wc[1]}' WHERE datetime='{chk_wc[0][0]} {chk_wc[0][1]}'""")
            else:
              if len(r_wc[0]) == 16:
                d_nsrt.append((r_wc[0], None, None, None, r_wc[1], None))
                # print(f"""\t-> INSERT INTO {n_tbl} VALUES (NULL, '{r_wc[0]}', NULL, NULL, NULL, '{r_wc[1]}', NULL);""")
              else:
                d_nsrt.append((f"{r_wc[0]} 06:00", None, None, None, r_wc[1], None))
                # print(f"""\t-> INSERT INTO {n_tbl} VALUES (NULL, '{r_wc[0]} 06:00', NULL, NULL, NULL, '{r_wc[1]}', NULL);""")

        # take care of last month
        if d_nsrt:
          print("list of table inserts")
          for wc in d_nsrt:
            print(wc)
        if d_updt:
          print("\nlist of table updates")
          for wc in d_updt:
            print(wc)

And, for the 2015 condition CSV, the above produced the following in the terminal.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py

 2015.01.31 R:\learn\dashboard\data\wcond_2015.csv

list of table inserts
('2015.01.10 06:00', None, None, None, 'cloudy', None)
('2015.01.15 06:00', None, None, None, 'cloudy', None)

list of table updates
('mostly cloudy', '2015.01.06 08:30')
('rain', '2015.01.11 06:00')
('fog', '2015.01.14 09:00')
('cloudy', '2015.01.17 10:21')
('mainly clear', '2015.01.19 08:20')
('clear', '2015.01.20 09:36')
('mostly cloudy', '2015.01.21 08:38')
('fog', '2015.01.25 09:44')
('partly cloudy', '2015.01.26 09:39')
('cloudy', '2015.01.28 07:50')
('partly cloudy', '2015.01.30 08:05')

And that appears to match the proposed queries above.

Fini

I think that’s it for this post. I am not quite ready to tackle updating the weather conditions in the database. Want to generate some more weather condition update CSV files. Then run the above against them to see if any edge cases crop up.

Until next time, I do believe us beginners really need to test our code as thoroughly as possible before committing to modifying any data storages.