Okay, I have the four daily note files parsed as best I can. Lots of junk in all of them because of the way I did things. Going to sort out parsing them and saving the data to the appropriate database table. Along with the extra bits of information in CSV files and/or other database tables.

Will start working with the 2022 notes file. Once I have things working I will likely combine all the files into one single file. Restore the database to its finished 2021 state. And process the combined file in one go. Similar to what I did with the first set of rainfall files.

Refactor File Generator

Okay, new file format, so have decided to refactor rg_data.get_rg_data() to handle it as well.

def get_rg_data(rg_fl, f_typ="rg"):
  '''Generator for raingauge php files, only want certain lines so decided use generator
     Parameters:
      rg_fl: full path to file
      f_typ: which file version is file
        'rg' = rain gauge style,
        'dn' = daily note style,
        'dna' = daily note style, but less restrictive than 'dn'
        'dnp' = parsed daily note file, contains rainfall data along with misc other note data
  '''
  with open(rg_fl, "r", encoding='utf8') as ds:
... ...
      if f_typ in ["dn", "dna", "dnp"] :
... ...
        elif f_typ == 'dnp':
          if not ln.startswith(("no match", "e-gauge")) and len(ln) > 10:
            do_yld = True
        if do_yld:
          if f_typ != 'dnp':
            ln = f"{c_dt} {ln}"
          else:
            if ln.startswith("tube"):
              ln = ln[6:]

New Development/Test Loop

As is my habit, a new if loop in some module or other for the development/test code. In this case I am going to stick with the data2db module.

To start with I am going to get a list of file paths (pathlib) to the four new files I created. Thought about updating rg_data.get_rg_dsrcs(); but decided against it. For now, really only a one time use.

Setup Loop

... ...
  if do_dn:
    chk_i_rf = False
    proc_fl_1 = False
    proc_fl_2 = False
    c_src = 11
    dn_rf_2db = True
... ...
    if dn_rf_2db:
      # process the data in the parsed daily notes file month by month
      # add daily rainfall too database, record dates done
      
      # for now only going to look at 2022 file
      c_dnp = 0
      # get list of all data sources
      d_srcs = []
      cwd = Path(__file__).cwd()
      d_pth = cwd/"data"
      s_yrs = list(range(2022, 2026))
      for yr in s_yrs:
        d_srcs.append(d_pth/f"dn_{yr}_parse.txt")
      print(d_srcs)
      print(f"{d_srcs[c_dnp].name} exists: {d_srcs[c_dnp].exists()}, is file: {d_srcs[c_dnp].is_file()}")

And in the terminal, I got the following.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
[WindowsPath('R:/learn/dashboard/data/dn_2022_parse.txt'), WindowsPath('R:/learn/dashboard/data/dn_2023_parse.txt'), WindowsPath('R:/learn/dashboard/data/dn_2024_parse.txt'), WindowsPath('R:/learn/dashboard/data/dn_2025_parse.txt')]
dn_2022_parse.txt exists: True, is file: True

Okay, now let’s set up the file generator. Then parse and print a number of the lines we get back. I am going for 28 iterations. The generator should be returning strings that look like this: 2022.01.02 08:00: 2.26 mm, 2.26. But immediately ran into errors. So put my code in a try/except block. Fixed errors as they came up. Eventually did that for every file for every line returned.

... ...
      print(f"Processing file: {d_srcs[c_dnp].name}\n")
      # set up generator for current data file
      gen_d = get_rg_data(d_srcs[c_dnp], f_typ="dnp")

      # let's get and parse the first 5 lines yeilded
      # make sure we get numbers where needed
      for i in range(28):
        d_rw = next(gen_d)
        try:
          rf_dt, rf_tm, d_rf, j_rf, m_rf = d_rw.split(" ")
          rf_tm = rf_tm[:-1]
          d_rf = float(d_rf)
          m_rf = float(m_rf)
        except ValueError:
          print(f"\nerror: {d_rw}\n")
        else:
          print(f"{rf_dt} {rf_tm}: {d_rf} {j_rf[:-1]} -> {m_rf}")

And in the terminal I, finally, got the following. Which, for better or worse, does match what’s in the file.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
Processing file: dn_2022_parse.txt

2022.01.02 08:00: 2.26 mm -> 2.26
2022.01.03 08:00: 3.39 mm -> 5.65
2022.01.04 08:00: 1.58 mm -> 7.23
2022.01.05 08:00: 9.26 mm -> 16.48
2022.01.06 19:00: 24.84 mm -> 41.32
2022.01.07 08:00: 14.23 mm -> 55.55
2022.01.08 08:00: 14.23 mm -> 69.77
2022.01.09 08:00: 14.23 mm -> 84.0
2022.01.10 08:00: 4.87 mm -> 99.87
2022.01.11 08:00: 77.92 mm -> 166.78
2022.01.12 08:00: 24.35 mm -> 191.12
2022.01.30 08:00: 11.9 mm -> 203.02
2022.01.31 08:00: 17.8 mm -> 220.82
2022.02.03 08:00: 6.8 mm -> 6.8
2022.02.04 08:00: 12.5 mm -> 19.3
2022.02.05 08:00: 4.3 mm -> 23.6
2022.02.08 08:00: 4.5 mm -> 28.1
2022.02.09 08:00: 4.3 mm -> 32.4
2022.02.11 08:00: 0.3 mm -> 32.7
2022.02.14 08:00: 6.4 mm -> 39.1
2022.02.15 08:00: 1.2 mm -> 40.3
2022.02.19 08:00: 1.8 mm -> 42.1
2022.02.20 08:00: 7.1 mm -> 49.2
2022.02.21 08:00: 0.5 mm -> 49.7
2022.02.25 08:00: 0.8 mm -> 50.5
2022.02.27 08:00: 14.7 mm -> 65.2
2022.02.28 08:00: 30.7 mm -> 95.9
2022.02.28 24:00: 5.4 mm -> 101.3

Rainfall Processing Loop

Okay, let’s move on and start working on the code to take that data and get it into the database table(s). As it is so straightforward to parse the data, I think I will just continue to do so with code in the loop. I will for each month build a list of tuples, which I will then use to add that month’s data to the database. Pretty much what we did with the earlier file type. I will also, as before, keep track of the dates processed. Ensuring I do not process a given date more than once. If an attempt is made I will post warning message to the terminal.

As this is still development mode, for now I won’t bother updating the monthly history table. Once I have things sorted and am processing the files for real, I will, once done, do so.

Okay have code that works for processing first two months of 2022. Going to refactor and test processing the whole 2022 file. Will show the refactored code below.

The if dn_rf_2db block has grown much larger. I expect I would have coded some functions if this was something that would be repeated regularly. But, that is not the case; so, some sloppy coding. And a little extra code added to the if chk_i_rf block.

I also ran into a problem with the dates done related code. The class was expecting a dictionary with keys in the form yyyy.mm. I was passing it mm. A new variable and some added code to fix that problem. Again, if this processing of months or years of data at one time was something that would be repeated in future, I would likely refactor the class in some way to simplify the processing loop code.

Here’s all the code new/revised code.

    if dn_rf_2db:
      # process the data in the parsed daily notes file month by month
      # add daily rainfall too database, record dates done
      
      # for now only going to look at 2022 file
      c_dnp = 0
      chk_i_rf = True
      # get list of all data sources
      d_srcs = []
      cwd = Path(__file__).cwd()
      d_pth = cwd/"data"
      s_yrs = list(range(2022, 2026))
      for yr in s_yrs:
        d_srcs.append(d_pth/f"dn_{yr}_parse.txt")
      
      print(f"Processing file: {d_srcs[c_dnp].name}\n")
      # set up generator for current data file
      gen_d = get_rg_data(d_srcs[c_dnp], f_typ="dnp")

      # let's get and parse the first 5 lines yeilded
      # make sure we get numbers where needed
      p_mn, c_mn = "", ""               # prev mon, curr mon
      rf_rws, m_dts = [], {}   # rainfall rows, tot mon, mon dates
      tot_rws = 0
      # for dev will use start and end month, no longer really need
      # these two variables, but I was using to init done_mn
      s_mn, e_mn = "01", "02"
      rf_yr = s_yrs[c_dnp]
      done_mn = f"{rf_yr}.{s_mn}"

      s_tm = time.perf_counter()
      for d_rw in gen_d:
        try:
          rf_dt, rf_tm, d_rf, j_rf, m_rf = d_rw.split(" ")
          rf_tm = rf_tm[:-1]
          dt_rf = f"{rf_dt} {rf_tm}"  # datetime
          d_rf = float(d_rf)          # daily rainfall
          m_rf = float(m_rf)          # month-to-date rainfall
        except ValueError:
          print(f"\nerror: {d_rw}\n")
        d_mn = rf_dt[5:7]

        # if just starting processing file, initialize variables
        if not c_mn:
          c_mn = d_mn
          m_dts[done_mn] = []
          print(f"\tcurrent month: {c_mn} -> {done_mn}")
        
        # if new month, save data, updt controls, etc.
        if rf_dt[5:7] != c_mn:
          num_rws = rfall.add_rainfall(rf_rws)
          tot_rws += num_rws
          dts_dn.add_month(m_dts)

          rf_rws = []
          p_mn, c_mn = c_mn, rf_dt[5:7]
          rf_yr = rf_dt[:4]
          done_mn = f"{rf_yr}.{c_mn}"
          m_dts = {done_mn: []}
          print(f"\tcurrent month: {c_mn} -> {done_mn}")

        # process current date's rainfall       
        rf_rws.append((dt_rf, d_rf, m_rf))
        m_dts[done_mn].append(rf_dt[-2:])
        
      # if rf_rws not empty, updt rainfall table/dates done list
      if rf_rws:
        # set things up for chk_i_rf block if full or partial year
        if c_mn == "12":
          p_mn = "12"
        else:
          p_mn = c_mn
        num_rws = rfall.add_rainfall(rf_rws)
        tot_rws += num_rws
        dts_dn.add_month(m_dts)
     
      # update dates done csv file
      dts_dn.updt_csv_file()

      # update monthly rainfall history table
      rfall.updt_hist_tbl()

      e_tm = time.perf_counter()
      print(f"{tot_rws} records added to database table")
      print(f"run time: {(e_tm - s_tm):.4f} secs")

  if chk_i_rf:
    print("\ndatabase table check")
    for c_tnm in rfall.tnms.values():
      rslt = rfall.get_tbl_rw_cnt(c_tnm)
      print(f"\nrows in {c_tnm} table: {rslt}")

      if c_tnm == "rainfall":
        nx_q = f"""SELECT substr(datetime, 1, 7) as c_month, MIN(daily) as minMonth, MAX(daily) as maxMonth
          FROM {c_tnm}
          GROUP BY c_month;"""
      elif c_tnm == "rf_mon_history":
        nx_q = f"SELECT month, avg FROM {c_tnm}"
      mn_data = rfall.qry_pd(nx_q)
      print("\n", mn_data)

    print(f"\nselecting rows where datetime like '{s_yrs[c_dnp]}.{p_mn}%'")
    nx_q = f"""SELECT * FROM rainfall WHERE datetime LIKE '{s_yrs[c_dnp]}.{p_mn}%';"""
    mn_data = rfall.qry_pd(nx_q)
    print(mn_data)

And in the terminal I got the following. Sorry a touch lengthy.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
Processing file: dn_2022_parse.txt

        current month: 01 -> 2022.01
        current month: 02 -> 2022.02
        current month: 03 -> 2022.03
        current month: 04 -> 2022.04
        current month: 05 -> 2022.05
        current month: 06 -> 2022.06
        current month: 07 -> 2022.07
        current month: 08 -> 2022.08
        current month: 09 -> 2022.09
        current month: 10 -> 2022.10
        current month: 11 -> 2022.11
        current month: 12 -> 2022.12
155 records added to database table
run time: 0.8998 secs

database table check

rows in rainfall table: 1381

      c_month  minMonth  maxMonth
0    2014.03       3.0      54.0
1    2014.04       1.5      53.0
2    2014.05       5.0      54.0
3    2014.06       2.0      14.5
4    2014.07       3.0      23.0
..       ...       ...       ...
101  2022.08       5.5       5.5
102  2022.09       1.0       4.5
103  2022.10       1.0      53.0
104  2022.11       0.2      44.9
105  2022.12       0.3      31.0

[106 rows x 3 columns]

rows in rf_mon_history table: 12

    month         avg
0     01  259.990000
1     02  138.312500
2     03  175.455556
3     04  105.633333
4     05   85.073000
5     06   53.422222
6     07   24.988889
7     08   32.440000
8     09  112.533333
9     10  181.725556
10    11  290.437778
11    12  249.650000

selecting rows where datetime like '2022.12%'
    row_id          datetime  daily  monthly
0     1364  2022.12.03 08:00    2.0      2.0
1     1365  2022.12.07 08:00    5.8      7.8
2     1366  2022.12.08 08:00    3.3     11.1
3     1367  2022.12.09 08:00    3.0     14.1
4     1368  2022.12.10 08:00   15.0     29.1
5     1369  2022.12.11 08:00    9.9     39.0
6     1370  2022.12.14 08:00    0.5     39.5
7     1371  2022.12.15 08:00    0.3     39.8
8     1372  2022.12.18 08:00    1.0     40.8
9     1373  2022.12.18 08:00   11.0     51.8
10    1374  2022.12.20 08:00   31.0     82.8
11    1375  2022.12.23 12:00   10.0     92.8
12    1376  2022.12.24 08:00    9.9    102.7
13    1377  2022.12.25 08:00   24.4    127.1
14    1378  2022.12.26 08:00   16.0    143.1
15    1379  2022.12.27 08:00   14.2    157.3
16    1380  2022.12.28 08:00    4.9    162.2
17    1381  2022.12.31 08:00    8.8    171.0

Also, the history table data has changed from what it was before the added year, 2022, of data. I haven’t checked all the months, but the data displayed above for December 2022 is correct. So going to assume the other months of 2022 will also be correct.

Done

Not really finished adding the rest of the years, but I will do so year by year over the next day or two. That will save me from some additional refactoring and file combining. If I combine the files into a single processing run, I will need to account for the changing years from start to end of the file. By doing it one year at a time, the current code will handle that without any refactoring. (Yes, lazy me!)

And, there was another function/method I wanted to write. It would take as input the date, time, and rainfall amount for some given day, and write that to the database. This is something that the dashboard will need to do each day in order to keep things up-to-date.

Going to leave that for the next, probably very short, post.

Until then, I wish you much joy from the successes in your hard fought battles. I am still feeling that joy working on this project.