Okay, decided I didn’t want to, at present, spend the time and energy to try using a knowledge graph as a data source for a RAG chatbot. I have, instead, decided to take another side trip. That poor transformer model is going to stay forgotten for a while longer. At the moment I have no idea how much longer. As they say, time will tell.

What I would like to do is create a dashboard for displaying some of the weather-related info (temperature first thing in the morning, rainfall amounts and maybe more) that I have intermittently recorded over the last few to several years. Not really sure why, but…

I will likely use Streamlit to build the dashboard. Seems like that is simpler than generating the HTML and CSS myself. The data is mainly in PHP files with differing formats over the years. A good hunk of the coding will simply be extracting the data and putting it into a database. So likely a number of modules. At least one for the data extraction and a separate one for the dashboard itself.

As I write this, I don’t know if I will try to involve an LLM (local or on the internet; if local I will move development to the Beast). Nor do I know exactly what I want to display on the dashboard. Likely total rain by month, plus to-date for the current month. Perhaps daily for the current month. Maybe averages, for the recorded data, for each month. Maybe…
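To make the monthly-totals idea a little more concrete, here is a minimal sketch of the kind of aggregation the dashboard would need. Plain Python, and both the function name and the sample readings are made up for illustration; this is not the project's actual code.

```python
from collections import defaultdict

def monthly_totals(readings):
  """Sum rainfall (mm) by year/month from (date_str, mm) pairs.

  readings: iterable of ("yyyy.mm.dd", float) tuples -- a hypothetical
  shape, just to illustrate the aggregation, not the real data format.
  """
  totals = defaultdict(float)
  for dt, mm in readings:
    yr, mn, _ = dt.split(".")
    totals[f"{yr}-{mn}"] += mm
  return dict(totals)

# a few made-up readings
sample = [("2014.03.09", 37.5), ("2014.03.17", 54.0), ("2014.04.04", 14.0)]
print(monthly_totals(sample))
# {'2014-03': 91.5, '2014-04': 14.0}
```

The "to-date for the current month" number would just be the entry for the latest key.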

As for the database, I expect I will start with SQLite. But as I progress I may switch to PostgreSQL.

New Workspace

A new directory (?\learn\dashboard), a new git repository, a new github repository (private), a new conda environment, an initial Python module (data2db.py),…

Once the local repository was set up I added a .gitignore and README.md and committed them.

(base) PS R:\learn\dashboard> git init
Initialized empty Git repository in R:/learn/dashboard/.git/
... ...
(base) PS R:\learn\dashboard> git remote add origin git@github.com:XXX/XXX.git
... ...
(base) PS R:\learn\dashboard> git branch -M main
... ...
(base) PS R:\learn\dashboard> git push -u origin main
... ...
Branch 'main' set up to track remote branch 'main' from 'origin'.

(base) PS R:\learn\dashboard> conda create -n dbd-3.13 python streamlit pandas plotly
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

There were 137 packages installed!!

Quick Test

Okay, let’s create a list of all the viable data sources. There are two primary source directories. And two, more or less, file formats.

# data2db.py: module containing code to extract selected weather related data
#   add to csv, then create database

import sqlite3
import time
from pathlib import Path
import pandas as pd


def get_rg_dsrcs():
  # get list of all data sources
  d_srcs = []
  # set of rain data srcs
  dir_1 = Path("F:/BaRKqgs/home/binc")
  yrs_1 = list(range(2019, 2022))
  for yr in yrs_1:
    d_srcs.append(dir_1/f"rainGauge_{yr}.inc.php")
  # daily gdn pages with temp/precipitation
  dir_2  = Path("F:/BaRKqgs/gdn")
  yrs_2 = list(range(2015, 2026))
  for yr in yrs_2:
    d_srcs.append(dir_2/f"bark_gdn_{yr}.php")
  return d_srcs


if __name__ == "__main__":
  # will eventually need this
  cwd = Path(__file__).cwd()
  print(f"cwd: {cwd}\n")
  
  # let's have a look at that list of data sources
  d_srcs = get_rg_dsrcs()
  print(d_srcs)

Well, a bit of a problem. I had this same problem in another environment, but the few notes I made at the time did not cover how I resolved the issue. Pretty sure the approach I took last time is not the same one that worked, successfully, this time.

(dbd-3.13) PS R:\learn\dashboard> python -c "import pandas as pd; print(pd.__version__)"
INTEL oneMKL ERROR: The specified module could not be found. mkl_intel_thread.2.dll.
Intel oneMKL FATAL ERROR: Cannot load mkl_intel_thread.2.dll.

(dbd-3.13) PS R:\learn\dashboard> conda remove mkl
Collecting package metadata (repodata.json): done
Solving environment: /
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/win-64::libblas==3.9.0=28_h576b46c_mkl
  - conda-forge/win-64::libcblas==3.9.0=28_h7ad3364_mkl
  - conda-forge/win-64::liblapack==3.9.0=28_hacfb0e4_mkl
  - conda-forge/win-64::numpy==2.3.1=py313ha14762d_0
  - conda-forge/win-64::pandas==2.3.1=py313hc90dcd4_0
  - conda-forge/noarch::pydeck==0.9.1=pyhd8ed1ab_0
  - conda-forge/noarch::streamlit==1.46.1=pyhd8ed1ab_0
done
## Package Plan ##

  environment location: E:\appDev\Miniconda3\envs\dbd-3.13

  removed specs:
    - mkl

The following NEW packages will be INSTALLED:

  libhwloc           conda-forge/win-64::libhwloc-2.11.2-default_ha69328c_1001
  libiconv           conda-forge/win-64::libiconv-1.18-h135ad9c_1
  libxml2            conda-forge/win-64::libxml2-2.13.8-h442d1da_0
  tbb                conda-forge/win-64::tbb-2021.13.0-h62715c5_1

The following packages will be UPDATED:

  mkl                                   2020.4-hb70f87d_311 --> 2023.2.0-h6a75c08_49573

Proceed ([y]/n)? y

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Once that was complete, the module executed successfully.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
cwd: R:\learn\dashboard

[WindowsPath('F:/BaRKqgs/home/binc/rainGauge_2019.inc.php'), WindowsPath('F:/BaRKqgs/home/binc/rainGauge_2020.inc.php'), WindowsPath('F:/BaRKqgs/home/binc/rainGauge_2021.inc.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2015.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2016.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2017.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2018.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2019.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2020.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2021.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2022.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2023.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2024.php'), WindowsPath('F:/BaRKqgs/gdn/bark_gdn_2025.php')]

Extracting Data for the rainGauge Files

The format of the bark_gdn files is significantly different and will, eventually, need to be handled separately.

Well, this isn’t really anywhere near the final code. I just want to look at how I might traverse and process the first file in the list above. I will be taking small steps and refactoring as I go.

Those first three files, rainGauge_{year}.inc.php, were files I included on the home page of our local web site to display the rainfall: totals for past months, daily for the current month. It is local in the sense that it is provided only on the in-house network by an instance of Apache running on my pc. The year in the file name is actually the year of the last set of data in the file. The first file has rain gauge readings from 2014.03.07 to 2019.12.31. The other two also include the last year from the previous file. So the 2020 file has the data for 2019.01.01 to 2020.12.31. The 2021 file has data for 2020.01.01 to 2021.04.30. After that I started recording rainfall in my daily note pages (the bark_gdn_{year}.php files).

Let’s open that first file and get the data for the first month in the file. Which happens to be March, 2014. I really should have used a sensible CSV file format for this stuff. But, I am notoriously slow witted.

I am going to, for now, also process the line that starts each new month. But, in fact, down the road that won’t be necessary as all the data we want are in the lines that specifically record a measurable amount of rainfall. To start with I will display the labels for the first two months and the rows of rainfall data for the first month.

Also, doing this the hard way as I didn’t feel like messing with regular expressions. I commented out the lines printing the list of source files and the CWD.

  # these need initializing before the loop: previous/current "Month Year"
  # labels, rows collected for the current month, and the break flag
  p_my, c_my = "", ""
  r_dys = []
  do_brk = False
  with open(d_srcs[0], "r") as ds:
    for ndx, ln in enumerate(ds):
      # get rid of the \n at the end of each line
      ln = ln.strip()
      # this will also need some refactoring down the road
      if d_srcs[0].name[:5] == "rainG":
        # find month/year, won't need this later
        if len(ln) > 15:
          if ln[4:8] == c_my[-4:]:
            r_dys.append(ln)
        # get rainfall data rows for current month
        if len(ln) == 15 and ln[:4] == "<li>":
          c_my = ln[4:-1]
          if p_my != c_my:
            if p_my != "":
              do_brk = True
            if r_dys:
              for dy in r_dys:
                print(dy)
            print(f"\n{c_my}")
            p_my = c_my
            if do_brk:
              break

And, in the terminal I got the following.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py

March 2014
<br>2014.03.09 13:30: 37.5mm precip collected. Didn't empty it.
<br>2014.03.11: no change, emptied gauge.
<br>2014.03.17 11:11: 54mm, emptied. So ~91.5 mm last 10 days or so.
<br>2014.03.20 pm: 21mm, emptied. Total: 112.5 mm
<br>2014.03.23 pm: 3mm, emptied. Total: 115.5 mm
<br>2014.03.26 pm: ~19.75mm. Total: 135.25 mm
<br>2014.03.29 11:14: ~33mm. Total 168.25 mm
<br>2014.03.30 ~16:00: ~6mm. Total 174.25 mm

April 2014

Clearly going to be an issue or two when extracting the relevant data from the selected data rows. We will also need to make sure we don’t record the data for the duplicated years in the files. Not quite sure how I am going to deal with that—as I will likely not process all the files at one sitting.

Generator

I don’t want to go through all the lines in each file, just those that may actually contain data I wish to extract and save. So, I am going to use a generator to give me just the lines in which I might be interested. Well, at least I hope so. All in all pretty simple.

def get_rg_data(rg_fl):
  '''Generator for raingauge php files, only want certain lines so decided to use a generator
     Parameters:
      rg_fl: full path to file
  '''
  # d_yr = rg_fl.name[10:14]
  # print(d_yr)
  with open(rg_fl, "r") as ds:
    for ln in ds:
      # get rid of the \n at the end of each line
      ln = ln.strip()
      # guard the length so short lines don't raise an IndexError when indexing
      if len(ln) > 14 and ln[:4] == "<br>" and ln[8] == "." and ln[11] == "." and ln[14] != ":":
        yield ln

And a bit of a test.

if __name__ == "__main__":
  tst_1 = False
  tst_2 = True
  tst_3 = False

... skipped code ...

  # set up generator for first data file
  gen_d = get_rg_data(d_srcs[0])

  # get a few rows to confirm generator works as expected, though that is hardly proof
  if tst_2:
    print(next(gen_d))
    print(next(gen_d))
    print(next(gen_d))
    print(next(gen_d))
    exit(0)

In the terminal I got the following.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
<br>2014.03.09 13:30: 37.5mm precip collected. Didn't empty it.
<br>2014.03.17 11:11: 54mm, emptied. So ~91.5 mm last 10 days or so.
<br>2014.03.20 pm: 21mm, emptied. Total: 112.5 mm
<br>2014.03.23 pm: 3mm, emptied. Total: 115.5 mm

At least for the first four iterations, got what I wanted.

Data Parsing

I had thought I could pretty simply use string searches to get what I want from each data line. But, I have decided to have a first go using a regex or two.

Looking at the lines above, if I want to capture the running total for the month, I would likely need two regexs. But, if I just get the date, time and measured rainfall then calculate the running total as I go, one regex should suffice. Let’s give that a go.

... ...
import time, re
... ...
def parse_rg_row(d_rw):
  rgx = r"^<br>(\d{4}\.\d{2}\.\d{2}) ~?(.*?): [<~]?(.*?) ?mm"
  rx = re.compile(rgx, re.IGNORECASE)
  return rx.match(d_rw)
... ...
  tst_2 = False
  tst_3 = True
... ...
  # try regex to parse two months worth of data lines
  if tst_3:
    c_mn, m_tot = "", 0
    for _ in range(17):
      d_rw = next(gen_d)
      mtch = parse_rg_row(d_rw)
      d_dt = mtch.group(1)
      # print(type(mtch.group(3)))
      print(f"{d_rw} -> {mtch.group(1)}, {mtch.group(2)}, {mtch.group(3)}", end="")
      if not c_mn:
        c_mn = d_dt[5:7]
        # print(f"{d_dt}: {d_dt[5:7]} -> {c_mn}")
      if d_dt[5:7] != c_mn:
        m_tot = float(mtch.group(3))
        c_mn = d_dt[5:7]
        # print(c_mn)
      else:
        m_tot += float(mtch.group(3))
      print(f", tot: {m_tot}")

    exit(0)

Do note there was a bit of debugging I am not going to mention. Suffice it to say that I blindly attempted to fix it without really looking for my error (f-string syntax, copy/paste error, instead of a function call?!).

And, the above code output the following.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
<br>2014.03.09 13:30: 37.5mm precip collected. Didn't empty it. -> 2014.03.09, 13:30, 37.5, tot: 37.5
<br>2014.03.17 11:11: 54mm, emptied. So ~91.5 mm last 10 days or so. -> 2014.03.17, 11:11, 54, tot: 91.5
<br>2014.03.20 pm: 21mm, emptied. Total: 112.5 mm -> 2014.03.20, pm, 21, tot: 112.5
<br>2014.03.23 pm: 3mm, emptied. Total: 115.5 mm -> 2014.03.23, pm, 3, tot: 115.5
<br>2014.03.26 pm: ~19.75mm. Total: 135.25 mm -> 2014.03.26, pm, 19.75, tot: 135.25
<br>2014.03.29 11:14: ~33mm. Total 168.25 mm -> 2014.03.29, 11:14, 33, tot: 168.25
<br>2014.03.30 ~16:00: ~6mm. Total 174.25 mm -> 2014.03.30, 16:00, 6, tot: 174.25
<br>2014.04.04 15:11: 14mm. Month total 14 mm -> 2014.04.04, 15:11, 14, tot: 14.0
<br>2014.04.06 ~16:00: 12.5 mm. Month total 26.5 mm -> 2014.04.06, 16:00, 12.5, tot: 26.5
<br>2014.04.07 ~15:45: 4.75 mm. Month total 31.25 mm -> 2014.04.07, 15:45, 4.75, tot: 31.25
<br>2014.04.09 ~13:30: ~1.5mm. Month total 32.75 mm -> 2014.04.09, 13:30, 1.5, tot: 32.75
<br>2014.04.18 ~17:00: ~53mm, prev 2 days. Month total 85.75mm -> 2014.04.18, 17:00, 53, tot: 85.75
<br>2014.04.20 ~09:00: ~3.5mm last 24 hours. Month total 89.25mm -> 2014.04.20, 09:00, 3.5, tot: 89.25
<br>2014.04.23 ~14:30: ~3mm last 24 hours. Month total 92.25mm -> 2014.04.23, 14:30, 3, tot: 92.25
<br>2014.04.25 ~14:30: ~17mm last 24 hours or so. Month total 109.25mm -> 2014.04.25, 14:30, 17, tot: 109.25
<br>2014.04.29 ~13:00: ~7mm. Month total 116.25mm -> 2014.04.29, 13:00, 7, tot: 116.25
<br>2014.05.05 ~13:15: ~54mm; just under 50mm in less than 44 hrs. Month total 54mm -> 2014.05.05, 13:15, 54, tot: 54.0

And all those values match what I have in the file for those dates—got to love that.

Note: after writing this post, during a subsequent refactoring I decided to move the above raingauge file related functions into a separate module in the utils directory. And, import them into the data2db.py module. I removed the function definitions from the original module.

from utils.rg_data import get_rg_dsrcs, get_rg_data, parse_rg_row

Save to CSV Files

For now I am going to save the parsed data to a CSV file. Though perhaps I should be saving it directly to the database. That said, there are a number of ways to load a CSV file into a SQLite3 database, so for now I will build a CSV file.
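As a sanity check on that plan, here is a minimal sketch, using only the stdlib csv and sqlite3 modules, of one way such a CSV could later be loaded into SQLite. The function, file, table and column names are all made up for illustration; they are not the project's actual schema.

```python
import csv
import sqlite3

def csv_to_sqlite(csv_path, db_path, table="rainfall"):
  """Load a rainfall CSV into a SQLite table.

  Assumes hypothetical columns: date, time, mm, month_total.
  A sketch only, not the project's final loader.
  """
  con = sqlite3.connect(db_path)
  con.execute(
    f"CREATE TABLE IF NOT EXISTS {table} "
    "(date TEXT, time TEXT, mm REAL, month_total REAL)"
  )
  with open(csv_path, newline="") as fl:
    rdr = csv.reader(fl)
    next(rdr)  # skip the header row
    # executemany consumes the reader's remaining rows directly
    con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rdr)
  con.commit()
  con.close()
```

SQLite's type affinity will coerce the numeric strings into the REAL columns, so no per-row conversion is needed in this simple case.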

Dates Parsed and Recorded

But first I am going to create and maintain another CSV file. It will be used to track the dates for which rainfall data has been parsed and recorded (in the CSV file mentioned above). Each row, 3 fields, will look something like:

year,month,"day1,...,dayN"

I am going with a string for the third field as the number of days will vary for each year and month.

And, I am going to write my own class to handle this CSV file. I want an object filled with the data currently in the CSV file, to which I can add new dates, which I can search to see whether a given date has already been recorded, and which I can use to update the file on disk. I couldn’t really see a great way to do this with the csv module alone, or with Numpy or Pandas. I plan to use a dictionary for my underlying data object.

The plan is that the key will be f"{year} {month}". The value will be a set of the days in the string of days for that key (set membership checks are likely faster than searching a string of comma-separated numbers).

This will be a bit lengthy as I am going to show the development in small steps. Let’s start with the class __init__. There was a bit of refactoring of class objects; I am going to show you the most recent state. Hopefully comments are enough for you to follow what I am doing.

The class will be in a separate module, dates_dn.py, in a separate directory, utils, below the directory which houses my data2db module. So I added an __init__.py to both directories.

# dates_dn.py: module for class Dates_done
#  ver: 0.1.0: 2025.07.15, rek, init version, copied over class code from data2db.py

import csv
from pathlib import Path  # needed by the test code in the __main__ block

class Dates_done():
  def __init__(self, f_pth):
    """ Create a dictionary of rainfall dates processed from a CSV like file.
        Keep some additional information to help sort things down the line
      params:
        f_pth: path to file from CWD, pathlib format
    """
    self.fl = f_pth   # path to CSV file
    self.dts = {}     # dictionary for data in csv file
    self.c_init = {}  # count of dates for each year/month when csv file read
    self.c_new = {}   # updated counts if dates added to dictionary

    # if file exists, read it in to dict
    # use "year month" as key and set of days as value
    if f_pth.exists():
      with open(f_pth, mode="r", newline="") as fl:
        csv_rdr = csv.reader(fl, delimiter=',', quotechar='"')
        # skip header row
        next(csv_rdr)
        for row in csv_rdr:
          r_ky = f"{row[0]} {row[1]}"
          self.dts[r_ky] = set(row[2].split(","))
          self.c_init[r_ky] = len(self.dts[r_ky])
      self.c_new = self.c_init.copy()

A wee test in the if __name__ == "__main__" block. I manually created a partially populated CSV file for the test.

if __name__ == "__main__":
  # simple tests of implementation of Dates_done class

  # instantiate class using test csv file
  cwd = Path(__file__).cwd()
  fl_pth = cwd/"../data"
  fl_nm = "dates_done.csv"
  dd_pth = fl_pth/fl_nm

  # instantiate class
  dts_dn = Dates_done(dd_pth)
  # check class objects
  print(f"reading from {dd_pth}")
  print(f"cnt dts: {dts_dn.c_init}")
  print(f"{dts_dn.dts}")

And, in the terminal the following was displayed. Note, I left out the last date in April, 2014.04.29.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py
reading from data\dates_done.csv
cnt dts: {'2014 03': 7, '2014 04': 8}
{'2014 03': {'20', '09', '23', '29', '26', '30', '17'}, '2014 04': {'25', '20', '18', '04', '23', '07', '06', '09'}}

Let’s add a search method to look for a specific date in the dictionary.

  def dts_srch(self, s_dt):
    """search self.dts dict for specified date
      params:
        s_dt: date to search for, format yyyy.mm.dd
      returns:
        True if date already recorded, False otherwise
    """
    sy, sm, sd = s_dt.split(".")
    s_ky = f"{sy} {sm}"
    return s_ky in self.dts and sd in self.dts[s_ky]

Another wee test.

... ...
  do_srch = True
... ...
  if do_srch:
    # test class search method
    print(f"\nsearching done dates:\n{dts_dn.dts}")
    dts2srch = ["2014.04.18", "2014.05.05"]
    print()
    for sdt in dts2srch:
      print(f"{sdt} found: {dts_dn.dts_srch(sdt)}")

And in the terminal:

searching done dates:
{'2014 03': {'17', '30', '26', '29', '09', '20', '23'}, '2014 04': {'18', '25', '06', '04', '07', '09', '20', '23'}}

2014.04.18 found: True
2014.05.05 found: False

Okay, let’s look at adding a new date.

  def add_dt(self, n_dt):
    """add new date to self.dts dict
      params:
        n_dt: date to add, format yyyy.mm.dd
      returns:
        nothing
      raises:
        KeyError if date already in dictionary
    """
    ny, nm, nd = n_dt.split(".")
    n_ky = f"{ny} {nm}"
    # if date already in dict, bail
    if self.dts_srch(n_dt):
      raise KeyError("duplicate date")
    if n_ky in self.dts:
      self.dts[n_ky].add(nd)
    else:
      # a set literal keeps the day as one string; set(nd) would split it into characters
      self.dts[n_ky] = {nd}
    # existing key or new key, either way update new date count
    self.c_new[n_ky] = len(self.dts[n_ky])

Another test. I also wanted to check if, and where, the counts were updated. Figured that might help me decide if and how I should update the CSV file.

... ...
  do_srch = False
  do_add_dt = True
... ...
  # test adding a new dates, confirm they are there
  # and if new year/month confirm count updated
  if do_add_dt:
    print(f"test adding dates to class member dictionary")
    dts2add = ["2014.04.18", "2014.04.29", "2014.05.05", "2014.05.11"]
    for adt in dts2add:
      print(f"\nadding {adt}:")
      try:
        dts_dn.add_dt(adt)
      except KeyError:
        print(f"{adt} already in dts_dn object!")
      else:
        print(f"cnt dts: {dts_dn.c_new}")
        print(f"{dts_dn.dts}")
    # let's check out the two data count dicts
    # will need this info when deciding how to update the csv file
    # I was planning on just doing an append
    print(f"\nc_init == c_new: {dts_dn.c_init == dts_dn.c_new}")
    common_items = set(dts_dn.c_init.items()) & set(dts_dn.c_new.items())
    diff_items = set(dts_dn.c_new.items()) - set(dts_dn.c_init.items())
    print(f"Common items: {common_items}")
    print(f"Items in c_new but not in c_init: {diff_items}")
    print(f"len(common_items) == len(c_init): {len(common_items) == len(dts_dn.c_init)}")

And the output was as follows.

test adding dates to class member dictionary

adding 2014.04.18:
2014.04.18 already in dts_dn object!

adding 2014.04.29:
cnt dts: {'2014 03': 7, '2014 04': 9}
{'2014 03': {'17', '29', '09', '30', '23', '26', '20'}, '2014 04': {'07', '09', '29', '04', '06', '23', '25', '18', '20'}}

adding 2014.05.05:
cnt dts: {'2014 03': 7, '2014 04': 9, '2014 05': 1}
{'2014 03': {'17', '29', '09', '30', '23', '26', '20'}, '2014 04': {'07', '09', '29', '04', '06', '23', '25', '18', '20'}, '2014 05': {'05'}}

adding 2014.05.11:
cnt dts: {'2014 03': 7, '2014 04': 9, '2014 05': 2}
{'2014 03': {'17', '29', '09', '30', '23', '26', '20'}, '2014 04': {'07', '09', '29', '04', '06', '23', '25', '18', '20'}, '2014 05': {'05', '11'}}

c_init == c_new: False
Common items: {('2014 03', 7)}
Items in c_new but not in c_init: {('2014 04', 9), ('2014 05', 2)}
len(common_items) == len(c_init): False

And, finally, for now, let’s look at updating the CSV file on disk with the modified dictionary of dates processed so far.

I was going to look at handling two cases:

  • a straight append if no date added to any of the year/month combinations in the CSV file loaded by the class on instantiation
  • a file rewrite if a date had been added to one or more existing year/month combinations in the initial file

In the end I decided just to append any year/month combos in the diff_items set. The way I build the dictionary in the __init__ method ensures the new data will overwrite the prior data. And, as long as the file doesn’t become enormous that should be okay. Will probably use a database table for this in the long run. But for the initial developmental purposes will use the CSV file concept.
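A tiny sketch, with simulated file contents, of why that straight append is safe: when the file is re-read the way __init__ reads it, later rows for the same year/month simply overwrite earlier ones in the dictionary.

```python
import csv
import io

# simulated file contents, including a superseded row for 2014,04
raw = (
  'Year,Month,Days\n'
  '2014,03,"09,17,20,23,26,29,30"\n'
  '2014,04,"04,06,07,09,18,20,23,25"\n'
  '2014,04,"04,06,07,09,18,20,23,25,29"\n'
)
dts = {}
rdr = csv.reader(io.StringIO(raw))
next(rdr)  # skip header row
for row in rdr:
  # same key construction as __init__; a later row for the same
  # year/month replaces the earlier one
  dts[f"{row[0]} {row[1]}"] = set(row[2].split(","))

print(len(dts["2014 04"]))  # 9 -- the longer row won
```

So as long as a changed year/month is re-appended in full, the stale rows above it are harmless.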

  def updt_csv_file(self):
    # make sure there is a reason to update
    if self.c_init == self.c_new:
      return
    # because of the way I build the dictionary of dates, going to always
    # append changed rows (at least for now); the changed year/month combos
    # are those whose counts differ from the initial read
    diff_items = set(self.c_new.items()) - set(self.c_init.items())
    n_rws = []
    for n_ky, _ in sorted(diff_items):
      n_yr, n_mn = n_ky.split(" ")
      n_dts = sorted(self.dts[n_ky])
      n_rws.append([n_yr, n_mn, ",".join(n_dts)])
    with open(self.fl, mode="a", newline="") as c_fl:
      csv_wrtr = csv.writer(c_fl)
      csv_wrtr.writerows(n_rws)

  # back in the __main__ test block: let's update the csv file on disk
  dts_dn.updt_csv_file()

  # let's check that worked as expected
  dts_dn = Dates_done(dd_pth)
  print(f"\ncnt dts: {dts_dn.c_new}")
  print(f"{dts_dn.dts}")

No terminal output from the method call. Perhaps that is a bad idea!

cnt dts: {'2014 03': 7, '2014 04': 9, '2014 05': 2}
{'2014 03': {'17', '09', '23', '20', '29', '30', '26'}, '2014 04': {'09', '23', '06', '20', '29', '18', '04', '25', '07'}, '2014 05': {'11', '05'}}

Exactly what we had above after adding a few new dates. And, the file now contains the following. Also as expected. Notice the two rows for 2014,04.

Year,Month,Days
2014,03,"09,17,20,23,26,29,30"
2014,04,"04,06,07,09,18,20,23,25"
2014,04,"04,06,07,09,18,20,23,25,29"
2014,05,"05,11"

For reference, the initial contents of the test CSV file before we added the above dates was:

Year,Month,Days
2014,03,"09,17,20,23,26,29,30"
2014,04,"04,06,07,09,18,20,23,25"

So, the class appears to work as intended. But I am sure there are any number of edge cases I have not tested.

Done M’thinks

Well, I think that’s it for this post. Plenty long enough for my liking.

Next time I will look at writing the code to actually generate the CSV file with the rainfall data. And keeping track of the parsed dates with the class written in this post.

Though I have no idea why I am doing this project, I have to admit I quite enjoyed the last two days of coding. I just flowed along with the coding process. No expectations, no needs, no disappointment with bugs,… Similarly writing this post.

Until next time, may you find a happy place when coding your projects.