Going to continue playing with regular expressions. This time tackling the Regex Tuesday Challenge - Week Three. With a thumbs-up to Callum Macrae for creating the site.

The third regex tuesday challenge is to match dates in YYYY/MM/DD HH:MM(:SS) format. YYYY should be a year between 1000 and 2012, and everything else should be a valid month, date, hour, minute and second. The seconds should be optional. Don’t worry about leap years, and assume that all months have 30 days.

Regex Tuesday Challenge - Week Three

Once I get this working, I will likely try to check for the correct number of days in given months. The exception is February, which I will assume can have 29 days.

Exercise Requirements

Date/Time Elements

I will probably generate individual regexes for each of the items in the date and use an f-string to produce the final regex. So, I will start with test cases for the individual date/time elements. Then use the challenge’s test cases for the final complete regex. Lots of fiddly arrays and/or dictionaries. But such is the coder’s life.

Year

Okay, the year must be between 1000 and 2012. I am going to assume inclusively seeing as how one of the test cases that should pass has a year of 2012. This again is going to be a set of alternative patterns. 1000-1999 is pretty straightforward. 2000-2012 requires a little more effort.

 (?:1\d{3}|200\d|201[012])

The code to test the above follows. But I am leaving out the code defining the test arrays.

import re
... ...
r_dt = r'^(?:1\d{3}|200\d|201[012])'

d_tst = {'tyr': ['Test Years', yr_ok, yr_no]}
c_tsts = ['tyr']

# add in the '/' separating the date values
rgx = re.compile(fr'{r_dt}\/')

for f_tst in c_tsts:
  print(f"\n{d_tst[f_tst][0]}")
  if d_tst[f_tst][1] is not None:
    print("\tThe following should all pass:")
    for tst in d_tst[f_tst][1]:
      r_tst = rgx.match(tst)
      print(f"\t\t{tst} -> {'valid' if r_tst else 'not valid'}")

  if d_tst[f_tst][2] is not None:
    print("\tThe following should all fail:")
    for tst in d_tst[f_tst][2]:
      r_tst = rgx.match(tst)
      print(f"\t\t{tst} -> {'valid' if r_tst else 'not valid'}")

And the resulting test output:

(g4p-3.11) PS R:\learn\regex_ai\blog> python datetime.py

Test Years
        The following should all pass:
                1000/ -> valid
                1012/ -> valid
                1950/ -> valid
                2000/ -> valid
                2001/ -> valid
                2012/ -> valid
        The following should all fail:
                893/ -> not valid
                999/ -> not valid
                2013/ -> not valid
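
The test arrays were left out of the listing above. Here is a minimal sketch of what they might look like, reconstructed from the printed output. The exact contents of `yr_ok` and `yr_no` are an assumption, and `check` is a hypothetical helper, not part of the original script. The pattern uses the non-capturing group syntax `(?:...)`, and the slash is left unescaped since Python does not require escaping it.

```python
import re

# Assumed contents, reconstructed from the test output above
yr_ok = ['1000/', '1012/', '1950/', '2000/', '2001/', '2012/']
yr_no = ['893/', '999/', '2013/']

r_dt = r'^(?:1\d{3}|200\d|201[012])'
rgx = re.compile(fr'{r_dt}/')

def check(rgx, tests):
    """Hypothetical helper: map each test string to True/False."""
    return {tst: bool(rgx.match(tst)) for tst in tests}

ok_results = check(rgx, yr_ok)
no_results = check(rgx, yr_no)
for tst, ok in {**ok_results, **no_results}.items():
    print(f"{tst} -> {'valid' if ok else 'not valid'}")
```

Returning a dictionary rather than printing inside the loop makes it easy to assert on the results later.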

Month

This is a fairly straightforward alternation pattern. I don’t think it needs any explanation.

 (?:0[1-9]|1[012])

I have changed the test data to include year and month in the desired format. Here are the changes to the code and my test results. The regex appears to work as desired.

r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'

d_tst = {'tym': ['Test Year/Month', ym_ok, ym_no]}
c_tsts = ['tym']

rgx = re.compile(fr'{r_dt}\/{r_mn}\/')

(g4p-3.11) PS R:\learn\regex_ai\blog> python datetime.py

Test Year/Month
        The following should all pass:
                1000/12/ -> valid
                1012/01/ -> valid
                1950/11/ -> valid
                2000/12/ -> valid
                2001/06/ -> valid
                2012/10/ -> valid
        The following should all fail:
                893/10/ -> not valid
                999/03/ -> not valid
                2013/11/ -> not valid
                1950/00/ -> not valid
                2012/13/ -> not valid
                1950/22/ -> not valid
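
Because the month is always two digits, the pattern can also be checked exhaustively rather than with hand-picked cases. A small sketch, assuming the month pattern is tested on its own with `fullmatch` (the names `rgx_mn`, `candidates` and `matched` are mine):

```python
import re

r_mn = r'(?:0[1-9]|1[012])'
rgx_mn = re.compile(r_mn)

# Every two-digit string from '00' to '99'
candidates = [f"{n:02d}" for n in range(100)]

# Keep only the strings the month pattern accepts in full
matched = [s for s in candidates if rgx_mn.fullmatch(s)]
print(matched)
```

If the pattern is right, the list should contain exactly the twelve months 01 through 12.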

Dates

Okay, keeping with the stated approach that every month has 30 days, we again have an alternation pattern. Similar to both of the above.

 (?:0[1-9]|[12]\d|30)

The altered code and tests follow. I am using the full test set from the exercise and, for now, ignoring the time section: anything after the space is accepted. So keep that in mind when looking at the test results, as some entries with a bad time will still show as valid.

r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:0[1-9]|[12]\d|30)'

d_tst = {'tdt': ['Test Full Date', dtm_ok, dtm_no]}
c_tsts = ['tdt']

rgx = re.compile(fr'{r_dt}\/{r_mn}\/{r_dy} .*$')

(g4p-3.11) PS R:\learn\regex_ai\blog> python datetime.py

Test Full Date
        The following should all pass:
                2012/09/18 12:10 -> valid
                2001/09/30 23:59:11 -> valid
                1995/12/01 12:12:12 -> valid
                1001/01/07 14:27 -> valid
                2010/10/20 10:10 -> valid
                2000/01/01 01:01:01 -> valid
                2007/07/22 22:34:59 -> valid
                2010/05/05 00:00:00 -> valid
        The following should all fail:
                2012/9/18 23:40 -> not valid
                2013/09/09 09:09 -> not valid
                2012/00/01 01:49:59 -> not valid
                2012/13/25 22:17:00 -> not valid
                1994/11/00 12:12 -> not valid
                2012/12/4 12:12 -> not valid
                2009/11/11 24:00:00 -> valid
                2012/06/24 13:60 -> valid
                2002/10/10 14:59:60 -> valid
                a2011/11/11 11:11:11 -> not valid
                2005/05/05 05:05:05d -> valid
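
The `valid` results in the should-fail list come from the ` .*$` placeholder, which accepts anything after the date. A short sketch of that effect, using two strings from the test set above (the variable names are mine):

```python
import re

r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:0[1-9]|[12]\d|30)'

# ' .*$' stands in for the time section that is not written yet
rgx = re.compile(fr'{r_dt}/{r_mn}/{r_dy} .*$')

bad_time = '2009/11/11 24:00:00'   # date is fine, hour is not
bad_date = '1994/11/00 12:12'      # day 00 is rejected by the day pattern

time_slips_through = bool(rgx.match(bad_time))
date_is_caught = not rgx.match(bad_date)
print(time_slips_through, date_is_caught)
```

So at this stage a failure in the date part is caught, while a bad time is not yet examined.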

Now to the time section.

Hour

Okay, the hour pattern will be different from that for the minutes and seconds (both of which are the same). But nothing we haven’t seen before. As you can see in the tests above, the hour uses the 24-hour clock. So, it can range from 00 to 23. Another alternation pattern.

 (?:[01]\d|2[0-3])

r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:0[1-9]|[12]\d|30)'
r_hr = r'(?:[01]\d|2[0-3])'

d_tst = {'tdt': ['Test Full Date and Hour', dtm_ok, dtm_no]}
c_tsts = ['tdt']

rgx = re.compile(fr'{r_dt}\/{r_mn}\/{r_dy} {r_hr}\:.*$')

Test Full Date and Hour
        The following should all pass:
                2012/09/18 12:10 -> valid
                2001/09/30 23:59:11 -> valid
                1995/12/01 12:12:12 -> valid
                1001/01/07 14:27 -> valid
                2010/10/20 10:10 -> valid
                2000/01/01 01:01:01 -> valid
                2007/07/22 22:34:59 -> valid
                2010/05/05 00:00:00 -> valid
        The following should all fail:
                2012/9/18 23:40 -> not valid
                2013/09/09 09:09 -> not valid
                2012/00/01 01:49:59 -> not valid
                2012/13/25 22:17:00 -> not valid
                1994/11/00 12:12 -> not valid
                2012/12/4 12:12 -> not valid
                2009/11/11 24:00:00 -> not valid
                2012/06/24 13:60 -> valid
                2002/10/10 14:59:60 -> valid
                a2011/11/11 11:11:11 -> not valid
                2005/05/05 05:05:05d -> valid

Minutes and Seconds

Since these are both in the range 00-59, the same pattern will work for both. And nice and simple it is.

 (?:[0-5]\d)

So, let’s test the full date and time. Remembering the seconds are optional.

r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:0[1-9]|[12]\d|30)'
r_hr = r'(?:[01]\d|2[0-3])'
r_ms = r'(?:[0-5]\d)'

d_tst = {'tdt': ['Test Full Date and Time', dtm_ok, dtm_no]}
c_tsts = ['tdt']

rgx = re.compile(fr'{r_dt}\/{r_mn}\/{r_dy} {r_hr}\:{r_ms}(?:\:{r_ms})?$')

Test Full Date and Time
        The following should all pass:
                2012/09/18 12:10 -> valid
                2001/09/30 23:59:11 -> valid
                1995/12/01 12:12:12 -> valid
                1001/01/07 14:27 -> valid
                2010/10/20 10:10 -> valid
                2000/01/01 01:01:01 -> valid
                2007/07/22 22:34:59 -> valid
                2010/05/05 00:00:00 -> valid
        The following should all fail:
                2012/9/18 23:40 -> not valid
                2013/09/09 09:09 -> not valid
                2012/00/01 01:49:59 -> not valid
                2012/13/25 22:17:00 -> not valid
                1994/11/00 12:12 -> not valid
                2012/12/4 12:12 -> not valid
                2009/11/11 24:00:00 -> not valid
                2012/06/24 13:60 -> not valid
                2002/10/10 14:59:60 -> not valid
                a2011/11/11 11:11:11 -> not valid
                2005/05/05 05:05:05d -> not valid
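
The time section can also be checked against plain integer comparisons instead of a list of cases. A sketch, assuming the time patterns are tested on their own with `fullmatch`; `rgx_tm`, `mismatches` and `sec_ok` are names I made up for the illustration:

```python
import re

r_hr = r'(?:[01]\d|2[0-3])'
r_ms = r'(?:[0-5]\d)'
rgx_tm = re.compile(fr'{r_hr}:{r_ms}(?::{r_ms})?')

# Compare the regex with simple arithmetic for every HH:MM from 00:00 to 99:99
mismatches = []
for hh in range(100):
    for mm in range(100):
        expected = hh < 24 and mm < 60
        if bool(rgx_tm.fullmatch(f"{hh:02d}:{mm:02d}")) != expected:
            mismatches.append((hh, mm))

# With a fixed valid HH:MM, collect the seconds values that are accepted
sec_ok = [ss for ss in range(100) if rgx_tm.fullmatch(f"12:34:{ss:02d}")]

print(len(mismatches), sec_ok == list(range(60)))
```

An empty `mismatches` list means the hour and minute patterns agree with the arithmetic for all 10,000 combinations.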

Completed Week 3 Exercise

And the full regex passes all the test cases on the exercise page. The date and time patterns are shown on separate lines for viewability; in the actual regex they are joined by a single space.

 ^(?:1\d{3}|200\d|201[012])\/(?:0[1-9]|1[012])\/(?:0[1-9]|[12]\d|30)
 (?:[01]\d|2[0-3])\:(?:[0-5]\d)(?:\:(?:[0-5]\d))?$
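
For readability, the same regex could also be written with Python’s `re.VERBOSE` flag, which ignores whitespace in the pattern and allows comments. This is only a sketch of an alternative layout, not what I used above; note the separating space has to be written as `[ ]` in verbose mode, and `rgx_full` and `samples` are names I made up.

```python
import re

rgx_full = re.compile(r"""
    ^
    (?:1\d{3}|200\d|201[012])    # year 1000-2012
    /(?:0[1-9]|1[012])           # month 01-12
    /(?:0[1-9]|[12]\d|30)        # day 01-30
    [ ]                          # single space separator
    (?:[01]\d|2[0-3])            # hour 00-23
    :(?:[0-5]\d)                 # minutes 00-59
    (?::(?:[0-5]\d))?            # optional seconds 00-59
    $
""", re.VERBOSE)

# A few strings taken from the exercise test set
samples = ['2012/09/18 12:10', '2001/09/30 23:59:11',
           '2012/06/24 13:60', '2005/05/05 05:05:05d']
results = [bool(rgx_full.match(s)) for s in samples]
print(results)
```

The first two samples come from the should-pass list and the last two from the should-fail list.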

Enhance Day Pattern

Okay, let’s try to sort checking for a valid day value. I am going to try and combine lookbehinds with appropriate day patterns for the affected months. Every month can have at least 29 days. I am not going to try and sort February and leap years. Even though it is possible, the regex would become very large given the range of years involved. Other than February the remaining months can all have 30 days. And some of them can have 31 days.

So, I am thinking, the first alternation is simply allowing for a day value of 01-29. Simple enough. Then, if the current month is not February (02), allow for 30 days. Then check for months that can have 31 days. In my development (not in the post), I will add each of those a step at a time. Testing as I go. A new set of tests for only the date. No need to play with the time pattern any longer.

So, the first pattern looks like the following; which we have pretty much seen before.

 (:?0[1-9]|[12]\d)

Now for the 30 day case, we need to make sure the month is not February. So a negative lookbehind makes sure the three preceding characters are not 02/.

 (?:(?<!02\/)30)

And finally all the 31 day months. This time a positive lookbehind listing each of them. Bit of a pain, that one.

 (?:(?<=(?:01|03|05|07|08|10|12)\/)31)
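
To see the two lookbehinds in isolation, here is a small sketch run against bare `MM/DD` fragments. Python’s `re` module needs lookbehinds to be fixed width; that holds here because every alternative is two digits followed by the slash. The names `not_feb_30` and `long_month_31` are made up for the illustration.

```python
import re

# 30 is allowed unless the three characters before it are '02/'
not_feb_30 = re.compile(r'(?<!02/)30')

# 31 is allowed only when preceded by one of the 31-day months and a slash
long_month_31 = re.compile(r'(?<=(?:01|03|05|07|08|10|12)/)31')

apr_30 = bool(not_feb_30.search('04/30'))
feb_30 = bool(not_feb_30.search('02/30'))
jul_31 = bool(long_month_31.search('07/31'))
jun_31 = bool(long_month_31.search('06/31'))
print(apr_30, feb_30, jul_31, jun_31)
```

April 30 and July 31 should be accepted, while February 30 and June 31 should not.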

And the whole thing:

 (?:(?:0[1-9]|[12]\d)|(?:(?<!02\/)30)|(?:(?<=(?:01|03|05|07|08|10|12)\/)31))

And, my tests.

r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:(?:0[1-9]|[12]\d)|(?:(?<!02\/)30)|(?:(?<=(?:01|03|05|07|08|10|12)\/)31))'

d_tst = {'tdt': ['Test Days Correct for Month', dt_ok, dt_no]}
c_tsts = ['tdt']

rgx = re.compile(fr'{r_dt}\/{r_mn}\/{r_dy} $')

Test Days Correct for Month
        The following should all pass:
                2012/01/31  -> valid
                2001/02/29  -> valid
                1995/03/31  -> valid
                1001/04/30  -> valid
                2010/05/31  -> valid
                2000/06/30  -> valid
                2007/07/31  -> valid
                2010/08/31  -> valid
                2010/09/30  -> valid
                2010/10/31  -> valid
                2010/11/30  -> valid
                2010/12/31  -> valid
        The following should all fail:
                1950/01/00  -> not valid
                2012/01/32  -> not valid
                2013/02/30  -> not valid
                2012/04/31  -> not valid
                2012/06/31  -> not valid
                1994/09/31  -> not valid
                2012/11/31  -> not valid

Done

Well, another successful journey into regular expressions. I am quite enjoying this. So, expect there will be a few more such posts.

Until next time, try some regex crosswords. Rather different approach to playing with regexes. Enjoy!