Going to continue playing with regular expressions. This time tackling the Regex Tuesday Challenge - Week Three. With a thumbs-up to Callum Macrae for creating the site.
The third regex tuesday challenge is to match dates in YYYY/MM/DD HH:MM(:SS) format. YYYY should be a year between 1000 and 2012, and everything else should be a valid month, date, hour, minute and second. The seconds should be optional. Don’t worry about leap years, and assume that all months have 30 days.
Regex Tuesday Challenge - Week Three
Once I get this working, I will likely try to check for the correct number of days in each month. With the exception of February, which I will assume can always have 29 days.
Exercise Requirements
Date/Time Elements
I will probably generate individual regexes for each of the items in the date and use an f-string to produce the final regex. So, I will start with test cases for the individual date/time elements, then use the challenge’s test cases for the final complete regex. Lots of fiddly arrays and/or dictionaries. But such is the coder’s life.
Year
Okay, the year must be between 1000 and 2012. I am going to assume the range is inclusive, since one of the test cases that should pass has a year of 2012. This again is going to be a set of alternative patterns. 1000-1999 is pretty straightforward. 2000-2012 requires a little more effort.
(?:1\d{3}|200\d|201[012])
The code to test the above follows. But I am leaving out the code defining the test arrays.
import re
# ... test array definitions (yr_ok, yr_no, etc.) omitted ...
r_dt = r'^(?:1\d{3}|200\d|201[012])'
# each entry maps a test key to [title, should-pass list, should-fail list]
d_tst = {'tyr': ['Test Years', yr_ok, yr_no]}
c_tsts = ['tyr']
# add in the '/' separating the date values
rgx = re.compile(fr'{r_dt}\/')
for f_tst in c_tsts:
    print(f"\n{d_tst[f_tst][0]}")
    if d_tst[f_tst][1] is not None:
        print("\tThe following should all pass:")
        for tst in d_tst[f_tst][1]:
            r_tst = rgx.match(tst)
            print(f"\t\t{tst} -> {'valid' if r_tst else 'not valid'}")
    if d_tst[f_tst][2] is not None:
        print("\tThe following should all fail:")
        for tst in d_tst[f_tst][2]:
            r_tst = rgx.match(tst)
            print(f"\t\t{tst} -> {'valid' if r_tst else 'not valid'}")
And the resulting test output:
(g4p-3.11) PS R:\learn\regex_ai\blog> python datetime.py
Test Years
The following should all pass:
1000/ -> valid
1012/ -> valid
1950/ -> valid
2000/ -> valid
2001/ -> valid
2012/ -> valid
The following should all fail:
893/ -> not valid
999/ -> not valid
2013/ -> not valid
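Since the year range is small, the pattern can also be checked exhaustively rather than against a handful of samples. Here is a quick sketch of that idea (the helper `accepted_years` is my own, not part of the challenge). It assumes we only care about plain decimal strings, and it uses `fullmatch`, so nothing may precede or follow the year.

```python
import re

# Year pattern, written with a non-capturing group (?:...)
YEAR = r'(?:1\d{3}|200\d|201[012])'

def accepted_years(pattern: str) -> list[int]:
    """Return every integer 0-9999 whose plain decimal form fully matches pattern."""
    rgx = re.compile(pattern)
    return [n for n in range(10_000) if rgx.fullmatch(str(n))]

years = accepted_years(YEAR)
print(years[0], years[-1], len(years))  # 1000 2012 1013
```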
Month
This is a fairly straightforward alternation pattern. I don’t think it needs any explanation.
(?:0[1-9]|1[012])
I have changed the test data to include year and month in the desired format. Here are the changes to the code and my test results. The regex appears to work as desired.
r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
d_tst = {'tym': ['Test Year/Month', ym_ok, ym_no]}
c_tsts = ['tym']
rgx = re.compile(fr'{r_dt}\/{r_mn}\/')
(g4p-3.11) PS R:\learn\regex_ai\blog> python datetime.py
Test Year/Month
The following should all pass:
1000/12/ -> valid
1012/01/ -> valid
1950/11/ -> valid
2000/12/ -> valid
2001/06/ -> valid
2012/10/ -> valid
The following should all fail:
893/10/ -> not valid
999/03/ -> not valid
2013/11/ -> not valid
1950/00/ -> not valid
2012/13/ -> not valid
1950/22/ -> not valid
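The same brute-force idea works for the month, since there are only 100 possible two-digit strings. A minimal sketch; `accepted_two_digit` is a hypothetical helper name. The last line also shows why a single-digit month, like the 9 in one of the challenge’s failing cases, is rejected.

```python
import re

MONTH = r'(?:0[1-9]|1[012])'

def accepted_two_digit(pattern: str) -> list[int]:
    """Values 0-99 whose zero-padded two-digit form fully matches pattern."""
    rgx = re.compile(pattern)
    return [n for n in range(100) if rgx.fullmatch(f"{n:02d}")]

print(accepted_two_digit(MONTH))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(re.fullmatch(MONTH, "9"))   # None: both alternatives need two characters
```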
Dates
Okay, sticking with the stated assumption that every month has 30 days, we again have an alternation pattern, similar to both of the above.
(?:0[1-9]|[12]\d|30)
The altered code and tests follow. I am using the full test set from the exercise and, for now, ignoring the time section. So keep that in mind when looking at the test results.
r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:0[1-9]|[12]\d|30)'
d_tst = {'tdt': ['Test Full Date', dtm_ok, dtm_no]}
c_tsts = ['tdt']
rgx = re.compile(fr'{r_dt}\/{r_mn}\/{r_dy} .*$')
(g4p-3.11) PS R:\learn\regex_ai\blog> python datetime.py
Test Full Date
The following should all pass:
2012/09/18 12:10 -> valid
2001/09/30 23:59:11 -> valid
1995/12/01 12:12:12 -> valid
1001/01/07 14:27 -> valid
2010/10/20 10:10 -> valid
2000/01/01 01:01:01 -> valid
2007/07/22 22:34:59 -> valid
2010/05/05 00:00:00 -> valid
The following should all fail:
2012/9/18 23:40 -> not valid
2013/09/09 09:09 -> not valid
2012/00/01 01:49:59 -> not valid
2012/13/25 22:17:00 -> not valid
1994/11/00 12:12 -> not valid
2012/12/4 12:12 -> not valid
2009/11/11 24:00:00 -> valid
2012/06/24 13:60 -> valid
2002/10/10 14:59:60 -> valid
a2011/11/11 11:11:11 -> not valid
2005/05/05 05:05:05d -> valid
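The four "should fail" cases that report valid are expected at this point. A small sketch using the date regex from this step (written with non-capturing `(?:...)` groups, and without the unnecessary `\/` escapes) shows why: the ` .*$` after the day accepts whatever follows the space, so only the date portion is really being checked.

```python
import re

# Date-only regex from this step: ' .*$' accepts any text after the space
date_then_anything = re.compile(
    r'^(?:1\d{3}|200\d|201[012])/(?:0[1-9]|1[012])/(?:0[1-9]|[12]\d|30) .*$'
)

for s in ["2009/11/11 24:00:00", "2005/05/05 05:05:05d", "1994/11/00 12:12"]:
    print(s, "->", "valid" if date_then_anything.match(s) else "not valid")
# 2009/11/11 24:00:00 -> valid      (bad hour, but the time is not checked)
# 2005/05/05 05:05:05d -> valid     (trailing 'd' swallowed by .*)
# 1994/11/00 12:12 -> not valid     (day 00 fails the day pattern)
```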
Now to the time section.
Hour
Okay, the hour pattern will be different from that for the minutes and seconds (both of which are the same), but it is nothing we haven’t seen before. As you can see in the tests above, the hour uses the 24-hour clock, so it can range from 00 to 23. Another alternation pattern.
(?:[01]\d|2[0-3])
r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:0[1-9]|[12]\d|30)'
r_hr = r'(?:[01]\d|2[0-3])'
d_tst = {'tdt': ['Test Full Date and Hour', dtm_ok, dtm_no]}
c_tsts = ['tdt']
rgx = re.compile(fr'{r_dt}\/{r_mn}\/{r_dy} {r_hr}\:.*$')
Test Full Date and Hour
The following should all pass:
2012/09/18 12:10 -> valid
2001/09/30 23:59:11 -> valid
1995/12/01 12:12:12 -> valid
1001/01/07 14:27 -> valid
2010/10/20 10:10 -> valid
2000/01/01 01:01:01 -> valid
2007/07/22 22:34:59 -> valid
2010/05/05 00:00:00 -> valid
The following should all fail:
2012/9/18 23:40 -> not valid
2013/09/09 09:09 -> not valid
2012/00/01 01:49:59 -> not valid
2012/13/25 22:17:00 -> not valid
1994/11/00 12:12 -> not valid
2012/12/4 12:12 -> not valid
2009/11/11 24:00:00 -> not valid
2012/06/24 13:60 -> valid
2002/10/10 14:59:60 -> valid
a2011/11/11 11:11:11 -> not valid
2005/05/05 05:05:05d -> valid
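A quick exhaustive check of the hour pattern, assuming the same two-digit, zero-padded format as the test data. This is just a sketch to confirm that 00-23 are accepted and 24 is not, which is why `2009/11/11 24:00:00` now fails.

```python
import re

HOUR = r'(?:[01]\d|2[0-3])'

# Every zero-padded value 00-99 that the hour pattern accepts in full
hours = [n for n in range(100) if re.fullmatch(HOUR, f"{n:02d}")]
print(hours == list(range(24)))  # True
print(re.fullmatch(HOUR, "24"))  # None
```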
Minutes and Seconds
Since these are both in the range 00-59, the same pattern will work for both. And nice and simple it is.
(?:[0-5]\d)
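Minutes and seconds get the same treatment. A short sketch that confirms the pattern accepts exactly 00-59 when the value is zero-padded to two digits:

```python
import re

MIN_SEC = r'(?:[0-5]\d)'

# Tens digit limited to 0-5, units digit any 0-9: 00-59 and nothing else
values = [n for n in range(100) if re.fullmatch(MIN_SEC, f"{n:02d}")]
print(values == list(range(60)))    # True
print(re.fullmatch(MIN_SEC, "60"))  # None
```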
So, let’s test the full date and time, remembering that the seconds are optional.
r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:0[1-9]|[12]\d|30)'
r_hr = r'(?:[01]\d|2[0-3])'
r_ms = r'(?:[0-5]\d)'
d_tst = {'tdt': ['Test Full Date and Time', dtm_ok, dtm_no]}
c_tsts = ['tdt']
rgx = re.compile(fr'{r_dt}\/{r_mn}\/{r_dy} {r_hr}\:{r_ms}(?:\:{r_ms})?$')
Test Full Date and Time
The following should all pass:
2012/09/18 12:10 -> valid
2001/09/30 23:59:11 -> valid
1995/12/01 12:12:12 -> valid
1001/01/07 14:27 -> valid
2010/10/20 10:10 -> valid
2000/01/01 01:01:01 -> valid
2007/07/22 22:34:59 -> valid
2010/05/05 00:00:00 -> valid
The following should all fail:
2012/9/18 23:40 -> not valid
2013/09/09 09:09 -> not valid
2012/00/01 01:49:59 -> not valid
2012/13/25 22:17:00 -> not valid
1994/11/00 12:12 -> not valid
2012/12/4 12:12 -> not valid
2009/11/11 24:00:00 -> not valid
2012/06/24 13:60 -> not valid
2002/10/10 14:59:60 -> not valid
a2011/11/11 11:11:11 -> not valid
2005/05/05 05:05:05d -> not valid
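The optional seconds work because the colon and the two seconds digits sit together in one non-capturing group followed by `?`. A sketch of just the time portion (the names `HOUR`, `MIN_SEC` and `TIME` are my own) shows that a dangling colon, a one-digit seconds value, or a doubled colon is rejected rather than accepted.

```python
import re

HOUR = r'(?:[01]\d|2[0-3])'
MIN_SEC = r'(?:[0-5]\d)'
# ':SS' is one unit: either the colon and both digits are present, or neither is
TIME = re.compile(fr'{HOUR}:{MIN_SEC}(?::{MIN_SEC})?')

for s in ["12:10", "23:59:11", "12:10:", "12:10:5", "12:10::30"]:
    print(s, "->", "valid" if TIME.fullmatch(s) else "not valid")
# 12:10 -> valid
# 23:59:11 -> valid
# 12:10: -> not valid
# 12:10:5 -> not valid
# 12:10::30 -> not valid
```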
Completed Week 3 Exercise
And the full regex passes all the test cases on the exercise page. I have put the date and time patterns on separate lines for readability; in the actual regex they are joined by a single space.
^(?:1\d{3}|200\d|201[012])\/(?:0[1-9]|1[012])\/(?:0[1-9]|[12]\d|30)
(?:[01]\d|2[0-3])\:(?:[0-5]\d)(?:\:(?:[0-5]\d))?$
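Python’s `re.VERBOSE` flag lets the pattern actually be written across lines, much like the layout above. A sketch, assuming Python’s `re` module and the non-capturing `(?:...)` form of each group; verbose mode ignores unescaped whitespace, so the single space between date and time has to be spelled out as `[ ]`. The test strings are the challenge cases listed earlier in this post.

```python
import re

# Full pattern laid out over lines; whitespace outside [ ] is ignored in VERBOSE mode
DATETIME = re.compile(r"""
    ^(?:1\d{3}|200\d|201[012])/(?:0[1-9]|1[012])/(?:0[1-9]|[12]\d|30)
    [ ]
    (?:[01]\d|2[0-3]):(?:[0-5]\d)(?::(?:[0-5]\d))?$
""", re.VERBOSE)

should_pass = ["2012/09/18 12:10", "2001/09/30 23:59:11", "1995/12/01 12:12:12",
               "1001/01/07 14:27", "2010/10/20 10:10", "2000/01/01 01:01:01",
               "2007/07/22 22:34:59", "2010/05/05 00:00:00"]
should_fail = ["2012/9/18 23:40", "2013/09/09 09:09", "2012/00/01 01:49:59",
               "2012/13/25 22:17:00", "1994/11/00 12:12", "2012/12/4 12:12",
               "2009/11/11 24:00:00", "2012/06/24 13:60", "2002/10/10 14:59:60",
               "a2011/11/11 11:11:11", "2005/05/05 05:05:05d"]

for s in should_pass + should_fail:
    print(f"{s} -> {'valid' if DATETIME.match(s) else 'not valid'}")
```

The trade-off is that every literal space or `#` in a verbose pattern must be escaped or placed in a character class; forgetting that is an easy way to break a pattern without any error.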
Enhance Day Pattern
Okay, let’s try to sort out checking for a valid day value. I am going to try and combine lookbehinds with appropriate day patterns for the affected months. Every month can have at least 29 days. I am not going to try and sort out February and leap years. Even though it is possible, the regex would become very large given the range of years involved. Other than February, the remaining months can all have 30 days. And some of them can have 31 days.
So, my thinking is that the first alternative simply allows for a day value of 01-29. Simple enough. Then, if the current month is not February (02), allow for 30 days. Then check for months that can have 31 days. In my development (not in the post), I will add each of those a step at a time, testing as I go. A new set of tests for only the date. No need to play with the time pattern any longer.
So, the first pattern looks like the following, which we have pretty much seen before.
(?:0[1-9]|[12]\d)
Now for the 30-day case, we need to make sure the month is not February. So, a negative lookbehind makes sure the three characters before the day are not 02/.
(?:(?<!02\/)30)
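A small sketch of the negative lookbehind on its own, assuming month/day fragments like those in the test data. `search` is used rather than `match` because the `30` starts part-way into the string, after the characters the lookbehind inspects.

```python
import re

# Match "30" only when it is NOT immediately preceded by "02/"
thirty = re.compile(r'(?<!02/)30')

for s in ["01/30", "02/30", "11/30"]:
    print(s, "->", bool(thirty.search(s)))
# 01/30 -> True
# 02/30 -> False
# 11/30 -> True
```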
And finally, all the 31-day months. Bit of a pain, that one.
(?:(?<=(?:01|03|05|07|08|10|12)\/)31)
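And the positive lookbehind on its own. This sketch also illustrates a Python `re` restriction worth knowing: a lookbehind must have a fixed width. The alternation here is fine because every alternative is exactly two characters, but a variable-width lookbehind such as `\d+/` is rejected when the pattern is compiled.

```python
import re

# Positive lookbehind: "31" only after a 31-day month plus "/".
# All alternatives are two characters, so the lookbehind is a fixed 3 wide.
thirty_one = re.compile(r'(?<=(?:01|03|05|07|08|10|12)/)31')
print(bool(thirty_one.search("12/31")))  # True
print(bool(thirty_one.search("04/31")))  # False

# A variable-width lookbehind raises re.error at compile time
try:
    re.compile(r'(?<=\d+/)31')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("re.error:", exc)
```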
And the whole thing:
(?:(?:0[1-9]|[12]\d)|(?:(?<!02\/)30)|(?:(?<=(?:01|03|05|07|08|10|12)\/)31))
And, my tests.
r_dt = r'^(?:1\d{3}|200\d|201[012])'
r_mn = r'(?:0[1-9]|1[012])'
r_dy = r'(?:(?:0[1-9]|[12]\d)|(?:(?<!02\/)30)|(?:(?<=(?:01|03|05|07|08|10|12)\/)31))'
d_tst = {'tdt': ['Test Days Correct for Month', dt_ok, dt_no]}
c_tsts = ['tdt']
rgx = re.compile(fr'{r_dt}\/{r_mn}\/{r_dy}$')
Test Days Correct for Month
The following should all pass:
2012/01/31 -> valid
2001/02/29 -> valid
1995/03/31 -> valid
1001/04/30 -> valid
2010/05/31 -> valid
2000/06/30 -> valid
2007/07/31 -> valid
2010/08/31 -> valid
2010/09/30 -> valid
2010/10/31 -> valid
2010/11/30 -> valid
2010/12/31 -> valid
The following should all fail:
1950/01/00 -> not valid
2012/01/32 -> not valid
2013/02/30 -> not valid
2012/04/31 -> not valid
2012/06/31 -> not valid
1994/09/31 -> not valid
2012/11/31 -> not valid
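With only 10,000 possible two-digit month/day strings, the month-aware day pattern can be checked against a simple days-per-month table. A sketch, assuming February always allows 29 days as stated above; `DAYS_IN_MONTH`, `MONTH_DAY` and the other names are my own.

```python
import re

MONTH = r'(?:0[1-9]|1[012])'
DAY = r'(?:(?:0[1-9]|[12]\d)|(?:(?<!02/)30)|(?:(?<=(?:01|03|05|07|08|10|12)/)31))'
MONTH_DAY = re.compile(fr'{MONTH}/{DAY}')

# Days per month under this post's assumption: February always allows 29
DAYS_IN_MONTH = {1: 31, 2: 29, 3: 31, 4: 30, 5: 31, 6: 30,
                 7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}

mismatches = []
for m in range(100):
    for d in range(100):
        text = f"{m:02d}/{d:02d}"
        expected = m in DAYS_IN_MONTH and 1 <= d <= DAYS_IN_MONTH[m]
        if bool(MONTH_DAY.fullmatch(text)) != expected:
            mismatches.append(text)

print(mismatches)  # []
```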
Done
Well, another successful journey into regular expressions. I am quite enjoying this. So, expect there will be a few more such posts.
Until next time, try some regex crosswords. Rather different approach to playing with regexes. Enjoy!