In closing the last post I wrote:

Until next time, try some regex crosswords. Rather different approach to playing with regexes. Enjoy!

Well, I gave them a try. And, I found them interesting. So, I am going to solve a few, with as good an explanation as I can provide. I will likely start with a couple of beginner ones, then follow that with some intermediate ones, which will likely be enough for a single post.

I am probably going to have some issues getting the puzzle to display in some reasonable way. I didn’t really like the vertical text used on the puzzle site. The site also used capitals; I am going to use lowercase, unless the puzzle is case sensitive.

Unlike the previous regex posts I will try to explain some of the basics for patterns used in the puzzles.

Note: it appears the site design has changed since I worked on this and the next post. Not sure I like the change. Your mileage may vary. And, I have had to change the links I had in the draft post. Hopefully the new ones will continue to work.

Beginner #1

row/col      [^speak]+   ep|ip|ef
he|ll|o+     _           _
[please]+    _           _

Basics

Okay, some basics. Any literal character means that character goes in the specified location. A . (dot) means any character without restriction. But there are some metacharacters that can affect how any character, or group of characters, is treated. Groups are created with parentheses, (); square brackets, [], create a character class (more on those in a moment). The three most basic quantifiers are:

  • *: repeat the preceding character or group zero or more times
  • +: repeat the preceding character or group one or more times
  • ?: repeat the preceding character or group zero or one time
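
If you want to poke at those three quantifiers yourself, here is a minimal Python sketch. It uses re.fullmatch, which only succeeds when the whole string matches, much like a crossword row has to be filled completely. The helper name fits and the sample strings are my own inventions for illustration; they are not from the puzzle site.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

print(fits("lo*", "l"))     # True,  * allows zero o's
print(fits("lo*", "looo"))  # True,  ... or many of them
print(fits("lo+", "l"))     # False, + needs at least one o
print(fits("lo+", "lo"))    # True
print(fits("lo?", "lo"))    # True,  ? allows zero or one o
print(fits("lo?", "loo"))   # False, ... but not two
```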

In this puzzle we also see the use of character classes and alternation.

  • [...]: use one of the characters in the class at the specified spot, can be modified by the quantifiers above
  • [^...]: use any character except those in the class at the specified spot, can be modified by the quantifiers above
  • ...|...: use either the pattern on the left or the pattern on the right, can be extended to more alternatives (as is the case in the puzzle)
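
The same kind of quick check works for character classes and alternation. This is only a sketch using the patterns from the first puzzle; the helper fits and the test strings are hypothetical names and values I picked for the example.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Positive class: every character must come from the set.
print(fits("[please]+", "lp"))   # True
print(fits("[please]+", "lf"))   # False, f is not in the class
# The order inside the brackets makes no difference.
print(fits("[esaelp]+", "lp"))   # True
# Negative class: none of the listed characters may appear.
print(fits("[^speak]+", "hl"))   # True
print(fits("[^speak]+", "he"))   # False, e is excluded
# Alternation: the string must match one of the alternatives.
options = [s for s in ("ep", "ip", "ef", "pe") if fits("ep|ip|ef", s)]
print(options)                   # ['ep', 'ip', 'ef']
```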

In a character class, an unescaped hyphen represents a range. So, [0-9] means all the digits from 0 to 9 inclusive. Do note that only a single character can be used at the start or end of a range. So, [0-12] will not mean all the numbers between 0 and 12 inclusive. It in fact means a single character: one of the digits from 0 to 1, or the digit 2. This range specification can also be used for the letters of the alphabet, [a-z] being all the lowercase letters from a to z.
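
A small Python sketch makes the [0-12] gotcha visible. The pattern named zero_to_twelve is just one way I might write "0 through 12"; it is my own assumption for illustration and has nothing to do with the puzzles.

```python
import re

# [0-12] is the range 0-1 plus the single character 2,
# so it only ever matches ONE character.
digits = [d for d in "0123456789" if re.fullmatch("[0-12]", d)]
print(digits)                          # ['0', '1', '2']
print(re.fullmatch("[0-12]", "12"))    # None, two characters never fit

# One possible way to accept the numbers 0 to 12 (no leading zeros).
zero_to_twelve = r"[0-9]|1[0-2]"
accepted = [n for n in range(15) if re.fullmatch(zero_to_twelve, str(n))]
print(accepted)
```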

Should you wish to match an actual hyphen inside a character class you need to escape it. That is you must use \-. The \ is the escape character. It gives some other meaning to whatever follows it. You will see that in the second puzzle: \d.
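
To see the difference the escape makes, here is a tiny sketch in Python. The sample string is made up for the example.

```python
import re

sample = "a-m-z"

# Unescaped hyphen: a range, so any lowercase letter matches
# and the hyphens themselves do not.
as_range = re.findall(r"[a-z]", sample)
print(as_range)     # ['a', 'm', 'z']

# Escaped hyphen: the class holds just three characters: a, - and z.
as_literal = re.findall(r"[a\-z]", sample)
print(as_literal)   # ['a', '-', '-', 'z']
```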

We have literal characters, a quantified character and alternation in the pattern for the first row. So the first row must be either he or ll or oo. The latter because o+ means one or more os, and we need exactly two characters to fill the row.

The second row pattern is a positive character class with a + quantifier. So the second row must contain one of the specified characters, please, in each location due to the quantifier (one or more). The order of the characters within the brackets has no impact on which one can be used.

The pattern for the first column is sort of the reverse of that for the second row. It is a negative character class. Also quantified with a + (one or more). But in this case the pattern is telling us that none of the characters specified, speak, can be in the first column.

And for the second column we have an alternation of three sets of two characters. So, the second column will have to be ep, ip or ef.

Now let’s figure what goes where.

Solve the Crossword

So the first row is either going to be he, ll or oo. The negative character class for the first column doesn’t limit the choices we have for the first character of the row: h, l or o. However, the 2nd column, as discussed above, is going to be ep, ip or ef. That does affect the choice for the first row. None of the choices for the first row have an i for the second character. Which means that second character must be an e. So we know the first row must be he.

row/col      [^speak]+   ep|ip|ef
he|ll|o+     h           e
[please]+    _           _

Now on to the second row. Each column in that row must contain one of the characters in the specified class, please. And, that negative character class for the first column says it can’t be any of speak. That eliminates everything but the l. And, we know that the second column must be ep or ef. There is no f in the character class for the second row, but there is a p. So there you go.

row/col      [^speak]+   ep|ip|ef
he|ll|o+     h           e
[please]+    l           p

Which, on the site, proved to be correct. And, given the name for the crossword, Beatles, it does make sense. Quite enjoyed that!
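
For the curious, the reasoning above can be double-checked by brute force. The following is a rough Python sketch of my own, certainly not how the site checks answers: it lists every string that fits each row pattern, then keeps the combinations whose columns also fit. The function names candidates and solve are hypothetical, and I am assuming a lowercase-only alphabet.

```python
import re
from itertools import product
from string import ascii_lowercase

rows = ["he|ll|o+", "[please]+"]
cols = ["[^speak]+", "ep|ip|ef"]

def candidates(pattern, length, alphabet=ascii_lowercase):
    """Every string of the given length that fully matches pattern."""
    words = ("".join(t) for t in product(alphabet, repeat=length))
    return [w for w in words if re.fullmatch(pattern, w)]

def solve(rows, cols):
    """Grids (one string per row) that satisfy all row and column patterns."""
    width = len(cols)
    per_row = [candidates(p, width) for p in rows]
    solutions = []
    for grid in product(*per_row):
        # Read each column top to bottom and test it against its pattern.
        columns = ["".join(line[c] for line in grid) for c in range(width)]
        if all(re.fullmatch(p, col) for p, col in zip(cols, columns)):
            solutions.append(list(grid))
    return solutions

print(solve(rows, cols))   # [['he', 'lp']]
```

Enumerating row candidates first keeps the search tiny for small grids like this one; it would get slow for wide rows with permissive patterns.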

Beginner #5

I wasn’t going to cover another beginner puzzle. But, learning is about doing and repetition. So one more before moving on to some harder crosswords. (Beginner #5)

row/col     \d[2480]   56|94|73
18|19|20    _          _
[6789]\d    _          _

Basics

This puzzle introduces a new pattern, \d. This is a shorthand character class. There are a number of them. Which ones are available can vary by the regular expression engine being used. Here’s a few of them.

  • \d == [0-9]
  • \w == [a-zA-Z0-9_] (note the underscore is included), do note, a good number of regex engines support unicode, that may affect what \w will match
  • \s - all the whitespace characters == [ \t\r\n\f], i.e. space, tab, carriage return, linefeed, formfeed

The uppercase versions of the above, e.g. \D, represent the negative character class. I.E. \D means [^0-9] or none of the digits from 0 to 9 can be at the specified location in the string or text.
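
Here is a short Python sketch of the shorthand classes at work. The sample string is invented for the example. One assumption worth flagging: in Python 3 these classes are Unicode-aware for str patterns, so \d and \w can match more than the ASCII sets listed above unless the re.ASCII flag is used.

```python
import re

sample = "Room_101 \t ok\n"

digits = re.findall(r"\d", sample)
words = re.findall(r"\w+", sample)
spaces = re.findall(r"\s", sample)
# \D is the negated class, so removing every \D leaves only the digits.
only_digits = re.sub(r"\D", "", sample)

print(digits)        # ['1', '0', '1']
print(words)         # ['Room_101', 'ok']
print(spaces)        # [' ', '\t', ' ', '\n']
print(only_digits)   # 101
```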

Solve the Puzzle

Looking at the first row, it can be 18, 19 or 20. The first column’s pattern doesn’t limit the choices; \d says any digit in that spot. But if we look at the second column’s pattern the first character of that column must be 5, 9 or 7. Which means the first row must be 19.

That means the second column has to be 94.

Looking at the first column of the second row, it has to be a character contained in the intersection of the two character classes, [6789] and [2480]. That gives us an 8.

row/col     \d[2480]   56|94|73
18|19|20    1          9
[6789]\d    8          4

And, that was correct when entered on the crossword site. Note, the title for the crossword was “Airstrip One”.
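
The "intersection" idea used for this puzzle maps nicely onto Python sets. A minimal sketch, with the helper name allowed being my own invention: work out which digits each pattern permits in a shared cell, then intersect.

```python
import re

DIGITS = "0123456789"

def allowed(char_pattern: str) -> set:
    """Digits accepted by a pattern that matches a single character."""
    return {d for d in DIGITS if re.fullmatch(char_pattern, d)}

# Bottom-left cell: first character of row 2 ([6789]\d) and
# second character of column 1 (\d[2480]).
bottom_left = allowed(r"[6789]") & allowed(r"[2480]")
print(bottom_left)   # {'8'}

# Top-right cell: second character of row 1 (18|19|20) and
# first character of column 2 (56|94|73).
row1_second = {alt[1] for alt in "18|19|20".split("|")}
col2_first = {alt[0] for alt in "56|94|73".split("|")}
top_right = row1_second & col2_first
print(top_right)     # {'9'}
```

Splitting on | only works here because these patterns are plain alternations of literals; it is not a general regex parser.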

Time to look at something a touch harder.

Intermediate #3

Let’s look at the intermediate crossword #3.

row/col        .(.)\1   .*[way]+   [ram].[oh]
cat|for|fat    _        _          _
ry|ty\-        _        _          _
[towel]+       _        _          _

Basics

This puzzle introduces us to a capture group. That would be in the pattern for the first column. When a capture group is specified, using parentheses, (), the regex engine keeps track of whatever the pattern within the group matched (which could be nothing). This allows it to be reused later in the regex. Capture groups are numbered from 1 in the order their opening parentheses appear (classically up to 9, though many engines allow more). The exact text matched by a group can be used later in the regular expression by escaping the capture group’s numeric label, e.g. \1, as seen in the first column’s pattern.

That pattern is essentially saying the second and third characters of the column must be the same, without in any way saying what that character must be.
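
A quick Python sketch of that backreference, for anyone who wants to see what the group actually captured. The function name repeated_tail and the test words are hypothetical, chosen just for this illustration.

```python
import re

def repeated_tail(text: str):
    # Returns the captured character when text matches .(.)\1,
    # i.e. when its 2nd and 3rd characters are equal; otherwise None.
    match = re.fullmatch(r".(.)\1", text)
    return match.group(1) if match else None

for word in ("ftt", "frt", "too", "abc"):
    print(word, repeated_tail(word))
# ftt t
# frt None
# too o
# abc None
```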

Solve the Puzzle

Okay, first row will be one of the three words in the alternation pattern. The first two column patterns don’t help much as they both allow any character (.) in their first position. However the third column requires that its first character be one of [ram]. The only choice for the first row that matches that requirement is for.

For the second row we know that the last position has to be the - (hyphen), as it is specified as an escaped literal character. And the third column’s pattern does not object, as the regex pattern is . (i.e. any character). Now for the first two characters (either ry or ty). The first column allows any character in the row’s first position. But, it also requires that the last position in the column be the same character as the middle position. And if we look at the pattern for the third row it only allows the t not the r in the first column of the third row. So our second row has to be ty-. The second column and second row patterns both allow only a y in the middle position (row and column).

We already know that the first character in the third row has to be a t (due to the \1). The remaining two letters are easily determined by the intersection of the character classes, row and column, for those positions. So the third row is two.

row/col        .(.)\1   .*[way]+   [ram].[oh]
cat|for|fat    f        o          r
ry|ty\-        t        y          -
[towel]+       t        w          o

That also was marked correct on the crossword web site. (Puzzle title: “Earth”)
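
Rather than searching for a solution, this time a sketch that simply checks a filled-in grid and reports which patterns object. The patterns are as I have transcribed them above, and the function name failures is my own; treat it as an illustration, not the site's checker.

```python
import re

rows = [r"cat|for|fat", r"ry|ty\-", r"[towel]+"]
cols = [r".(.)\1", r".*[way]+", r"[ram].[oh]"]

def failures(rows, cols, grid):
    """Labels of every row or column pattern the grid does not satisfy."""
    bad = []
    for i, (pattern, line) in enumerate(zip(rows, grid), start=1):
        if not re.fullmatch(pattern, line):
            bad.append(f"row {i}")
    for j, pattern in enumerate(cols):
        # Build the column by reading character j of each row.
        column = "".join(line[j] for line in grid)
        if not re.fullmatch(pattern, column):
            bad.append(f"column {j + 1}")
    return bad

print(failures(rows, cols, ["for", "ty-", "two"]))   # []
print(failures(rows, cols, ["cat", "ty-", "two"]))   # ['column 3']
```

The second call shows why cat was ruled out for the first row: only the third column complains, because t is not in [ram].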

Intermediate #5

Puzzle link.

row/col       [^nru](no|on)   (d|fu|uf)+   (fo|a|r)*   (n|a)*
[runt]*       _               _            _           _
o.*[hat]      _               _            _           _
(.)*do\1      _               _            _           _

Basics

In this puzzle we see the use of parentheses () to group alternations so that they can have a modifier applied.

So the pattern for the last column says the column will contain a or n zero or more times.

Whereas that for the second column says the column will contain d, fu or uf one or more times. For both regexes, order is not being specified with respect to the choices. That said we pretty much know that the d will be used. There are only 3 characters in the column, so neither fu nor uf can be used more than once, and the two cannot both be used.
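
That claim about the d is easy to confirm with a little enumeration. A minimal sketch, assuming only the letters d, f and u can appear (they are the only ones in the second column's pattern):

```python
import re
from itertools import product

# Every 3-letter string over d, f, u that fully matches (d|fu|uf)+.
fits_column = [
    "".join(letters)
    for letters in product("dfu", repeat=3)
    if re.fullmatch(r"(d|fu|uf)+", "".join(letters))
]
print(fits_column)                         # ['ddd', 'dfu', 'duf', 'fud', 'ufd']
print(all("d" in s for s in fits_column))  # True
```

So every way of filling the three cells uses at least one d, and fu or uf shows up at most once.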

We also see the use of literal characters in the second and third row patterns. As well as a capturing group in the third row’s pattern.

Solve the Puzzle

Ok, let’s start with the ones we know for sure. Second row, first column is an o. Third row, columns 2 and 3 are do. I am sure that will be of some help.

row/col       [^nru](no|on)   (d|fu|uf)+   (fo|a|r)*   (n|a)*
[runt]*       _               _            _           _
o.*[hat]      o               _            _           _
(.)*do\1      _               d            o           _

Ok, back to the first row. First character: the row pattern, [runt]*, says it must be one of r, u, n or t, and the column pattern says its first character can’t be any of n, r or u ([^nru] is a negative character class). That leaves us with the t. Now the pattern for the second column gives a few choices, but only one begins with a character that is in the row’s character class, i.e. u. So the first two characters of the second column must be uf. Comparing the row’s pattern with that of the third column says that the row’s third character must be r. And doing the same for the fourth character says it must be n. So the first row is turn. And, we know the first two characters of the second column, uf.

row/col       [^nru](no|on)   (d|fu|uf)+   (fo|a|r)*   (n|a)*
[runt]*       t               u            r           n
o.*[hat]      o               f            _           _
(.)*do\1      _               d            o           _

Now for that third column, we have an o in the final position (row). Looking at the column’s pattern, that means the middle character in the column has to be f. And, the row’s regex allows that with the .* in its second position. I.E. we need to match the fo alternative; no other way to get an o in the last position. And, once again taking the intersection of the column and row patterns for the last character of the second row tells us it must be a. So the second row is offa.

row/col       [^nru](no|on)   (d|fu|uf)+   (fo|a|r)*   (n|a)*
[runt]*       t               u            r           n
o.*[hat]      o               f            f           a
(.)*do\1      _               d            o           _

From what we have so far you can probably guess the rest. But…

Looking at the pattern for the first column, the o in the second position tells us the last character has to be n, since we need to choose between no and on. And the pattern for the last row tells us the first and last characters must be the same. (.) is a group that captures the character matching its pattern. In this case any character (which we now know is n). \1 says the value of the first capture group goes here. I.E. another n. That makes the last row ndon.

row/col       [^nru](no|on)   (d|fu|uf)+   (fo|a|r)*   (n|a)*
[runt]*       t               u            r           n
o.*[hat]      o               f            f           a
(.)*do\1      n               d            o           n
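
The last-row reasoning, that the first and last characters must match, can also be checked mechanically. This is just a sketch under my own assumptions (lowercase letters only, hypothetical names LAST_ROW and endings): for each possible first letter it collects the last letters the pattern will accept around the fixed do.

```python
import re
from string import ascii_lowercase

LAST_ROW = r"(.)*do\1"

# What did the group capture for the actual answer?
match = re.fullmatch(LAST_ROW, "ndon")
print(match.group(1))   # n

# For every first letter, which last letters does the pattern accept?
endings = {
    first: [last for last in ascii_lowercase
            if re.fullmatch(LAST_ROW, f"{first}do{last}")]
    for first in ascii_lowercase
}
print(endings["n"])     # ['n']
print(all(found == [first] for first, found in endings.items()))   # True
```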

And turn off and on seems appropriate for the puzzle title of Technology.

Done

You know I think that’s enough for this post. Some basic regular expression coverage plus four simple crosswords solved seems to make for a fair bit of information (and typing).

Have been using tables (with a lot of inline styling) for the crosswords, but think I should really look into using CSS grid. So I may just try to sort that out before starting the next post. Which will cover more puzzles, as I am enjoying solving the puzzles and documenting the solutions.

Until next time, keep those fingers happily drumming on the keyboard.