Dictionaries

In the last post, I suggested that Python’s dictionaries are an important data tool to keep in mind in your programming challenges. They are extremely versatile, often in ways we may not initially consider.

In that post I had a quick look at an example supporting my position, then discussed comprehensions, with perhaps too much of a focus on list comprehensions. Finally we had a quick look at how dictionaries are or are not ordered and a couple of ways to iterate over dictionaries. I left a lot out, but the resources in that last post should have given you a much better idea of how to create and use dictionaries.

I had planned to look at sorting dictionaries in that last post, but didn’t quite get there. So, here’s a short post covering the sorting of dictionaries, that is, Python dict objects.

Sorting Dictionaries

I don’t offhand recall anything I worked on lately that required me to sort dictionaries. So, this section might get a little contrived. If so, my apologies. I did use a variety of dictionaries in the population related files, but don’t recall needing to sort any of them. Nor do I think it would have made sense to do so given their structure.

So, let’s make ourselves a wee dictionary of people related data. Something we might have obtained from a database, though that would likely be significantly larger than ours. The dictionary will be keyed on some id number or string (I am just going to use a number as a string, though it could just as easily be an e-mail address). Its value, for now, will be another dictionary containing the person’s firstname, surname, year joined and country code. Something like the following. Also, since we merged a few different groups a while back, the id numbers have no relationship to any of a person’s other data.

members = {
  '0795494': {'surname': 'Boyd', 'firstname': 'George', 'joined': 2009, 'country': 'IE'},
  '1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000, 'country': 'MX'},
  '0069132': {'surname': 'Karl', 'firstname': 'Carol', 'joined': 1995, 'country': 'GB'},
  '0389887': {'surname': 'Stick', 'firstname': 'Rachel', 'joined': 2003, 'country': 'ES'},
  '0002916': {'surname': 'Pine', 'firstname': 'Jack', 'joined': 2008, 'country': 'AT'},
  '1293705': {'surname': 'Bond', 'firstname': 'James', 'joined': 1994, 'country': 'GB'},
  '0000121': {'surname': 'Siegfreid', 'firstname': 'Barbara', 'joined': 2006, 'country': 'NZ'},
  '1306107': {'surname': 'Cooke', 'firstname': 'Alexander', 'joined': 2012, 'country': 'CA'},
  '0646055': {'surname': 'Błaszczyk', 'firstname': 'Pawel', 'joined': 2003, 'country': 'PL'},
  '0000243': {'surname': 'Włodarcyzk', 'firstname': 'Zosia', 'joined': 2014, 'country': 'PL'},
  '0000503': {'surname': 'Broman', 'firstname': 'Francis Oliver', 'joined': 2006, 'country': 'SE'},
  '0294758': {'surname': 'Racquets', 'firstname': 'Cathy', 'joined': 2004, 'country': 'CA'},
  '0008194': {'surname': 'Friendly', 'firstname': 'Fred', 'joined': 2015, 'country': 'DK'},
  '0300012': {'surname': 'Manly', 'firstname': 'Olivia', 'joined': 2014, 'country': 'CA'},
  '0008242': {'surname': 'Klock', 'firstname': 'Daniel', 'joined': 2013, 'country': 'DE'}}

Now let’s try using Python’s built-in sorted() function on the dictionary.

sorted_mbrs = sorted(members)
print('\n', sorted_mbrs)

And, I get the following:

(base-3.8) R:\learn\py_play>python.exe r:/learn/py_play/population/play/dicts_play.test.py
['0000121', '0000243', '0000503', '0002916', '0008194', '0008242', '0069132', '0294758', '0300012', '0389887', '0646055', '0795494', '1293705', '1306107', '1828169']

Don’t think that is particularly useful; but, let’s have a look.

for k in sorted_mbrs:
  # quote the key so the printed line matches the dictionary's own notation
  print(f"'{k}': {members[k]}")

And,

'0000121': {'surname': 'Siegfreid', 'firstname': 'Barbara', 'joined': 2006, 'country': 'NZ'}
'0000243': {'surname': 'Włodarcyzk', 'firstname': 'Zosia', 'joined': 2014, 'country': 'PL'}
'0000503': {'surname': 'Broman', 'firstname': 'Francis Oliver', 'joined': 2006, 'country': 'SE'}
'0002916': {'surname': 'Pine', 'firstname': 'Jack', 'joined': 2008, 'country': 'AT'}
'0008194': {'surname': 'Friendly', 'firstname': 'Fred', 'joined': 2015, 'country': 'DK'}
'0008242': {'surname': 'Klock', 'firstname': 'Daniel', 'joined': 2013, 'country': 'DE'}
'0069132': {'surname': 'Karl', 'firstname': 'Carol', 'joined': 1995, 'country': 'GB'}
'0294758': {'surname': 'Racquets', 'firstname': 'Cathy', 'joined': 2004, 'country': 'CA'}
'0300012': {'surname': 'Manly', 'firstname': 'Olivia', 'joined': 2014, 'country': 'CA'}
'0389887': {'surname': 'Stick', 'firstname': 'Rachel', 'joined': 2003, 'country': 'ES'}
'0646055': {'surname': 'Błaszczyk', 'firstname': 'Pawel', 'joined': 2003, 'country': 'PL'}
'0795494': {'surname': 'Boyd', 'firstname': 'George', 'joined': 2009, 'country': 'IE'}
'1293705': {'surname': 'Bond', 'firstname': 'James', 'joined': 1994, 'country': 'GB'}
'1306107': {'surname': 'Cooke', 'firstname': 'Alexander', 'joined': 2012, 'country': 'CA'}
'1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000, 'country': 'MX'}

And, as expected, not at all useful. It is important to remember that Python’s built-in sorted() function sorts the dictionary’s keys, not the data. That is because the dict object by default returns its keys when iterated over. And yes, the keys are sorted; but nothing much else looks to be sorted. As we mentioned, the id numbers have no meaningful relationship with the member’s data.
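To make that concrete, here is a minimal sketch using a cut-down, three-member copy of the dictionary (the name sample and the variable names are mine, purely for illustration). Iterating over a dict yields its keys, so calling sorted() on the dict or on its keys() view gives the same list.

```python
# A cut-down copy of the members dictionary, for illustration only.
sample = {
  '0795494': {'surname': 'Boyd', 'joined': 2009},
  '1828169': {'surname': 'Babe', 'joined': 2000},
  '0069132': {'surname': 'Karl', 'joined': 1995},
}

# Iterating over a dict yields its keys, in insertion order.
keys_in_order = list(sample)

# So sorted() only ever sees, and returns, the keys.
sorted_keys = sorted(sample)
same_as_keys = sorted(sample.keys())

print(keys_in_order)
print(sorted_keys)
print(sorted_keys == same_as_keys)
```

The keys here are strings, so they are compared character by character rather than as numbers. That happens to give a sensible order only because every id is zero-padded to the same length.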

So, now what? Well, let’s see if we can get the dictionary sorted by the year each member joined. This is going to involve using one of the sorted() function’s optional parameters.

The one we are interested in is key=. It takes a “function (or other callable) to be called on each list element prior to making comparisons”. See Sorting HOW TO: Key Functions. As stated, this named parameter takes a function as its value. This function, which is passed the item being sorted (for now we can assume that is one of the dictionary’s keys), should return the value to be used to determine the sort order. We want that to be the year the member “joined”. We’ll start by writing the function, then redo the sort, passing that function to sorted(). The call should return a list of the dictionary keys in the correctly sorted order.

def sort_by_year(p_id):
  return members[p_id]['joined']

mbrs_by_yr = sorted(members, key=sort_by_year)
print('\n', mbrs_by_yr)
for k in mbrs_by_yr:
  print(f"'{k}': {members[k]}")

Running my code produced the following.

(base-3.8) R:\learn\py_play>python.exe r:/learn/py_play/population/play/dicts_play.test.py
'1293705': {'surname': 'Bond', 'firstname': 'James', 'joined': 1994, 'country': 'GB'}
'0069132': {'surname': 'Karl', 'firstname': 'Carol', 'joined': 1995, 'country': 'GB'}
'1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000, 'country': 'MX'}
'0389887': {'surname': 'Stick', 'firstname': 'Rachel', 'joined': 2003, 'country': 'ES'}
'0646055': {'surname': 'Błaszczyk', 'firstname': 'Pawel', 'joined': 2003, 'country': 'PL'}
'0294758': {'surname': 'Racquets', 'firstname': 'Cathy', 'joined': 2004, 'country': 'CA'}
'0000121': {'surname': 'Siegfreid', 'firstname': 'Barbara', 'joined': 2006, 'country': 'NZ'}
'0000503': {'surname': 'Broman', 'firstname': 'Francis Oliver', 'joined': 2006, 'country': 'SE'}
'0002916': {'surname': 'Pine', 'firstname': 'Jack', 'joined': 2008, 'country': 'AT'}
'0795494': {'surname': 'Boyd', 'firstname': 'George', 'joined': 2009, 'country': 'IE'}
'1306107': {'surname': 'Cooke', 'firstname': 'Alexander', 'joined': 2012, 'country': 'CA'}
'0008242': {'surname': 'Klock', 'firstname': 'Daniel', 'joined': 2013, 'country': 'DE'}
'0000243': {'surname': 'Włodarcyzk', 'firstname': 'Zosia', 'joined': 2014, 'country': 'PL'}
'0300012': {'surname': 'Manly', 'firstname': 'Olivia', 'joined': 2014, 'country': 'CA'}
'0008194': {'surname': 'Friendly', 'firstname': 'Fred', 'joined': 2015, 'country': 'DK'}

And, that looks to have worked. Of course, at least one of you is saying, “…couldn’t you have used a lambda function?”. Some of you might be saying “…what’s a lambda function?”.

Lambdas are another one of those things with which we, as programmers, should get good and comfortable. I could have written the above function definition using a lambda expression in an assignment statement, sort_by_year = lambda key: members[key]['joined'], and proceeded accordingly.

A lambda expression returns the value of the expression following the colon. In this case key is the parameter, which will be assigned a value by the sorted() function as each key of the dictionary is visited in the sorting process. And the lambda expression will return the year the member associated with that key joined our little club.
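As a quick sanity check, here is a small sketch comparing the two forms on a cut-down dictionary (sample and year_of are names I made up for the example). Both callables take a key and return the year joined, so sorted() should produce the same list either way.

```python
# A cut-down copy of the members dictionary, for illustration only.
sample = {
  '0795494': {'surname': 'Boyd', 'joined': 2009},
  '1828169': {'surname': 'Babe', 'joined': 2000},
  '0069132': {'surname': 'Karl', 'joined': 1995},
}

# Regular function definition: returns the value to sort on.
def sort_by_year(p_id):
  return sample[p_id]['joined']

# The same thing as a lambda bound to a name.
year_of = lambda key: sample[key]['joined']

by_def = sorted(sample, key=sort_by_year)
by_lambda = sorted(sample, key=year_of)

print(by_def)
print(by_def == by_lambda)
```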

But, I could also just pass the lambda expression directly to the key= parameter. So I could skip defining sort_by_year() and simply do the following:

mbrs_by_yr = sorted(members, key=lambda k: members[k]['joined'])
print('\n', mbrs_by_yr)
for k in mbrs_by_yr:
  print(f"'{k}': {members[k]}")

You can check for yourself that the output from this modified code exactly matches that of the preceding code using the regular function definition approach. And, we can easily sort on any of the other member fields. E.g.

mbrs_by_ctry = sorted(members, key=lambda k: members[k]['country'])
# this time just print the first 5 sorted entries
for k in mbrs_by_ctry[0:5]:
  print(f"'{k}': {members[k]}")

And, I can confirm that works as intended.

'0002916': {'surname': 'Pine', 'firstname': 'Jack', 'joined': 2008, 'country': 'AT'}
'1306107': {'surname': 'Cooke', 'firstname': 'Alexander', 'joined': 2012, 'country': 'CA'}
'0294758': {'surname': 'Racquets', 'firstname': 'Cathy', 'joined': 2004, 'country': 'CA'}
'0300012': {'surname': 'Manly', 'firstname': 'Olivia', 'joined': 2014, 'country': 'CA'}
'0008242': {'surname': 'Klock', 'firstname': 'Daniel', 'joined': 2013, 'country': 'DE'}

Now let’s go a step further. What if we want the data sorted by the year joined and then by member name? Let’s try the following.

mbrs_yr_alpha = sorted(members, key=lambda k: f"{members[k]['joined']}{members[k]['surname']}")
for k in mbrs_yr_alpha[0:5]:
  print(f"'{k}': {members[k]}")

Amazing, seems to have worked the first time.

(base-3.8) R:\learn\py_play>python.exe r:/learn/py_play/population/play/dicts_play.test.py
'1293705': {'surname': 'Bond', 'firstname': 'James', 'joined': 1994, 'country': 'GB'}
'0069132': {'surname': 'Karl', 'firstname': 'Carol', 'joined': 1995, 'country': 'GB'}
'1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000, 'country': 'MX'}
'0646055': {'surname': 'Błaszczyk', 'firstname': 'Pawel', 'joined': 2003, 'country': 'PL'}
'0389887': {'surname': 'Stick', 'firstname': 'Rachel', 'joined': 2003, 'country': 'ES'}

But, that was fairly simple, given there were no duplicate surnames. So, let’s change “Stick” to “Błaszczyk”, and try that again. Ok, not quite. Rachel should likely come after Pawel.

(base-3.8) R:\learn\py_play>python.exe r:/learn/py_play/population/play/dicts_play.test.py
'1293705': {'surname': 'Bond', 'firstname': 'James', 'joined': 1994, 'country': 'GB'}
'0069132': {'surname': 'Karl', 'firstname': 'Carol', 'joined': 1995, 'country': 'GB'}
'1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000, 'country': 'MX'}
'0389887': {'surname': 'Błaszczyk', 'firstname': 'Rachel', 'joined': 2003, 'country': 'ES'}
'0646055': {'surname': 'Błaszczyk', 'firstname': 'Pawel', 'joined': 2003, 'country': 'PL'}

So, let’s try the following modification.

mbrs_yr_alpha = sorted(members, key=lambda k: f"{members[k]['joined']}{members[k]['surname']}{members[k]['firstname']}")

And, that seems to work.

(base-3.8) PS R:\learn\py_play> python.exe r:/learn/py_play/population/play/dicts_play.test.py
'1293705': {'surname': 'Bond', 'firstname': 'James', 'joined': 1994, 'country': 'GB'}
'0069132': {'surname': 'Karl', 'firstname': 'Carol', 'joined': 1995, 'country': 'GB'}
'1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000, 'country': 'MX'}
'0646055': {'surname': 'Błaszczyk', 'firstname': 'Pawel', 'joined': 2003, 'country': 'PL'}
'0389887': {'surname': 'Błaszczyk', 'firstname': 'Rachel', 'joined': 2003, 'country': 'ES'}
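Gluing the fields into one string works here because every year has four digits. An alternative sketch, not what I used above, is to have the key function return a tuple. Python compares tuples element by element, so the year is compared as a number first, then the surname, then the firstname. The cut-down sample dictionary and the year_then_name function are names of my own, for illustration only.

```python
# A cut-down copy of the members dictionary, for illustration only.
sample = {
  '0389887': {'surname': 'Błaszczyk', 'firstname': 'Rachel', 'joined': 2003},
  '1293705': {'surname': 'Bond', 'firstname': 'James', 'joined': 1994},
  '0646055': {'surname': 'Błaszczyk', 'firstname': 'Pawel', 'joined': 2003},
  '1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000},
}

# Return a tuple; ties on the first element fall through to the next one.
def year_then_name(k):
  mbr = sample[k]
  return (mbr['joined'], mbr['surname'], mbr['firstname'])

ordered = sorted(sample, key=year_then_name)

for k in ordered:
  print(f"'{k}': {sample[k]}")
```

The tuple form also keeps the fields separate, so a surname ending in the same letters that another firstname starts with cannot blur into an accidental match.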

Comprehension

One last thing. I am curious whether we can use a dictionary comprehension with the sorted() function to generate a new sorted dictionary. You know, just in case we’d rather have the sorted dictionary to work with than just the sorted keys. May not be many use cases, but…

Seems to me it should be easy enough. And, in my mind, it never hurts to understand what things work or don’t. I am going to print the first 5 rows of the original dictionary and then the first 5 of the supposedly sorted dictionary for comparison. And, why not print out their addresses/ids just to make sure they are different objects.

print(f"id(members): {id(members)}")
for pid, mbr in list(members.items())[0:5]:
  print(f"\t'{pid}': {mbr}")
print()

mbrs_yr_alpha = {k: members[k]
  for k in sorted(members, key=lambda k: f"{members[k]['joined']}{members[k]['surname']}{members[k]['firstname']}")}
print(f"id(mbrs_yr_alpha): {id(mbrs_yr_alpha)}")
for pid, mbr in list(mbrs_yr_alpha.items())[0:5]:
  print(f"\t'{pid}': {mbr}")

And, I get the following which leads me to believe it worked just fine.

(base-3.8) R:\learn\py_play>E:/appDev/Miniconda3/envs/base-3.8/python.exe r:/learn/py_play/population/play/dicts_play.test.py
id(members): 1892771979584
        '0795494': {'surname': 'Boyd', 'firstname': 'George', 'joined': 2009, 'country': 'IE'}
        '1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000, 'country': 'MX'}
        '0069132': {'surname': 'Karl', 'firstname': 'Carol', 'joined': 1995, 'country': 'GB'}
        '0389887': {'surname': 'Błaszczyk', 'firstname': 'Rachel', 'joined': 2003, 'country': 'ES'}
        '0002916': {'surname': 'Pine', 'firstname': 'Jack', 'joined': 2008, 'country': 'AT'}

id(mbrs_yr_alpha): 1892771970304
        '1293705': {'surname': 'Bond', 'firstname': 'James', 'joined': 1994, 'country': 'GB'}
        '0069132': {'surname': 'Karl', 'firstname': 'Carol', 'joined': 1995, 'country': 'GB'}
        '1828169': {'surname': 'Babe', 'firstname': 'Abe', 'joined': 2000, 'country': 'MX'}
        '0646055': {'surname': 'Błaszczyk', 'firstname': 'Pawel', 'joined': 2003, 'country': 'PL'}
        '0389887': {'surname': 'Błaszczyk', 'firstname': 'Rachel', 'joined': 2003, 'country': 'ES'}
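Another way to sketch the same idea, assuming Python 3.7 or later, where dictionaries are guaranteed to keep insertion order: sort the (key, value) pairs from items() and hand the result to dict(). The names sample and by_year are mine, and the dictionary is a cut-down copy for illustration.

```python
# A cut-down copy of the members dictionary, for illustration only.
sample = {
  '0795494': {'surname': 'Boyd', 'joined': 2009},
  '1828169': {'surname': 'Babe', 'joined': 2000},
  '0069132': {'surname': 'Karl', 'joined': 1995},
}

# Each item is a (key, value) tuple, so item[1] is the member's inner dict.
by_year = dict(sorted(sample.items(), key=lambda item: item[1]['joined']))

print(list(by_year))

# The outer dictionary is a new object...
print(by_year is sample)
# ...but the inner member dictionaries are shared, not copied.
print(by_year['0795494'] is sample['0795494'])
```

That last point applies to the comprehension version as well: the new dictionary has its own id, but each value still refers to the same inner dictionary as the original, so changing a member through one is visible through the other.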

Another One Done

Unless something else comes to mind before I publish, I believe that is it for this one. Hope you found something new or of interest. Until next time, be happy and stay healthy.

Resources