Ok, we wrote a simple Python test program earlier, but now is the time to look at something more complicated. And perhaps a bit more interesting. This might be more than is reasonable for a starting program. But, I don’t want to start by talking about basic programming stuff. You know, variables, syntax, functions, etc. Though there will of course be some of that below.
I do recommend everyone find themselves a good, basic, on-line Python tutorial. Or, as I prefer, a good book. I have become a fan of Manning Publications’ books and MEAPs. Though back when I was learning Perl and basic web related stuff, O’Reilly books were my go to. And, in between I also bought a number of Sitepoint books. Currently have a subscription to their on-line library. Though the latter is likely overkill for our purposes here.
Proposal
Keeping with the age related theme, I am going to suggest we write a small Python program to display a chart showing the world population by age group.
In the real world I expect there are APIs (application programming interfaces) available to get population data. I haven’t really looked, as I don’t want to deal with coding that kind of thing at this time. For this exercise I am just going to create a dictionary and manually enter the data obtained from an on-line source.
I obtained a rather large CSV (comma separated values) file from the United Nations World Population Prospects 2019 site. The one I selected was “Population by Age and Sex, Medium variant, annual, from 1950 to 2100” (CSV, 113.05 MB). Way more information than I currently wanted. But I managed to find a number of rows covering the world population by age group for 2020 (an estimate of course).
900,World,2,Medium,2020,2020.5,0-4,0,5,349432.556,328509.234,677941.79
900,World,2,Medium,2020,2020.5,5-9,5,5,342927.577,321511.867,664439.444
900,World,2,Medium,2020,2020.5,10-14,10,5,331497.486,309769.906,641267.392
900,World,2,Medium,2020,2020.5,15-19,15,5,316642.222,295553.758,612195.98
900,World,2,Medium,2020,2020.5,20-24,20,5,308286.775,289100.903,597387.678
900,World,2,Medium,2020,2020.5,25-29,25,5,306059.387,288632.766,594692.153
900,World,2,Medium,2020,2020.5,30-34,30,5,309236.984,296293.748,605530.732
900,World,2,Medium,2020,2020.5,35-39,35,5,276447.037,268371.754,544818.791
900,World,2,Medium,2020,2020.5,40-44,40,5,249389.688,244399.176,493788.864
900,World,2,Medium,2020,2020.5,45-49,45,5,241232.877,238133.282,479366.159
900,World,2,Medium,2020,2020.5,50-54,50,5,222609.691,223162.982,445772.673
900,World,2,Medium,2020,2020.5,55-59,55,5,192215.395,195633.743,387849.138
900,World,2,Medium,2020,2020.5,60-64,60,5,157180.267,164961.323,322141.59
900,World,2,Medium,2020,2020.5,65-69,65,5,128939.392,140704.32,269643.712
900,World,2,Medium,2020,2020.5,70-74,70,5,87185.982,101491.347,188677.329
900,World,2,Medium,2020,2020.5,75-79,75,5,54754.941,69026.831,123781.772
900,World,2,Medium,2020,2020.5,80-84,80,5,33648.953,48281.201,81930.154
900,World,2,Medium,2020,2020.5,85-89,85,5,15756.942,26429.329,42186.271
900,World,2,Medium,2020,2020.5,90-94,90,5,5327.866,11352.182,16680.048
900,World,2,Medium,2020,2020.5,95-99,95,5,1077.791,3055.845,4133.636
900,World,2,Medium,2020,2020.5,100+,100,-1,124.144,449.279,573.423
Each row contains the following information:
LocID,Location,VarID,Variant,Time,MidPeriod,AgeGrp,AgeGrpStart,AgeGrpSpan,PopMale,PopFemale,PopTotal
Now there are a couple of ways we could proceed. I could put the rows above into a separate file, then use Python to read the file line by line, pulling out the information I want and probably putting it into a dictionary of some sort. Or we could just use the data to manually create the dictionary and go ahead with the code to display the chart(s). I took a vote, and for now at least, it was one-love in favour of the manual approach.
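For anyone curious about the road not taken, here is a minimal sketch of the read-the-file approach. It is not what this post uses. To keep it self-contained it reads three of the rows above from a string rather than from a file on disk, and the names SAMPLE, read_population and population_from_csv are my own inventions for illustration.

```python
import csv
import io

# Header row plus three of the data rows shown above.
# In a real script the lines would come from an opened file instead of a string.
SAMPLE = """\
LocID,Location,VarID,Variant,Time,MidPeriod,AgeGrp,AgeGrpStart,AgeGrpSpan,PopMale,PopFemale,PopTotal
900,World,2,Medium,2020,2020.5,0-4,0,5,349432.556,328509.234,677941.79
900,World,2,Medium,2020,2020.5,5-9,5,5,342927.577,321511.867,664439.444
900,World,2,Medium,2020,2020.5,100+,100,-1,124.144,449.279,573.423
"""


def read_population(lines):
    """Build {age group: total population} from CSV lines that start with a header row."""
    result = {}
    # DictReader uses the header row to name the fields in each data row
    for row in csv.DictReader(lines):
        result[row['AgeGrp']] = float(row['PopTotal'])
    return result


population_from_csv = read_population(io.StringIO(SAMPLE))
print(population_from_csv)
```

The csv module is part of the standard library, so nothing extra needs to be installed to try this.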
In a standard Python list the indices are integers reflecting the value’s position in the list. These integers are zero based. That is, they start at 0 and end at the length of the list minus 1. So for a list with 5 items the indices would run from 0 to 4, 0 being the first item in the list and 4 being the last. This type of list maintains the order of its values. Older versions of Python made no such guarantee for a dictionary, but since Python 3.7 a dictionary keeps its keys in the order they were inserted.
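A quick sketch of the zero-based indexing just described. The list and variable names are made up for illustration only.

```python
# A five-item list, so the valid indices are 0 through 4
age_groups = ['0-4', '5-9', '10-14', '15-19', '20-24']

first = age_groups[0]                    # index 0 is the first item
last = age_groups[len(age_groups) - 1]   # length minus 1 is the last item
also_last = age_groups[-1]               # a negative index counts back from the end

print(first, last, also_last)
```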
Now a dictionary (we called it a hash in my early days with Perl) is just a collection of values indexed by some key. The key is used to access the value associated with it. Given a dictionary called population, we might get the population for the 60-64 age group using population['60-64']. As with the index to a list, the key is enclosed in square brackets.
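Here is the lookup described above as a small sketch. The dictionary is a cut-down, three-entry stand-in using values from the UN rows. The '105-109' key is deliberately one that does not exist.

```python
# Cut-down stand-in for the full dictionary, values in thousands
population = {'55-59': 387849.138, '60-64': 322141.59, '65-69': 269643.712}

# Look up a value by its key
print(population['60-64'])

# Asking for a missing key with [] raises a KeyError.
# The get() method returns None (or a default you supply) instead.
print(population.get('105-109'))

# The 'in' operator tests whether a key is present
print('60-64' in population)
```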
Ok, time to fire up VS Code (or the editor of your choice) and get started creating our dictionary. So start your editor and select the pyPlay workspace (or equivalent). For this exercise, we are going to just use a single file of code. Let’s call it population_by_age.py. So create a new file with that name in the py_play folder. In the case of VS Code hover over the PYPLAY header and click on the +page icon. Enter the file name in the box and hit enter.
I am going to create the dictionary population using the age group in each row in the data above as the key and the PopTotal as its value. In Python a dictionary is identified by being contained in curly braces. The key and value are separated by a colon and each key/value pair is separated by a comma.
# estimated world population in 2020 by age group
population = {
'0-4': 677941.79,
'5-9': 664439.444,
'10-14': 641267.392,
'15-19': 612195.98,
'20-24': 597387.678,
'25-29': 594692.153,
'30-34': 605530.732,
'35-39': 544818.791,
'40-44': 493788.864,
'45-49': 479366.159,
'50-54': 445772.673,
'55-59': 387849.138,
'60-64': 322141.59,
'65-69': 269643.712,
'70-74': 188677.329,
'75-79': 123781.772,
'80-84': 81930.154,
'85-89': 42186.271,
'90-94': 16680.048,
'95-99': 4133.636,
'100+': 573.423
}
Feel free to copy and paste the above into your Python code file. Or if you wish, feel free to recreate it yourself.
Now Python has all sorts of ways to access the contents of lists, dictionaries, sets and the like. We can get all the values in the above dictionary using population.values(). values() is just a method (think function, operator or somesuch) of the dictionary object (in Python pretty much everything is an object). The values method returns an iterable providing all the values in the dictionary. An iterable is essentially just a collection of items (numbers, strings, objects, etc) that we can walk over one item at a time, doing whatever we need to do with each item before moving on to the next. Python also has a built-in function sum() that takes an iterable as an argument and returns the sum of all the values in the iterable. So, just for fun, let’s do that. Add the following line to your file, and save it.
print(sum(population.values()))
Once you’ve added the line and saved the file, right click in the editor window and select Run Python File in Terminal. VS Code will activate the conda environment and run the file. This is what I got in the terminal window:
R:\learn\py_play>E:/appDev/Miniconda3/Scripts/activate
(base) R:\learn\py_play>conda activate base-3.8
(base-3.8) R:\learn\py_play>E:/appDev/Miniconda3/envs/base-3.8/python.exe r:/learn/py_play/population_by_age.py
File "r:/learn/py_play/population_by_age.py", line 26
print sum(population.values())
^
SyntaxError: invalid syntax
(base-3.8) R:\learn\py_play>E:/appDev/Miniconda3/envs/base-3.8/python.exe r:/learn/py_play/population_by_age.py
7794798.729000001
You will note the error message. If you look closely you will see I had forgotten a set of brackets when I first entered the line of code. Functions, e.g. print and sum, require brackets to enclose the values we are passing them to operate on. These values are referred to as arguments. I had left off the brackets for the print function when I first typed the code into my program file. Once fixed we were good to go.
I expect the values in the UN file are in thousands, as the sum of the population values is just shy of 7.8 million. If we multiply that by 1000 we get something like 7.8 billion — which I believe is close to the world’s total population. Okay we can delete that last line. Or you can for now comment it out by typing a # at the front of the line.
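If you would like to see the thousands-to-people conversion rather than take my word for it, here is a small sketch. The starting figure is the total from my terminal output above, rounded to three decimal places. The variable names are mine.

```python
# Total printed by the program, in thousands of people (rounded)
total_thousands = 7794798.729

# Scale up to individual people
total_people = total_thousands * 1000

# The ',' in the format adds thousands separators, '.0f' drops the decimals
print(f'{total_people:,.0f} people')      # 7,794,798,729 people
print(f'{total_people / 1e9:.2f} billion')  # 7.79 billion
```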
If using VS Code, after saving your file, you will perhaps have noticed an icon on the left of the VS Code window with a blue (at least in my case) circle with a 1 in it. The icon looks like a tree structure. If you hover over the icon you will see something like Source Control (Ctrl+Shift+G) - 1 pending changes. As I said before VS Code apparently works pretty much hand in hand with Git. Not something I have any experience with at this point. But, we’ll get to that later.
I plan on starting out by trying to plot a bar chart of the population data in our dictionary. You may recall we added the package matplotlib in the post where we set up Miniconda. Now we get to use it. In order for us to be able to use matplotlib in our code we need to import the package, or at least the parts we want. Importing packages or some of their functionality is a routine part of Python programming. Matter of fact pretty routine in all commonly used programming languages. There are a number of ways to import packages (in whole or in part), but that discussion is for another day. For our purposes, we will add the following 2 lines of code to the top of our file.
import matplotlib.pyplot as plt
import numpy as np
Basically we are saying import all of the pyplot functions available in matplotlib and let us call them with the alias plt. And, import all of numpy with the alias np. You may recall that numpy was installed by conda as a dependency for matplotlib. We’re going to get numpy to help sort the locations of the labels on the x-axis. So here we go. Comments are used to describe what the lines of code are doing.
Now I know I said go ahead and copy/paste the data above. But for the code below I don’t recommend doing that. If you want to learn to program, doing is very important. So read the comments and code, try to sort what each line accomplishes, then go to the editor and type a similar line of code without looking at the posted code. Even think about changing variable names to make sure you have to type something different. This approach will pay off in the long run.
# define the x-labels for the chart
x_labels = list(population.keys())
# get the y-value (bar height) for each x-label
y_values = list(population.values())
# work out an x-position for each label (0, 1, 2, ...), nice of numpy to help
x_pos = np.arange(len(x_labels))
# because of the number of x-labels, we need a largish plot
plt.figure(figsize=(15, 7.5))
# give matplotlib.pyplot the positions and heights it needs to draw the bars
plt.bar(x_pos, y_values, align='center', alpha=0.5)
# tell it what the x-labels are and where to put them
plt.xticks(x_pos, x_labels)
# add some info regarding the axes and give the chart a title.
plt.xlabel('Age Group')
plt.ylabel('Population (1000s)')
plt.title('2020 World Population by Age Group')
plt.show()
When you now run the program, you will see a window open up with the plot of the bar chart showing in it. The terminal window in VS Code will be locked until such time as you close the plot window.
I guess I should include a picture of my result. Will see if I can manage to accomplish that.
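In the meantime, a rough text version gives an idea of the shape to expect. This is only a sketch using plain Python and a handful of the age groups from the dictionary. The names sample and text_bars are made up for this illustration and have nothing to do with matplotlib.

```python
# A few age groups from the dictionary, values in thousands
sample = {
    '0-4': 677941.79,
    '30-34': 605530.732,
    '60-64': 322141.59,
    '90-94': 16680.048,
}


def text_bars(data, width=40):
    """Return one line per key, with a bar of '#' scaled so the largest value fills width."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = '#' * round(value / largest * width)
        # right-align the label in 6 characters so the bars line up
        lines.append(f'{label:>6} | {bar}')
    return lines


for line in text_bars(sample):
    print(line)
```

The bars shrink steadily as the age groups get older, which is the same pattern the real chart shows.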
Commit Our Program
Last thing for this session, committing our code to version control. I prefer the command line (cmd prompt) for this, but it is also possible to do so within VS Code. Have never done that, so let’s give it a try.
Ok, with the pyPlay workspace open in VS Code, click on the Source Control icon, or press Ctrl+Shift+G. The latter didn’t work for me, as some other application has grabbed that key sequence for its own use. Too lazy to sort it out.
I got a side bar with SOURCE CONTROL: GIT at the top (with some icons to the right). Below that is a text box with Message and some other info in it. Below that is a header labelled CHANGES, with a ‘down arrow’ to the left. If you don’t see the down arrow, but have a > instead, click the CHANGES header. Below that you should see your Python code file and a green ‘U’ at the right edge. The ‘U’ means the file is untracked. So let’s stage it (the equivalent of git add). Hover over the file name and click the ‘+’ sign/icon. After a moment or two the display changes.
Nothing is now listed below the CHANGES header. But I now see a new header STAGED CHANGES. The code file is now listed below it with a green ‘A’ to the right. So we should now be ready to commit our program file to version control. I entered a message, ‘added population_by_age.py, code to plot histogram of population by age’, in the text box and clicked the checkmark above. After a bit the tab updated and we were left without any pending changes or staged files.
Now let’s see if we can push the change to GitHub. Click the ‘more’ icon (’…’). Then click ‘Push’. A working indicator flowed along for a while, then I got a message box asking if I’d like VS Code to run git fetch periodically. I said no. When I checked the commits on GitHub, there it was. So, we’re done.
Note, I still like the command line. But this is definitely an option, as I don’t have to leave VS Code to commit changes.
Now, things may not work for you for one reason or another. I leave it to you to search the web for a solution to your problem.
That’s it for this one. Until next time.
Further Resources
- World Population Prospects 2019
- Matplotlib Bar chart
- How do you change the size of figures drawn with matplotlib?
- Using Version Control in VS Code