Programming/Coding Is an Iterative Process

Well, we wrote a program and it pretty much worked. But, what if we wanted to produce a second bar chart for a different year? Or a chart for a specific country rather than the world? Or plot the population for a specific age group for the past decade?

In the real world, programming is a bit cyclical. You write a bit of code and make sure it works. (Don’t forget to commit to version control). But you haven’t actually finished the job. So, you make some additions, changes, fixes, or whatever. Make sure it works, commit, etc. Then start the whole process over again for the next bit — whatever that might be.

There are also a few principles that all good code implements, or so I have read. One is the separation of concerns. Another is that code should be DRY. Separation of concerns basically means that each piece of code should do one thing and do it well. And, it shouldn’t care how other bits of code are doing their thing. And obviously DRY has nothing to do with moisture. It stands for Don’t Repeat Yourself.

Returning to our questions above, what if did want to plot the data for another year. For now we will ignore how we might get that data and just assume it is available on demand. Well we could just copy the code used to make the first plot down below, and modify it to use a different year’s data. And, if we wanted to plot more years, we could just keep repeating the copy and change process. But, that hardly seems efficient. And, I always figured programming was about making things somehow easier.

Not only that, what if we needed to change the package we are using to plot the charts. We’d have to change every single copy of the code we made. That’s why DRY is so important. So, what we want to do is package the bar chart code in some way and then use it for the data from different years without copying it over and over. Well that’s what functions are for. We’ve already seen functions. E.G. print(), sum(), keys(), etc. So, we could turn our bar chart code into a function. And call it repeatedly just giving it a different year’s data each time.

And to make things better, functions provide a useful way to accomplish the separation of interests. Each function should only do one thing and do it well. So, we don’t want to define/write a function that can plot anything in anyway. We want to define a function that generates one kind of chart and one kind only. With a goodly amount of choices, but not too many. If we do this correctly, no one need be concerned how it works, only that it does and how to make it do so (i.e. the required and optional parameters). Then if we for example are asked to change the package producing the chart, we can do so, without anyone who uses our function knowing or caring that we did so. Bonus!

Let’s give that a try and create a function definition. Note, since we are using version control, I am not going to create a new file. I will just change the one I have. There are some choices to be made before we start.

For the function to work properly it has to be given the information necessary to plot a specific year’s data. Functions use the parameters included in the brackets following the function name for this purpose. There are a couple of ways to specify parameters in the function definition. Positional parameters are identified by there position within the brackets. Or we can use keyword passing, where we specify an appropriate name for each value we are passing to the function. Python is happy with either of these possibilities, and you can even use both in a single function definition. And, too make things even more entertaining, we can tell a function to accept an indefinite number of parameters, either positional or keyword. All these variations have their use cases.

So, what parameters do we think we need? Well, the year of interest might be one — so we can include the correct year in the chart’s title. And, we obviously need the data — labels and values. But do we need to pass the last 2 separately or can we just pass in the dictionary containing the data? As with all programming there are always options and not necessarily a best choice. A lot depends on what we want the program and its code to accomplish.

For now (remember programming is an interative process) we will pass in the year of the data (as a string) and the data as a single dictionary positionally. In Python we use the def keyword to define a function. def is followed by the function name and a set of parentheses. In the parentheses we define what parameters we expect. A colon is used to end that line and to indicate to Python that an indented code block should follow.

Python uses indentation to identify blocks of related code. Other languages require the use of various types of parentheses to define blocks. Javascript uses curly braces. Many require a line-terminator symbol as well. For example PHP requires all lines to end with a semi-colon. Python figures the end of a line is the end of a line. No terminator required. (Well there is the option to use one in certain cases.)

You may recall that in general VS Code defaulted to a tab of 4 spaces for its indentation of Python code. I think that is too much (this is the subject of much debate and religious belief), so I modified VS Code to only use 2 spaces per tab. I also set VS Code to convert all tab keypresses to spaces in the file.

So, our defintion will look something like this:

def draw_bar_chart(year, pop_data):
  <function code goes here>

We are going to play with things a bit before we get down to using the function to plot something, so bear with me.

If you haven’t already done so, open the pyPlay workspace in VS Code. Add the def line just above the code we used to generate the plot of the bar chart. Remember, no cut and paste ૼ type it yourself. Now, highlight all the plotting code and press the Tab key. VS Code should have indented all the code by the amount to which your configuration defaults. Save the file, and check to see if VS Code finds any problems. If it does, the file name in the side bar will be in red (or somesuch) and a number will appear to the right indicating how many errors it found.

If you don’t see the Output Window (in my case under the Editors window, with a header listing Problems Ouput Debug Console Terminal). If VS Code fround problems their number will be to the right of Problems. If you don’t see the Output Window, click on the ‘circled X’ icon in the Status Bar (coloured bar at the bottom of VS Code window). You should see a list of the problems VS Code found. You will need to deal with them before you proceed.

If no problems found, right click the file in the Editors window and select Run Python File in Terminal. If it hasn’t already done so, VS Code will activate the conda environment, and run the Python file. And we should see absolutely nothing, as we have never told Python to run the function.

Let’s try calling the function without any parameters. Add a blank line after the function defintion, then add the line plot_bar_chart() (no indentation). Save the file and you will see VS Code has identified 2 problems. And there should be a coloured, squiggly line under the function call. If you check the Problems tab VS Code is telling you the function call is missing values for the arguments ‘year’ and ‘pop_data’ (hence 2 problems).

Oh yes, parameters versus arguments. Search the web. I will undoubtably use the wrong word frequently.

Ok, let’s give it some arguments: plot_bar_chart('2010', population). Save and run the file again. This time you should see a bar chart. But the title says it is for the year ‘2020’, not ‘2010’. And, it shows us the data from our 2020 population dictionary. Progress.

Now let’s get the year displayed the way we want. Change the following line:

  plt.title('2020 World Population by Age Group')

to read

  plt.title(year + ' World Population by Age Group')

We are passing the year as a string (i.e. textual information). We want to add the year to the rest of the title (having removed the earlier hard-coded year 2020). String concatenation in Python is accomplished using a + sign.

Save and run. Now the chart title should show the year 2010 even though the data is that for 2020.

One more problem, the function is not actually using the data we included in the function arguments. It is using the hard-coded population dictionary data. See the lines:

  x_labels = population.keys()
  x_values = population.values()

We want to change these to use whatever we are passing in for the pop_data argument. So, let’s change all occurences of population in the function definition to pop_data. And to test that we are getting the data we are passing in let’s create a new population data dictionary. We will use nonsense values to ensure we are getting the correct data set. In fact we don’t even need to use as many age groups if we did everything right. You can copy/paste my version, or make your own.

test_population = {
  '0-4': 5,
  '5-9': 10,
  '10-14': 15,
  '15-19': 20,
  '20-24': 5,
  '25-29': 10,
  '30-34': 15,
  '35-39': 20,
  '40-99': 33,
  '100+': 41
}

Okay lets change the function call to use a bogus year, and the new data set. plot_bar_chart('2121', test_population). Save and run. Oh, yes, make sure you closed the previous plot window or nothing will happen until you do. There you go, we can now plot the data for any year we wish.

Finally, commit the revised file into version control and push the change to GitHub.

r:\learn\py_play>git status
r:\learn\py_play>git add population_by_age.py
r:\learn\py_play>git commit -m "move bar chart plotting code into function: plot_bar_chart()"
r:\learn\py_play>git push

You, of course, could choose to do so within VS Code.

And, that I think is a good post’s work. Until next time.

Further Resources