Speed Counts — At Least the Perceived Speed
In the last post we wrote a simple module to create and search a database of country/region names taken from the world population data CSV file. Before I get carried away and try to find ways to make the search faster, I’d like to get some idea of how fast or slow the search code really is.
Keep in mind, we are reading through the whole name database file for each and every search. For at least the name starts with mode, we could in fact stop reading the file once a name begins with something later alphabetically than the characters we are looking for. That, of course, isn’t possible with the anywhere mode. But do we need to rework the code? Is there any real value?
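For illustration only, here is roughly what that early exit might look like. This is a hypothetical sketch, not the module’s actual code; it assumes the name list file (LST_NM, as in the module) is sorted alphabetically, and find_names_starts_with() is a name I made up:

# Hypothetical sketch: early exit for the "starts with" search mode,
# assuming the name list file is sorted alphabetically.
def find_names_starts_with(f_str):
    matches = []
    with open(LST_NM, encoding="utf-8") as f:
        for line in f:
            name = line.strip()
            if name.startswith(f_str):
                matches.append(name)
            elif name[:len(f_str)] > f_str:
                # every remaining name sorts later than the prefix,
                # so no further matches are possible
                break
    return matches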
And, I also considered building a version of the search that runs in memory, trading memory space for the relative slowness of reading and processing a file. The file is, after all, only 11K; not a lot of memory space on a modern computer. (My yet to be set up ’new’ development system has 64 GB of RAM.) Is there a benefit?
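Again purely as a sketch (find_names_in_memory() and the cache variable are my inventions for illustration, not code from the previous post): read the file once, then search the cached list on every call.

# Hypothetical sketch: read the name list once, keep it in memory,
# and search the cached copy on every call.
_names_cache = None

def find_names_in_memory(f_str, anywhere=False):
    global _names_cache
    if _names_cache is None:
        with open(LST_NM, encoding="utf-8") as f:
            _names_cache = [line.strip() for line in f]
    if anywhere:
        return [n for n in _names_cache if f_str in n]
    return [n for n in _names_cache if n.startswith(f_str)]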
First Attempt — time module
A basic — simple — approach would be to record the current time, run a search multiple times, record the new time and do some arithmetic to get the average time to complete a search. So, let’s give that a try.
I am going to put the timing code in the module file, but will likely never commit it to the git repository as it is, for the moment at least, just play-around code. Also, I am going to modify the other test code (add a variable and enclose the code in an if block that tests that variable’s value). Something like:
if __name__ == "__main__":
    do_tests = False

    # generate the name list file if it doesn't already exist
    if pathlib.Path(LST_NM).exists():
        print(f"'{LST_NM}' already exists. So, not generating country/region name list file.\n")
    else:
        print(f"'{LST_NM}' does not exist. Generating country/region name list file.\n")
        gen_file_list()

    if do_tests:
        f_str = 'Ca'
        names = find_names(f_str)
        print(f"\nsearching for '{f_str}', found:\n{names}")

        f_str = 'States'
        names = find_names(f_str, anywhere=True)
        print(f"\nsearching for '{f_str}', found:\n{names}")
Ok, I’ve decided to allow myself the option of keeping the timing code. So, a new variable and a new if block. We need the time module, so I will import it in the if block. To start, I’ll just run the search once to see what kind of time I get. As of Python 3.7, we also have two choices: time.time() and time.time_ns(). I am going to try both, realizing there will be discrepancies because of the sequential execution of the relevant lines of code, and because of possible resolution differences between the two functions. The execution time will be shown in milliseconds.
do_timing = True

if do_timing:
    import time

    # bracket a single search with both clocks
    s_tm = time.time()
    s_ns = time.time_ns()
    names = find_names("Ca")
    e_tm = time.time()
    e_ns = time.time_ns()

    print("\nOne execution of find_names('Ca'):")
    t_diff = e_tm - s_tm
    print(f"\ttime diff: {e_tm} - {s_tm} = {t_diff * 1000:.5f} msec")
    ns_diff = e_ns - s_ns
    print(f"\ttime_ns diff: {e_ns} - {s_ns} = {ns_diff * (10**-9) * 1000:.5f} msec")
Running the module a few times I got the following outputs (extraneous lines removed). Not really sure how we got an execution time of 0 seconds; a likely explanation follows the output.
One execution of find_names('Ca'):
time diff: 1592688974.392077 - 1592688974.3910766 = 1.00040 msec
time_ns diff: 1592688974392077100 - 1592688974391076700 = 1.00040 msec
One execution of find_names('Ca'):
time diff: 1592688986.7977774 - 1592688986.7977774 = 0.00000 msec
time_ns diff: 1592688986797777500 - 1592688986797777500 = 0.00000 msec
One execution of find_names('Ca'):
time diff: 1592688984.414045 - 1592688984.4130464 = 0.99874 msec
time_ns diff: 1592688984414045300 - 1592688984413046300 = 0.99900 msec
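My guess on the 0 msec run: clock resolution. time.time() and time.time_ns() read the same underlying clock, and on some platforms it only ticks about once a millisecond, so a sub-millisecond search can start and end on the same tick. The time module can report the clock’s advertised resolution; a quick check (not part of the module’s code):

import time

# report the advertised resolution of the clock behind time.time();
# if it only ticks every ~1 msec, a faster-than-1-msec call can
# measure as 0
info = time.get_clock_info('time')
print(f"time(): resolution = {info.resolution}, monotonic = {info.monotonic}")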
Now, I am going to run the search code 10 times within a for loop. I will also modify the output statement to show the number of times the search was executed, and the average execution time in milliseconds.
for _ in range(10):
    names = find_names("Ca")
And, running the module twice generates:
10 executions of find_names('Ca'):
time diff: 1592688869.2907443 - 1592688869.2857454 = 0.0049989, average = 0.49989 msec
time_ns diff: 1592688869290744300 - 1592688869285745500 = 4998800, average = 0.49988 msec
10 executions of find_names('Ca'):
time diff: 1592688879.1314359 - 1592688879.1274343 = 0.0040016, average = 0.40016 msec
time_ns diff: 1592688879131435700 - 1592688879127434300 = 4001400, average = 0.40014 msec
10 in a row averages faster than 1 at a time? I am going to guess that the operating system is caching file blocks in memory. Because the repeated searches run in quick succession, more of those blocks remain in memory, speeding up ‘reading’ the file after the first execution of the search function. Something to keep in mind when testing file access operations.
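If I wanted to test that guess, one option would be to time the first (cold) search separately from the following (warm) ones. A quick sketch, not something I actually ran for this post:

import time

# time the first (cold) search separately from the next nine (warm)
# searches: the cold call pays for the actual disk reads, the warm
# calls should mostly hit the OS file cache
s_tm = time.time()
names = find_names("Ca")
cold = (time.time() - s_tm) * 1000

s_tm = time.time()
for _ in range(9):
    names = find_names("Ca")
warm = (time.time() - s_tm) / 9 * 1000

print(f"cold: {cold:.5f} msec, warm average: {warm:.5f} msec")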
And, running the loop 100 times, we get a similar average execution time for the search.
100 executions of find_names('Ca'):
time diff: 1592688914.7256465 - 1592688914.6696434 = 0.0560031, average = 0.56003 msec
time_ns diff: 1592688914725646600 - 1592688914669643600 = 56003000, average = 0.56003 msec
So, what we are talking about is roughly a 0.5 msec execution time for any search. And, if I run the search using anywhere mode, the average execution time is very similar. From my perspective, that execution time is more than acceptable. So, for now, no code enhancements to try and speed things up. Your mileage may vary.
That’s it for this one. Oh yes, I did commit my changes to the local and remote repositories along the way; I failed to mention that above. And, my final code for the timing section looks like this:
do_timing = True

if do_timing:
    import time
    s_tm = time.time()
    s_ns = time.time_ns()
    nbr_reps = 100
    for _ in range(nbr_reps):
        names = find_names("Ca", anywhere=True)
    e_tm = time.time()
    e_ns = time.time_ns()

    print(f"\n{nbr_reps} execution{'s' if nbr_reps > 1 else ''} of find_names('Ca'):")
    t_diff = e_tm - s_tm
    print(f"\ttime diff: {e_tm} - {s_tm} = {t_diff:.7f}, average = {t_diff / nbr_reps * 1000:.5f} msec")
    ns_diff = e_ns - s_ns
    print(f"\ttime_ns diff: {e_ns} - {s_ns} = {ns_diff}, average = {ns_diff / nbr_reps * (10**-9) * 1000:.5f} msec")
Resources
Python documentation on:
- time — Time access and conversions
- timeit — Measure execution time of small code snippets

Python also provides the built-in timeit module for exactly this kind of measurement. If you decide to try it, be careful: there are some potential gotchas, e.g. the number parameter. I did use timeit in my initial testing, but figured the approach using the time module was good enough.
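As a quick illustration of that number gotcha (using a generic statement here, not the search code): timeit.timeit() returns the total time for all number executions, not a per-execution average.

import timeit

# timeit.timeit() returns the TOTAL time for all `number` executions,
# not a per-execution average; divide by `number` to get one
total = timeit.timeit("sorted(range(1000))", number=1000)
print(f"total: {total:.5f} sec, average: {total / 1000 * 1000:.5f} msec")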