We’re now going to look at some of what NumPy provides for computation with arrays of data. The first of these will be the so-called universal functions (ufuncs). Then we may look at statistical summaries and other aggregation methods if time and space allow.

Why Do We Need Universal Functions?

A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a “vectorized” wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs.

In NumPy, universal functions are instances of the numpy.ufunc class. Many of the built-in functions are implemented in compiled C code. The basic ufuncs operate on scalars, but there is also a generalized kind for which the basic elements are sub-arrays (vectors, matrices, etc.), and broadcasting is done over other dimensions. One can also produce custom ufunc instances using the frompyfunc factory function.
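As a quick, hedged sketch of that last point (the clamp_byte helper below is made up purely for illustration), an ordinary Python function can be wrapped as a ufunc with np.frompyfunc. Note that the result always returns object arrays, so this buys convenience and broadcasting, not speed:

import numpy as np

# an ordinary Python function: one input, one output
def clamp_byte(x):
    return min(max(int(x), 0), 255)

# wrap it as a ufunc taking 1 input and producing 1 output
clamp_ufunc = np.frompyfunc(clamp_byte, 1, 1)

vals = np.array([-5.0, 99.5, 300.0])
print(clamp_ufunc(vals))        # [0 99 255]
print(clamp_ufunc(vals).dtype)  # object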

Universal functions (ufunc)

The fact that these ufuncs perform all the basic stuff needed to do arithmetic on arrays is important. But there is also the matter of speed. In a standard Python program we’d likely use a loop to do something to every element of an array. (Though there are some list methods as well.) The problem with this is that, for all practical purposes, Python loops are slow. Not necessarily because of the calculations involved, but because Python has to do type checking and, perhaps, function calls on each iteration of the loop. All time-consuming activities, especially when repeated over thousands of iterations.

Let’s for a moment pretend we have an image in a two-dimensional array, and we want to lighten the image. We will simulate this by generating a new array in which each value is some percentage of the corresponding original value (we will assume smaller numbers mean lighter colours). Then we will time this using a large array.

In [1]:
import numpy as np

# let's make sure we can repeat the results each time we run the notebook
np.random.seed(123)

In [2]:
def image_lighten(img, pcent):
    """Lighten img by pcent %."""
    # quickly generate output array
    pct = ((100 - pcent) / 100)
    x, y = img.shape
    output = np.empty((x, y), dtype='int32')
    for i in range(x):
        for j in range(y):
            output[i, j] = (img[i, j] * pct)
    return output

img_1 = np.random.randint(0, 256, (10, 10))
img_ltr = image_lighten(img_1, 20)
print(img_1[0, :])
print(img_ltr[0, :])

[254 109 126  66 220  98 230  17  83 106]
[203  87 100  52 176  78 184  13  66  84]

That appears to work; let's time it using a large array.

In [3]:
img_lge = np.random.randint(0, 256, (1000, 1000))
%timeit image_lighten(img_lge, 20)
6.85 s ± 136 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I was getting tired waiting for that to finish. So don’t be too jumpy if you run it yourself.

Glad %timeit settled on only 7 runs of 1 loop each. The standard library’s timeit module defaults to a million loops; that could have taken most of the day.

Let’s try using a NumPy vectorized operation.

In [4]:
pct = (100 - 20) / 100
%timeit (img_lge * pct)
4.97 ms ± 490 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]:
# quick check that we are calculating the same thing more or less
print(img_1[0, :] * pct)
[203.2  87.2 100.8  52.8 176.   78.4 184.   13.6  66.4  84.8]

Just in case you missed it, the vectorized operation was hellishly faster. And, allowing for the difference in data type, it did get the same answer for the first row of the image.
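If we wanted the vectorized result to have the same integer dtype as the loop version, a small sketch (not in the original notebook, reusing img_1 and pct from the cells above) would be to cast the scaled array back to 32-bit integers, which truncates just like the loop’s int32 assignment did:

img_fast = (img_1 * pct).astype(np.int32)  # truncation toward zero, matching the loop version
print(img_fast[0, :])                      # [203  87 100  52 176  78 184  13  66  84]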

Basic Arithmetic

Now let’s look at some of these ufuncs in operation, beginning with simple arithmetic.

As one might expect, NumPy understands Python’s standard operators: addition, subtraction, multiplication, negation, etc. But these operators (+, -, *, /, //, **, %, and unary -) are in this case just wrappers/aliases for the underlying NumPy functions/methods.

Here’s a list/table showing (some of) the relationships.

Operator   Function          Description
-          np.negative       unary negation
+          np.add            addition
-          np.subtract       subtraction
*          np.multiply       multiplication
/          np.divide         division
//         np.floor_divide   floor/integer division
**         np.power          exponentiation
%          np.mod            modulus

And here are some simple examples from the related notebook. I’m not sure how often these will be used, but they are something we should be aware of at the very least.

In [6]:
# the basic arithmetic operators
v1 = np.array([0, 2, 5, 8, 9])
print(f"{'v1':<7} = {v1}")
print(f"{'-v1':<7} = {-v1}")
print(f"{'v1 + 2':<7} = {v1 + 2}")
print(f"{'v1 - 2':<7} = {v1 - 2}")
print(f"{'v1 * 2':<7} = {v1 * 2}")
print(f"{'v1 / 2':<7} = {v1 / 2}")
print(f"{'v1 // 2':<7} = {v1 // 2}")
print(f"{'v1**2':<7} = {v1**2}")
print(f"{'v1 % 3':<7} = {v1 % 3}")
v1      = [0 2 5 8 9]
-v1     = [ 0 -2 -5 -8 -9]
v1 + 2  = [ 2  4  7 10 11]
v1 - 2  = [-2  0  3  6  7]
v1 * 2  = [ 0  4 10 16 18]
v1 / 2  = [0.  1.  2.5 4.  4.5]
v1 // 2 = [0 1 2 4 4]
v1**2   = [ 0  4 25 64 81]
v1 % 3  = [0 2 2 2 0]
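To make the wrapper/alias relationship in the table above concrete, here is a short sketch (not from the original notebook, reusing the v1 defined above) showing that the operator form and the ufunc form compute the same thing:

# the operator and the underlying ufunc give identical results
print(np.add(v1, 2))                             # same as v1 + 2
print(np.multiply(v1, 2))                        # same as v1 * 2
print(np.array_equal(v1 + 2, np.add(v1, 2)))     # True
print(np.array_equal(v1 ** 2, np.power(v1, 2)))  # True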

Slightly Less Simple Arithmetic

Let’s have a look at a few trigonometric and exponential functions, which apparently will be of value as we progress with data science. Do note, the trig functions, when taking an angle as a parameter, use radians. You may recall that there are 2π radians (360°) in a circle.

I am not going to discuss any of the functions. If necessary will do so when the time comes. Just a quick example of the ones most of us have heard of. There are numerous others.

Also, keep in mind that the values are computed within machine precision. Something one should always keep in mind when doing arithmetic or mathematics in code.

In [7]:
# logarithms and such
v2 = np.array([2, 5, 8, 9])
print(f"{'v2 =':<11} {v2}")
print(f"{'e**v2 =':<11} {np.exp(v2)}")
print(f"{'2**v2 =':<11} {np.exp2(v2)}")
print(f"{'3**v2 =':<11} {np.power(3, v2)}")
print()
print(f"{'ln(v2) ='} {np.log(v2)}")
print(f"{'log2(v2) ='} {np.log2(v2)}")
print(f"{'log10(v2) ='} {np.log10(v2)}")
v2 =        [2 5 8 9]
e**v2 =     [7.38905610e+00 1.48413159e+02 2.98095799e+03 8.10308393e+03]
2**v2 =     [  4.  32. 256. 512.]
3**v2 =     [    9   243  6561 19683]

ln(v2) = [0.69314718 1.60943791 2.07944154 2.19722458]
log2(v2) = [1.         2.32192809 3.         3.169925  ]
log10(v2) = [0.30103    0.69897    0.90308999 0.95424251]
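To illustrate the machine-precision point above, here is a small sketch (not part of the original notebook) using the classic 0.1 + 0.2 example; exact equality fails, but np.isclose compares within a tolerance:

# 0.1 + 0.2 is not exactly 0.3 in floating point
a = np.array([0.1]) + np.array([0.2])
print(a)                   # displays [0.3], but the stored value is 0.30000000000000004
print(a == 0.3)            # [False]
print(np.isclose(a, 0.3))  # [ True] -- equal within machine precision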

Reduction

NumPy provides a couple of ufunc methods that allow us to implement folding/reduction on our arrays. The first is similar to Python’s functools.reduce() and is in fact called reduce(). The other, accumulate(), returns the intermediate results, essentially illustrating how the former arrived at its answer. Quick look before moving on.

In [8]:
# folding or reduction
v3 = np.array([2**x for x in range(0,6)])
print(f"{'v3 =':<19} {v3}")
# .reduce()
print(f"{'add reduce =':<19} {np.add.reduce(v3)}")
print(f"{'mult reduce =':<19} {np.multiply.reduce(v3)}")
# .accumulate
print(f"{'add acc =':<19} {np.add.accumulate(v3)}")
print(f"{'mult acc =':<19} {np.multiply.accumulate(v3)}")
v3 =                [ 1  2  4  8 16 32]
add reduce =        63
mult reduce =       32768
add acc =           [ 1  3  7 15 31 63]
mult acc =          [    1     2     8    64  1024 32768]
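For comparison with Python’s own fold (a sketch using the standard library, not from the original notebook), functools.reduce over a plain list produces the same totals, just without the speed or the handy accumulate():

from functools import reduce
import operator

nums = [1, 2, 4, 8, 16, 32]
print(reduce(operator.add, nums))  # 63, same as np.add.reduce(v3)
print(reduce(operator.mul, nums))  # 32768, same as np.multiply.reduce(v3)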

Much, Much More

There are numerous other NumPy ufuncs available: e.g. bitwise operations, comparison operators, conversion of radians to degrees, etc. Check the NumPy documentation; there’s plenty of interest for everyone.

There’s even a NumPy convenience class, np.vectorize, to convert a user-defined function into a vectorized version. It is not a true NumPy vectorized function, though; under the hood it essentially just runs a for loop, so it is not performance oriented.
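Here’s a quick, hedged sketch of np.vectorize (the lighten_scalar helper is made up for illustration, and it reuses img_1 from the cells above). It lets a scalar function accept whole arrays, but timing it would show nothing like ufunc speed:

def lighten_scalar(pixel, pcent):
    """Lighten a single pixel value by pcent %."""
    return int(pixel * (100 - pcent) / 100)

lighten_vec = np.vectorize(lighten_scalar)
print(lighten_vec(img_1[0, :], 20))  # same values as the earlier loop version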

That’s It

Was planning to cover more, but really need to get some other things done. Feel free to download my notebook covering the above and play around. Or try out some of the other ufuncs. And I do apologize for not documenting the notebooks as well as I should. Didn’t feel like duplicating the posts in the notebooks.

Resources