Carrying on from the last post, let’s start this one by looking at how we can access the elements of a NumPy array, along with a few array attributes. Then we’ll look at array manipulation. Arithmetic with NumPy arrays will probably have to wait until the next post.

Accessing NumPy Array Elements

Fortunately for us, accessing NumPy arrays has a lot in common with accessing Python lists, slices included. The main difference is in how we access the elements of multidimensional arrays.

Basic Array Indexing

Let’s start with the simplest, accessing an element in a vector (i.e. one dimensional array). Don’t forget to import NumPy. We’ll give ourselves a vector and an array to play with.

In [1]:

import numpy as np
v1 = np.random.randint(1, 7, 7)
a1 = np.random.randint(1, 10, (3, 3))
print(v1, "\n\n", a1)

[3 5 6 1 5 6 4] 

 [[1 3 7]
 [6 2 1]
 [6 8 9]]

For vectors, just as with Python lists, square brackets and zero-based indexing work as expected.

In [2]:


for i in range(0, 7, 3):
  print(f"[{i}]: {v1[i]}", end='; ')
# negative indexes work the same way as well
print("\n\n")
for i in range(1, 6, 2):
  print(f"[{-i}]: {v1[-i]}", end='; ')

[0]: 3; [3]: 1; [6]: 4; 
[-1]: 4; [-3]: 5; [-5]: 6;

For multidimensional arrays we need to use tuples of indices inside the square brackets.

In [3]:


for i in range(0, 3):
    print(f"({i},{i}): {a1[i, i]}", end="; ")
# and with negative indices
print("\n\n")
for i in range(0, 3):
    print(f"({i},{-(i+1)}): {a1[i, -(i + 1)]}", end="; ")

(0,0): 1; (1,1): 2; (2,2): 9; 
(0,-1): 7; (1,-2): 2; (2,-3): 6;
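As a small aside, here is a sketch comparing the comma form with chained brackets. The array `a` and the variable names are made up for the illustration; the values just mirror the a1 printed earlier.

```python
import numpy as np

# A small fixed array so the results are reproducible (illustrative values).
a = np.array([[1, 3, 7],
              [6, 2, 1],
              [6, 8, 9]])

# a[row, col] indexes both dimensions in a single step.
tuple_style = a[1, 2]
# a[row][col] first extracts the row a[1], then indexes into that row.
chained_style = a[1][2]
print(tuple_style, chained_style)   # 1 1

# An explicit tuple behaves the same as the comma form.
idx = (2, -1)
print(a[idx])                       # 9
```

Both forms return the same element. The comma form is the usual NumPy idiom, since it does not build an intermediate row first.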

You can modify a given element using any of the above indexing approaches.

In [4]:


for i in range(0, 3):
    a1[i, -(i + 1)] = 42.6
print(a1)

[[ 1  3 42]
 [ 6 42  1]
 [42  8  9]]

Did you notice that the value was truncated? The array holds integers, so the fractional part of 42.6 is simply dropped. No rounding!
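If truncation is not what we want, a couple of options come to mind. This is only a sketch with made-up names (`ints`, `floats`): round before assigning, or work with a float copy of the array.

```python
import numpy as np

ints = np.array([1, 2, 3])     # integer dtype is inferred from the values
ints[0] = 42.6                 # fractional part is dropped, not rounded
print(ints)                    # [42  2  3]

# Option 1: round explicitly before assigning.
ints[1] = np.round(42.6)
print(ints)                    # [42 43  3]

# Option 2: convert to a float array (astype returns a new array).
floats = ints.astype(float)
floats[2] = 42.6
print(floats.dtype, floats[2])
```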

Array Slices/Sub-arrays

Once again, the Python slice syntax, [start:stop:step], will work just fine. The defaults are start=0, stop='size of vector/dimension' and step=1.

There is one important consideration when working with NumPy sub-arrays. Unlike a Python list, where a slice returns a copy, a NumPy slice returns a view of the array/vector data. So changing the value of a sub-array element will change the value in the original array/vector.

Unless we make sure to ask for a copy.
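To make the difference concrete, here is a minimal side-by-side sketch. The names `py_list` and `np_vec` are just for this illustration.

```python
import numpy as np

py_list = [0, 1, 2, 3, 4]
np_vec = np.arange(5)

list_slice = py_list[1:4]   # a new list (a copy of those elements)
vec_slice = np_vec[1:4]     # a view onto np_vec's data

list_slice[0] = 99
vec_slice[0] = 99

print(py_list)   # [0, 1, 2, 3, 4]   original list untouched
print(np_vec)    # [ 0 99  2  3  4]  original array changed
```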

In [5]:

# To make things clearer let's start with a new vector
v2 = np.arange(10)
print(v2)

[0 1 2 3 4 5 6 7 8 9]

In [6]:

# first 3 elements
print(v2[:3])
# elements with index 6 or greater
print(v2[6:])
# values at indices 4 to 6 inclusive, recall 'stop' is not included
print(v2[4:7])
# how about a negative step value
print(v2[6:3:-1])
# every other element starting at index 1
print(v2[1::2])
# as with Python lists, a negative step value can be used to reverse the vector
print(v2[::-1])

[0 1 2]
[6 7 8 9]
[4 5 6]
[6 5 4]
[1 3 5 7 9]
[9 8 7 6 5 4 3 2 1 0]

Now let's look at an array. We'll need something a bit bigger than our first one above.

In [7]:


a2 = np.random.randint(1, 10, (4, 5))
print(a2)

[[7 1 5 9 7]
 [8 4 2 6 5]
 [2 1 3 8 6]
 [2 9 3 8 3]]

In [8]:

# let's get the 2x3 array out of the middle
a_mid = a2[1:3, 1:4]
print(a_mid)

[[4 2 6]
 [1 3 8]]

In [9]:

# how about all rows, every other column
a_23 = a2[:4, ::2]
print(a_23)

[[7 5 7]
 [8 2 5]
 [2 3 6]
 [2 3 3]]

In [10]:

# reversing can also work, here we reverse only the order of the rows
a_23r = a_23[::-1, :3]
print(a_23r)
# but we could just as easily have reversed the rows and the columns together
print()
a_23r2 = a_23[::-1, ::-1]
print(a_23r2)

[[2 3 3]
 [2 3 6]
 [8 2 5]
 [7 5 7]]

[[3 3 2]
 [6 3 2]
 [5 2 8]
 [7 5 7]]

As you might expect, it will be a common need to access a single row or column.

In [11]:


print(f"first column of a2: {a2[:, 0]}")
# or the first row
print(f"\nfirst row of a2: {a2[0, :]}")
# for rows, we can actually drop the empty slice syntax
print(f"\nfirst row of a2: {a2[0]}")

first column of a2: [7 8 2 2]
first row of a2: [7 1 5 9 7]
first row of a2: [7 1 5 9 7]
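One thing worth noting: pulling out a column with an integer index gives back a one-dimensional vector, not a column-shaped array. The sketch below types in the same values as the a2 printed above (under the made-up name `a`) so that it is reproducible.

```python
import numpy as np

a = np.array([[7, 1, 5, 9, 7],
              [8, 4, 2, 6, 5],
              [2, 1, 3, 8, 6],
              [2, 9, 3, 8, 3]])

col_1d = a[:, 0]      # an integer index drops that dimension
col_2d = a[:, 0:1]    # a length-1 slice keeps it

print(col_1d.shape)   # (4,)
print(col_2d.shape)   # (4, 1)
```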

Now a quick look at view versus copy.

In [12]:


a_2_2 = a2[1:3, 2:4]
print(a_2_2)
# let's change one of the elements
a_2_2[1,1] = 99
print(f"\na_2_2: {a_2_2}")
print(f"\na2: {a2}")

[[2 6]
 [3 8]]
a_2_2: [[ 2  6]
[ 3 99]]
a2: [[ 7  1  5  9  7]
[ 8  4  2  6  5]
[ 2  1  3 99  6]
[ 2  9  3  8  3]]

However, sometimes we will really want a copy. NumPy provides a method for that.

In [13]:

# getting a copy rather than a view
a_2_2_cp = a2[1:3, 2:4].copy()
print(a_2_2_cp)
# let's change one of the elements
a_2_2_cp[1,1] = 8
print(f"\na_2_2_cp: {a_2_2_cp}")
print(f"\na2: {a2}")

[[ 2  6]
 [ 3 99]]
a_2_2_cp: [[2 6]
[3 8]]
a2: [[ 7  1  5  9  7]
[ 8  4  2  6  5]
[ 2  1  3 99  6]
[ 2  9  3  8  3]]
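If we ever lose track of whether we are holding a view or a copy, NumPy can tell us. A small sketch, using a made-up array `grid` rather than the random a2 above:

```python
import numpy as np

grid = np.arange(20).reshape((4, 5))

sub_view = grid[1:3, 2:4]          # plain slice
sub_copy = grid[1:3, 2:4].copy()   # explicit copy

# np.shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(grid, sub_view))   # True
print(np.shares_memory(grid, sub_copy))   # False

# An array that owns its data has no base array.
print(sub_copy.base is None)              # True
```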

Basic Manipulation of Arrays

Shape

We didn’t get to it earlier, but NumPy arrays have a couple of attributes of interest: ndim and shape. ndim gives the number of dimensions and shape gives the size of each dimension (as a tuple).
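A quick look at those attributes, plus size, which shows up a little further on. The arrays `vec` and `arr` are just for illustration.

```python
import numpy as np

vec = np.arange(10)
arr = np.ones((3, 4))

# ndim: number of dimensions, shape: size per dimension, size: total elements
print(vec.ndim, vec.shape, vec.size)   # 1 (10,) 10
print(arr.ndim, arr.shape, arr.size)   # 2 (3, 4) 12
```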

It is possible with NumPy to change the shape of a vector or array. This is apparently something often done/required in data science.

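Be aware that, where possible, reshape will return a view. If that is not possible (for example, when the data is not laid out contiguously in memory), it will return a copy. To tell which one we got, np.shares_memory(original, reshaped) reports whether the two arrays share data. When unsure, assume it is a view.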
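Here is a minimal sketch of both outcomes. The transposed array is simply one convenient way I know of to force a copy; the variable names are made up.

```python
import numpy as np

v = np.arange(6)

r = v.reshape((2, 3))              # contiguous data, so this is a view
print(np.shares_memory(v, r))      # True

t = r.T                            # the transpose is a view, but not contiguous
flat = t.reshape(6)                # cannot be expressed as a view, so NumPy copies
print(np.shares_memory(v, flat))   # False
print(flat)                        # [0 3 1 4 2 5]
```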

In [14]:

# let's reshape a new vector into a 3x3 array
v3 = np.arange(1, 19, 2)
print(f"v3: {v3}")
v3_3 = v3.reshape((3, 3))
print(f"\nv3_3:\n{v3_3}")
print(f"\nv3.size: {v3.size}; v3_3.size: {v3_3.size}\n")
# do note, v3.size must equal v3_3.size
v3_4 = v3.reshape((3, 4))

v3: [ 1  3  5  7  9 11 13 15 17]
v3_3:
[[ 1  3  5]
[ 7  9 11]
[13 15 17]]
v3.size: 9; v3_3.size: 9

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-726eef36ae04> in <module>
      6 print(f"\nv3.size: {v3.size}; v3_3.size: {v3_3.size}\n")
      7 # do note, v3.size must equal v3_3.size
----> 8 v3_4 = v3.reshape((3, 4))
ValueError: cannot reshape array of size 9 into shape (3,4)

In [15]:

# how about v1 as the first row of an array (row vector)
v1_rv = v1.reshape(1, v1.size)
print(f"\nv1: {v1}")
print(f"\nv1_rv: {v1_rv}")
# and as the first column of an array (column vector)
v1_cv = v1.reshape(v1.size, 1)
print(f"\nv1_cv:\n{v1_cv}")
# there is also a constant, np.newaxis, that can be used in a slice for the above two cases
v1_rv_2 = v1[np.newaxis, :]
print(f"\nv1_rv_2: {v1_rv_2}")
v1_cv_2 = v1[:, np.newaxis]
print(f"\nv1_cv_2:\n{v1_cv_2}")

v1: [4 5 6 1 6 5 3]
v1_rv: [[4 5 6 1 6 5 3]]
v1_cv:
[[4]
[5]
[6]
[1]
[6]
[5]
[3]]
v1_rv_2: [[4 5 6 1 6 5 3]]
v1_cv_2:
[[4]
[5]
[6]
[1]
[6]
[5]
[3]]
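For what it is worth, np.newaxis is just an alias for None, and reshape will accept -1 for one dimension and work out its length for us. A short sketch with a made-up vector `v`:

```python
import numpy as np

v = np.array([4, 5, 6])

print(np.newaxis is None)        # True

# newaxis inserts a dimension of length 1 at that position
print(v[np.newaxis, :].shape)    # (1, 3)
print(v[:, np.newaxis].shape)    # (3, 1)

# -1 asks reshape to infer that dimension from the array's size
print(v.reshape(1, -1).shape)    # (1, 3)
print(v.reshape(-1, 1).shape)    # (3, 1)
```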

Concatenation and Splitting of Arrays

Until now we have been manipulating individual arrays. Before calling it a day, let’s look at some basic manipulation we can do with two or more arrays.

In [16]:

# let's look at combining arrays, concatenation
# three functions available: np.concatenate, np.vstack, np.hstack
v7 = np.array([1, 2, 3])
v8 = np.array([4, 5, 6])
v9 = np.array([42, 42, 42])
# first argument is a tuple or list of arrays
print(np.concatenate([v7, v8]))
print(np.concatenate([v7, v8, v9]))

[1 2 3 4 5 6]
[ 1  2  3  4  5  6 42 42 42]

In [17]:

# works with two-dimensional arrays
a7 = np.array([v7, v8])
print("\n", a7)
# concatenate along the row axis
print("\n", np.concatenate([a7, a7]))
# or along the column axis
print("\n", np.concatenate([a7, a7], axis=1))

 [[1 2 3]
 [4 5 6]]

 [[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]

 [[1 2 3 1 2 3]
 [4 5 6 4 5 6]]
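It can help to keep an eye on the shapes when concatenating. The sketch below uses made-up arrays `a` and `b`, and the `mismatch_raised` flag is only there to record what happened.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)

rows = np.concatenate([a, a], axis=0)
cols = np.concatenate([a, a], axis=1)
print(rows.shape)   # (4, 3)
print(cols.shape)   # (2, 6)

# The arrays must agree on every axis except the one being joined.
b = np.array([[7, 8]])             # shape (1, 2)
mismatch_raised = False
try:
    np.concatenate([a, b], axis=0)
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```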

In [18]:

# when working with arrays of varying dimensions
# you may prefer to use one of the other methods
# N.B. shape and size are relevant
a8 = np.array([7, 8, 9])
# vertically, i.e. row order
print(np.vstack([a8, a7]))
# horizontally, i.e. column order,
a9 = a8[:2].reshape((2, 1))
print("\n", np.hstack([a7, a9]))

[[7 8 9]
 [1 2 3]
 [4 5 6]]
[[1 2 3 7]
[4 5 6 8]]
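Why prefer np.vstack here? As far as I can tell, it treats a one-dimensional vector as a single row, while np.concatenate insists that all the arrays have the same number of dimensions. A sketch with made-up names (`row`, `grid`, `promoted`):

```python
import numpy as np

row = np.array([7, 8, 9])            # shape (3,)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])         # shape (2, 3)

# np.vstack treats the 1-D vector as one row of shape (1, 3).
stacked = np.vstack([row, grid])
print(stacked.shape)                 # (3, 3)

# np.concatenate does no such promotion, so mixing 1-D and 2-D fails.
try:
    np.concatenate([row, grid])
    promoted = True
except ValueError:
    promoted = False
print(promoted)                      # False

# Reshaping by hand gives the same result as np.vstack.
manual = np.concatenate([row.reshape(1, 3), grid])
print(np.array_equal(stacked, manual))   # True
```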

In [19]:

# now let's look at splitting arrays
# again three functions: np.split, np.hsplit, np.vsplit
# we pass a list of split points to each,
# nbr of resulting arrays is nbr of split points + 1
v10 = np.concatenate([v7, v8, v9])
v11, v12, v13 = np.split(v10, [4, 7])
print("\n", v11,"\n", v12, "\n", v13)

 [1 2 3 4] 
 [ 5  6 42] 
 [42 42]
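Besides a list of split points, np.split also accepts a plain integer, in which case it cuts the array into that many equal pieces. A small sketch with a made-up vector `v`; the `uneven_raised` flag just records the failure case.

```python
import numpy as np

v = np.arange(9)

# A list of split points: 2 points give 3 pieces.
a, b, c = np.split(v, [4, 7])
print(a, b, c)            # [0 1 2 3] [4 5 6] [7 8]

# An integer asks for that many equal pieces instead.
thirds = np.split(v, 3)
print(len(thirds))        # 3

# With an integer, the array must divide evenly.
uneven_raised = False
try:
    np.split(v, 4)
except ValueError as err:
    uneven_raised = True
    print("ValueError:", err)
```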

In [20]:

# now the other two functions
matrix = np.arange(16).reshape((4, 4))
print(matrix)
m_up, m_dn = np.vsplit(matrix,[2])
m_lf, m_rt = np.hsplit(matrix,[2])
print("\n\nm_up:\n", m_up,"\n\nm_dn:\n", m_dn, "\n\nm_lf:\n", m_lf, "\n\nm_rt:\n", m_rt)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
m_up:
[[0 1 2 3]
[4 5 6 7]]
m_dn:
[[ 8  9 10 11]
[12 13 14 15]]
m_lf:
[[ 0  1]
[ 4  5]
[ 8  9]
[12 13]]
m_rt:
[[ 2  3]
[ 6  7]
[10 11]
[14 15]]
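One last thing tying this back to views and copies: as I understand the NumPy documentation, the pieces returned by the split functions are views into the original array, while concatenation builds a new array. A quick sketch to check:

```python
import numpy as np

matrix = np.arange(16).reshape((4, 4))
m_up, m_dn = np.vsplit(matrix, [2])

# The halves share memory with the original...
print(np.shares_memory(matrix, m_up))     # True
m_up[0, 0] = -1
print(matrix[0, 0])                       # -1

# ...while stacking them back together produces fresh data.
joined = np.vstack([m_up, m_dn])
print(np.shares_memory(matrix, joined))   # False
```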

Time for a Break

Okay, that’s it for this one. Will see what we get up to next time.

The example notebook for this one is available for download.
