Wednesday, August 12, 2020

Numpy array filtering



Filtering is done by indexing a target array with a Boolean mask array: elements where the mask is True are kept, and the rest are dropped.

import numpy as np

target = np.array([1, 5, 9, 4, 0])

filter_array = [True, False, False, True, True] # mask array

print(target[filter_array])


# result is:

[1 4 0]


The mask array must have the same length as the target array. In current NumPy versions, a mask that is longer or shorter than the target raises an IndexError; the extra or missing part is not silently ignored.
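As a quick check of the length rule, here is a minimal sketch that indexes a 5-element array with a 3-element mask. The names short_mask and mismatch_error are just illustrative, and the exact wording of the error message can vary between NumPy versions.

```python
import numpy as np

target = np.array([1, 5, 9, 4, 0])
short_mask = [True, False, True]   # only 3 entries for 5 elements

mismatch_error = None
try:
    print(target[short_mask])
except IndexError as err:
    # The mask length does not match the target length.
    mismatch_error = err
    print("IndexError:", err)
```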


A possibly non-obvious and/or hidden feature: Use integers to point to the indices to keep, including repeats!!


import numpy as np

target = np.array([1, 5, 9, 4, 0])

filter_array = [0, 1, 1, 1, 0]

print(target[filter_array])


# result is, keep the 0th and 1st index positions' values:


[1 5 5 5 1]


A common pattern is to start the filter as a default list (which can be an empty list) and populate it in a loop, appending True or False for each element of the target array. Other statements in between can also add to the filter before it is finally used as the index.


numpy considers zero to be an even number.


import numpy as np


target_array = np.array([123, 5, -6, 7, 98])

filter_array = [] # could be populated with default values too



for num in target_array:

  if num > 52:

    filter_array.append(True)

  else:

    filter_array.append(False)


modified_array = target_array[filter_array]


print(filter_array)

print(modified_array)


# result is:

[True, False, False, False, True]
[123  98]

A shorthand that saves lines is to build the mask from a condition on the array itself. The condition can even be written directly inside the index brackets of the target array, as in array0[array0 > -4].


import numpy as np

array0 = np.array([-1, -2, -3, -4])


filtarray = array0 > -4


newarray = array0[filtarray]

print(filtarray)
print(newarray)


# result is 


[ True  True  True False]

[-1 -2 -3]


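However, compound conditions written with Python's and / or keywords give errors, because those keywords cannot be applied element by element to an array. The elementwise operators & (and) and | (or) do work, provided each comparison is wrapped in its own parentheses. A longer alternative is to create extra arrays and extra filters, with an example as shown below.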


import numpy as np


array0 = np.array([-1, -2, -3, -4])


filtarray1 = []

filtarray2 = []


filtarray1 = array0 > -3


array1 = array0[filtarray1]


filtarray2 = array1 < -1


array2 = array1[filtarray2]


print(array2)


# result is

[-2]
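For comparison, the same two conditions can be combined in a single step with the elementwise operators & and |. This is a minimal sketch; the names both and either are only illustrative.

```python
import numpy as np

array0 = np.array([-1, -2, -3, -4])

# Parentheses are required: & and | bind more tightly than > and <.
both = array0[(array0 > -3) & (array0 < -1)]     # greater than -3 AND less than -1
either = array0[(array0 > -2) | (array0 < -3)]   # greater than -2 OR less than -3

print(both)    # [-2]
print(either)  # [-1 -4]
```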











Tuesday, August 11, 2020

Array Sorting



You can even do a sort on an array of Booleans, and all the False values will appear before even the first of the Trues.

np.sort() leaves the original array unchanged and returns a sorted copy, so be prepared to accept a return value.
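Both points can be seen in a short sketch (the array name flags is made up for illustration): the sorted copy puts every False before the first True, and the original keeps its order.

```python
import numpy as np

flags = np.array([True, False, True, False, False])

sorted_flags = np.sort(flags)   # returns a new, sorted array

print(sorted_flags)  # [False False False  True  True]
print(flags)         # [ True False  True False False]  (unchanged)
```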


.sort(a, axis, kind, order) as a numpy method


a: array to be sorted

axis: default is -1, which is the last axis and is generally the fastest to sort on. Specifying None (the Python constant, not a string) flattens the array before sorting.

kind: your choice of sorting algorithm.


order: only used with structured arrays (arrays with named fields). It is a field name, or a list of field names, saying which field to compare first, which second, and so on.


See the official page for numpy.sort() to learn more and get all your many parameter choices.
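As a small illustration of the order argument, here is a sketch using a made-up structured array with name and age fields. The field names and data are assumptions for the example only.

```python
import numpy as np

# Hypothetical structured array: each element has a name field and an age field.
people = np.array(
    [("Ann", 31), ("Bob", 25), ("Cy", 31)],
    dtype=[("name", "U10"), ("age", "i4")],
)

# Sort by age first; ties on age are broken by name.
by_age = np.sort(people, order=["age", "name"])

print(by_age["name"])  # ['Bob' 'Ann' 'Cy']
print(by_age["age"])   # [25 31 31]
```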


import numpy as np


a = np.array([[3, 4, -2],[2, 11, 8]])


b = np.sort(a, axis=-1, kind='heapsort')    

# sort along the last axis

print(b)

c = np.sort(a, axis=None)   # sort the flattened array

print(c)

d = np.sort(a, axis=0)      # sort along the first axis

print(d)


# results in:

# b

[[-2  3  4]

 [ 2  8 11]]


# c

[-2  2  3  4  8 11]


# d

[[ 2  4 -2]

 [ 3 11  8]]


If you don't want to make a copy by sorting, use the sort-in-place method numpy.ndarray.sort().


import numpy as np

unordered_array = np.array([13, 17, -9, 101])

newarray = np.ndarray.sort(unordered_array)

print(unordered_array)

print(newarray)

# results in, with no array returned from ndarray.sort():


[ -9  13  17 101]

None
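The same in-place sort is more commonly written as a method call on the array itself. A minimal sketch, with result used only to show that nothing is returned:

```python
import numpy as np

unordered_array = np.array([13, 17, -9, 101])

result = unordered_array.sort()   # sorts in place and returns None

print(unordered_array)  # [ -9  13  17 101]
print(result)           # None
```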








Friday, August 7, 2020

Array Searching



Use np.where() with a condition to return the indices of the elements that satisfy that condition.


import numpy as np

numpyarray = np.array([5, 6, 7, 8, 10])

bigger = np.where(numpyarray == 8)


print(bigger)

# result is:

(array([3]),)


because the 3rd index has a value equal to 8. Remember counting starts at 0.



What about if there are multiple matches?


import numpy as np

numpyarray = np.array([5, 6, 7, 8, 9, 10])

bigger = np.where(numpyarray > 7)


print(bigger)

# result is:

(array([3, 4, 5]),)


because the 3rd, 4th and 5th indices have values greater than 7.


What about if there are no matches?


import numpy as np

numpyarray = np.array([5, 6, 7, 8, 9, 10])

bigger = np.where(numpyarray < -4)


print(bigger)

# result is an empty array:


(array([], dtype=int64),)
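The printed results above are tuples because np.where() returns one index array per dimension. A minimal sketch of unpacking that tuple for a 1-D array and using the indices to pull out the matching values (the names result and indices are illustrative):

```python
import numpy as np

numpyarray = np.array([5, 6, 7, 8, 9, 10])

result = np.where(numpyarray > 7)   # tuple holding one index array (1-D input)
indices = result[0]                 # the index array itself

print(indices)               # [3 4 5]
print(numpyarray[indices])   # [ 8  9 10]
print(indices.size == 0)     # False; it would be True if nothing matched
```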


Use .searchsorted() to find out where the proper position/index should be to insert a new element in such a way that the order is not destroyed. This method only works on an array that is already sorted to start with.


The default (side = 'left') is to return the first index at which the new value could be inserted while keeping the array in order.


import numpy as np

sortedarray = np.array([16, 27, 38, 49])

x = np.searchsorted(sortedarray, 33)

# you can use side = 'left' for clarity although it is the default.

print(x)

# result is:

2


because 33 belongs in index spot 2: it is larger than 27 (index 1) and smaller than 38, which currently sits at index 2 and would shift to the right. If you put 33 into index spot 3, that would leave the 38 to the left of it, out of order.


The optional side = 'right' argument instead returns the last index at which the new value could be inserted while keeping the array in order. Notice that this only produces a different result from side = 'left' if one of the element values in the target array is equal to the new value to insert.


import numpy as np

sortedarray = np.array([16, 27, 38, 49])

x = np.searchsorted(
sortedarray, 33, side = 'left')


print(x)

# result is:

2


Same result if side = 'right'.


import numpy as np

sortedarray = np.array([16, 27, 38, 49])

x = np.searchsorted(
sortedarray, 33, side = 'right')


print(x)

# result is:

2


import numpy as np

sortedarray = np.array([16, 27, 33, 49])

x = np.searchsorted(
sortedarray, 33, side = 'right')

print(x)

# result is:

3

because with side = 'right' the new 33 goes after the existing 33 (index 2), which means index 3, the spot where 49 currently sits. With side = 'left' the result would be 2, as if the new 33 were placed just before the existing one.
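The difference between the two sides is easiest to see when the value occurs more than once. A minimal sketch with a made-up array containing two 33s:

```python
import numpy as np

sortedarray = np.array([16, 27, 33, 33, 49])

left = np.searchsorted(sortedarray, 33, side="left")    # before the first 33
right = np.searchsorted(sortedarray, 33, side="right")  # after the last 33

print(left, right)    # 2 4
print(right - left)   # 2, the number of elements equal to 33
```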



You can feed a list of new elements to .searchsorted() and it returns one index per element, each computed against the original array. The target array never passes through an intermediate state, as it would if the values were dropped in one-by-one.


import numpy as np


array1 = np.array([1, 25, 33, 74])   # must already be sorted


x = np.searchsorted(array1, [22, 44, 66])


print(x)


# result is:


[1 3 3]


because 22 is in between 1 and 25, so it gets the index 1 position. Both 44 and 66 are in between 33 and 74, so each gets the same index 3 position, even though 66 is greater than 44. The method acts as if the target array is STATIC and values can be "piled on top of each other" at the same index position.
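One way to use those indices is to pass them to np.insert(), which also interprets every index relative to the original array. A sketch, assuming the new values are themselves listed in ascending order (the name merged is illustrative):

```python
import numpy as np

array1 = np.array([1, 25, 33, 74])   # already sorted
new_values = [22, 44, 66]            # listed in ascending order

positions = np.searchsorted(array1, new_values)
merged = np.insert(array1, positions, new_values)

print(positions)  # [1 3 3]
print(merged)     # [ 1 22 25 33 44 66 74]
```

Values that share an index are inserted in the order given, which is why 44 lands before 66.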


If the argument value is greater than the rightmost element of the target array (the largest value of the target array), .searchsorted() returns an index one position past the end, equal to the length of the array. With the side = 'right' optional argument the same happens when the value is equal to the largest element. In other words, the array would have to grow by one position on the right to hold your new element, at a new index that is not currently part of the target array.


import numpy as np


stretcharray = np.array([1, 25, 33, 74])   # must already be sorted


x = np.searchsorted(stretcharray, 88)


print(x)


# result is:


4


because 88 would need to appear in a new index position of 4 to keep the array ordered.








Tuesday, August 4, 2020

Array Splitting



np.array_split() is more forgiving than np.split() when there is a remainder, meaning the number of sections does not divide evenly into the number of rows available when axis=0, OR does not divide evenly into the number of columns available when axis=1.

1 dimensional examples (axis=1 undefined):

# 3 divides evenly into 6 elements

import numpy as np

oneD = np.array([1, 2, 3, 4, 5, 6]) # row array

split_arr = np.array_split(oneD, 3, axis=0)

print(split_arr)

# results in:

[array([1, 2]), array([3, 4]), array([5, 6])]

# 3 divides evenly into 6 elements

import numpy as np

oneD = np.array([1, 2, 3, 4, 5, 6]) # row array

split_arr = np.split(oneD, 3, axis=0)

print(split_arr)

# results same as for .array_split() in:

[array([1, 2]), array([3, 4]), array([5, 6])]


# column arrays, 3 evenly divisible into #rows=6

import numpy as np

oneD = np.array([[1], [2], [3], [4], [5], [6]])

split_arr = np.array_split(oneD, 3, axis=0)

print(split_arr)

[array([[1],
       [2]]), array([[3],
       [4]]), array([[5],
       [6]])]


# column arrays, 3 evenly divisible into #rows=6

import numpy as np

oneD = np.array([[1], [2], [3], [4], [5], [6]])

split_arr = np.split(oneD, 3, axis=0)

print(split_arr)

[array([[1],
       [2]]), array([[3],
       [4]]), array([[5],
       [6]])]


However, watch what happens when the number of columns (or rows) is not evenly divisible by the split factor parameter.

# column arrays, 4 NOT evenly divisible into #rows=6

import numpy as np

oneD = np.array([[1], [2], [3], [4], [5], [6]])

split_arr = np.split(oneD, 4, axis=0)

print(split_arr)

ValueError: array split does not result in an equal division

Whereas .array_split() can handle it without an error message.

# column arrays, 4 NOT evenly divisible into #rows=6

import numpy as np

oneD = np.array([[1], [2], [3], [4], [5], [6]])

split_arr = np.array_split(oneD, 4, axis=0)

print(split_arr)

# 2 sub-arrays with 2 elements and 2 sub-arrays with one element
# gives 6 elements total.

[array([[1],
       [2]]), array([[3],
       [4]]), array([[5]]), array([[6]])]

--------------------------------------

2 dimensional examples:

# columns = 6 evenly divisible by 2 (split along axis = 1)

import numpy as np

twoD = np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66]])

split_arr = np.array_split(twoD, 2, axis = 1)

print(split_arr)


[array([[ 1,  2,  3],
       [11, 22, 33]]), array([[ 4,  5,  6],
       [44, 55, 66]])]



# rows = 3 NOT evenly divisible by 2
import numpy as np

twoD = np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66], [101, 202, 303, 404, 505, 606]])

split_arr = np.array_split(twoD, 2, axis = 0)

print(split_arr)


[array([[ 1,  2,  3,  4,  5,  6],
       [11, 22, 33, 44, 55, 66]]), array([[101, 202, 303, 404, 505, 606]])]



# columns = 3 NOT evenly divisible by 2
import numpy as np

twoD = np.array([[1, 2, 3], [11, 22, 33], [101, 202, 303]])

split_arr = np.array_split(twoD, 2, axis = 1)

print(split_arr)


[array([[  1,   2],
       [ 11,  22],
       [101, 202]]), array([[  3],
       [ 33],
       [303]])]


As you can see from the examples above, in the case of remainders the larger sub-arrays are created first, from left to right: for an axis of length l split into n sections, the first l % n sub-arrays get l//n + 1 elements and the rest get l//n. Otherwise, by using .split() you'll get an error message that says "array split does not result in an equal division".
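The left-to-right preference can be checked by looking only at the sizes of the pieces. A minimal sketch with a 7-element array; the dictionary name sizes_by_sections is just for illustration:

```python
import numpy as np

seven = np.arange(7)   # [0 1 2 3 4 5 6]

sizes_by_sections = {}
for sections in (2, 3, 4):
    parts = np.array_split(seven, sections)
    sizes_by_sections[sections] = [len(part) for part in parts]
    print(sections, sizes_by_sections[sections])

# 2 [4, 3]
# 3 [3, 2, 2]
# 4 [2, 2, 2, 1]
```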

.array_split() returns a list of arrays. It will also work with what's called a "ragged" matrix or a ragged array, where at least two rows or at least two columns have unequal lengths compared to each other. Use the optional argument dtype=object when creating the array to get it to work.

import numpy as np
# The 3rd row has 4 columns
ragged_array = np.array([[7, 8, 9], [10, 11, 12], [13, 14, 15, 14]], dtype=object)

fixed_arr = np.array_split(ragged_array, 3, axis=0)

print(fixed_arr)

[array([[7, 8, 9]], dtype=object), array([[10, 11, 12]], dtype=object), array([[13, 14, 15, 14]], dtype=object)]


When you're first creating the array, but before you split it, the optional keyword dtype=object gets the array stored as references to Python objects. A ragged nested list then becomes a 1-D array whose elements are the inner Python lists rather than a true 2-D array, which is what allows rows of unequal length to sit in one array as shown above.
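A quick way to see what dtype=object does is to inspect the shape of the ragged array. This sketch reuses the ragged array from the example above; pieces is an illustrative name.

```python
import numpy as np

# The 3rd row has 4 columns
ragged_array = np.array([[7, 8, 9], [10, 11, 12], [13, 14, 15, 14]], dtype=object)

print(ragged_array.shape)      # (3,)  a 1-D array of 3 objects
print(ragged_array.ndim)       # 1
print(type(ragged_array[2]))   # <class 'list'>

pieces = np.array_split(ragged_array, 3)
print([piece.shape for piece in pieces])   # [(1,), (1,), (1,)]
```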

The optional axis keyword is allowed; the default is axis=0. axis=1 is only valid for arrays of 2-D and higher.

Analogous to the joining commands, NumPy has .hsplit(), .vsplit() and .dsplit() functions. For arrays of 2 or more dimensions, .hsplit() splits along the second axis (columns) and .vsplit() splits along the first axis (rows). .dsplit() splits along the third axis (depth), so it needs an array of at least 3 dimensions. A ragged array created with dtype=object has one dimension fewer than its brackets suggest, because the innermost lists are stored as single objects. Therefore you can coax .hsplit(), .vsplit() or .dsplit() into working on ragged arrays by adding an extra pair of square brackets around the entire array argument. None of these three takes an axis parameter.


# Normal Arrays

.hsplit()

                            Think about #columns vs divisor


import numpy as np

hsplit_arr = np.array([[1, 2, 3], [4, 5, 6],  [13, 14, 15], [16, 17, 18]])

newarray = np.hsplit(hsplit_arr, 3)

print(newarray)


[array([[ 1],
       [ 4],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [15],
       [18]])]


.vsplit()

Think about #rows vs. divisor

import numpy as np

hsplit_arr = np.array([[1, 2, 3], [4, 5, 6],  [13, 14, 15], [16, 17, 18]])

newarray = np.vsplit(hsplit_arr, 2)

print(newarray)


[array([[1, 2, 3],
       [4, 5, 6]]), array([[13, 14, 15],
       [16, 17, 18]])]


.dsplit()

Think about the length of the third axis (depth) vs divisor

# notice the extra pair of [] around the entire argument

import numpy as np

arr = np.array([[[1, 2, 3], [44, 55, 66], [77, 88, 999], [10, 11, 120]]])

newarr = np.dsplit(arr, 3)

print(newarr)

[array([[[ 1],
        [44],
        [77],
        [10]]]), array([[[ 2],
        [55],
        [88],
        [11]]]), array([[[  3],
        [ 66],
        [999],
        [120]]])]
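Each of the three functions gives the same pieces as np.split() with a fixed axis, which a short comparison can confirm. This is a sketch on a made-up 3-D array; the helper same() is hypothetical and only compares two lists of arrays.

```python
import numpy as np

grid = np.arange(24).reshape(2, 4, 3)   # 2 rows, 4 columns, depth 3


def same(parts_a, parts_b):
    """Return True if both lists hold equal arrays in the same order."""
    return len(parts_a) == len(parts_b) and all(
        np.array_equal(a, b) for a, b in zip(parts_a, parts_b)
    )


v_same = same(np.vsplit(grid, 2), np.split(grid, 2, axis=0))   # rows
h_same = same(np.hsplit(grid, 2), np.split(grid, 2, axis=1))   # columns
d_same = same(np.dsplit(grid, 3), np.split(grid, 3, axis=2))   # depth

print(v_same, h_same, d_same)   # True True True
```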


#####################  RAGGED ARRAY  #############

import numpy as np

dsplit_arr = np.array([[[1, 2, 3], [4, 5, 6],  [1, 13, 14, 15], [16, 17, 18]]], dtype=object)

newarray = np.dsplit(dsplit_arr, 4)

print(newarray)

ValueError: dsplit only works on arrays of 3 or more dimensions


# Now add extra [] to coax

import numpy as np

# the 3rd row has 4 columns vs. 3 columns for the other rows
dsplit_arr = np.array([[[[1, 2, 3], [4, 5, 6],  [1, 13, 14, 15], [16, 17, 18]]]], dtype=object)

newarray = np.dsplit(dsplit_arr, 4)

print(newarray)

[array([[[[1, 2, 3]]]], dtype=object), array([[[[4, 5, 6]]]], dtype=object), array([[[[1, 13, 14, 15]]]], dtype=object), array([[[[16, 17, 18]]]], dtype=object)]











Monday, August 3, 2020

Joining Arrays


-----------------------------

Joining 1-D arrays

X
Y

concatenate((X,Y), axis = 0)

yields:

[X][Y]   as a 1-D array row vector

using axis = 1 generates an error because the .concatenate() method can not produce a new dimension, and axis = 1 requires a 2nd dimension to already exist.

IndexError: axis 1 out of bounds [0, 1)

Also note that NumPy 1-D arrays are displayed like row vectors when first explicitly constructed. You can make it a column vector by doing:

col_array = np.array([[1], [1]])
print(col_array)

[[1]
 [1]]

although that may look a bit ugly with the extra brackets, and a check on the .ndim attribute of the array will say 2 dimensions.

The two 1-D arrays being concatenated do NOT need to have the same lengths.

Example:

import numpy as np

# ROW vectors

X = np.array([1, 2, 3])

Y = np.array([4, 5, 6, 7])

XYarray = np.concatenate((X, Y), axis=0)

print(XYarray)

[1 2 3 4 5 6 7]


import numpy as np

# COLUMN vectors

X = np.array([[1], [2], [3]])

Y = np.array([[4], [5], [6], [7]])

XYarray = np.concatenate((X, Y), axis=0)

print(XYarray)

[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]]



-------------------------------------------------


Joining 2-D arrays

X = [[A], [B]]
Y = [[C], [D]]

concatenate((X,Y), axis = 1)

yields:

[A][C]
[B][D]   as a 2-D array

define a submatrix to be any quantity passed in as an argument to .concatenate()
X and Y are each a submatrix.

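number of rows = #rows in any one submatrix
number of columns = sum of the #columns of the submatrices (#submatrices * #columns in a submatrix when they are all the same size)

important: X and Y must have the same number of rows; only the number of columns may differ.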

W = [[E], [F]]
Z = [[G], [H]]

-------------------------------------------------

concatenate((W,Z), axis = 0)

yields:

[E]
[F]
[G]
[H]   as a 2-D array

define a submatrix to be any quantity passed in as arguments to .concatenate()
W and Z are each a submatrix.

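number of rows = sum of the #rows of the submatrices (#submatrices * #rows in a submatrix when they are all the same size)
number of columns = #columns in any one submatrix

important: W and Z must have the same number of columns; only the number of rows may differ.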

################################################################################

Examples:

import numpy as np

X = np.array([[1, 2], [3, 4]]) 

# X = [1 2]
#     [3 4]

# A = [1, 2] and B = [3, 4] so X = [A, B]


Y = np.array([[5, 6], [7, 8]]) 

# Y = [5 6]
#     [7 8]

# C = [5, 6] D = [7, 8] so Y = [C, D]

axis1_array = np.concatenate((X, Y), axis=1)

print(axis1_array)

[[1 2 5 6]
 [3 4 7 8]]

which looks like:

[A][C]
[B][D]   as a 2-D array


In other words, an axis=1 concatenate operation on 2-D arrays extends each row of X (A, B) with the matching row of Y (C, D), joining the two arrays side-by-side horizontally. No transposing is involved.

----------------------------------------------
----------------------------------------------

import numpy as np

W = np.array([[1, 2, 3], [4, 5, 6]]) 
# E = [1, 2, 3] and F = [4, 5, 6] so W = [E, F]


Z = np.array([[7, 8, 9], [10, 11, 12]]) 
# G = [7, 8, 9] H = [10, 11, 12] so Z = [G, H]

axis0_array = np.concatenate((W,Z), axis=0)

print(axis0_array)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

which looks like:

[E]
[F]
[G]
[H]   as a 2-D array


In other words, an axis=0 concatenate operation on 2-D arrays keeps each row of W (E, F) and each row of Z (G, H) intact and joins them top-to-bottom vertically. No transposing is involved.
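The shape rule for .concatenate() on 2-D arrays can be checked with two arrays that agree on rows but not on columns. A minimal sketch with made-up data; wide and axis0_error are illustrative names.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])           # 2 rows, 2 columns
Y = np.array([[5, 6, 7], [8, 9, 10]])    # 2 rows, 3 columns

# Side-by-side works: the row counts match, the column counts simply add up.
wide = np.concatenate((X, Y), axis=1)
print(wide.shape)   # (2, 5)

# Top-to-bottom fails: the column counts (2 vs 3) do not match.
axis0_error = None
try:
    np.concatenate((X, Y), axis=0)
except ValueError as err:
    axis0_error = err
    print("ValueError:", err)
```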

Stacking 1-D arrays

The .stack() method differs from the .concatenate() method in that it creates a new dimension.

axis = 0 Example:

import numpy as np

X = np.array([1, 2, 3, 4])

Y = np.array([5, 6, 7, 8])

twoDarray_axis0 = np.stack((X, Y), axis=0)

print(twoDarray_axis0)


The individual row arrays X and Y each become one row of the result, stacked top-to-bottom like "flat stacked" plates in the cupboard.

In linear algebra terms, if each row is 1 by n, then m rows will yield a m by n matrix.


axis = 1 Example:

import numpy as np

A = np.array([1, 2, 3, 4])

B = np.array([5, 6, 7, 8])

twoarray_axis1 = np.stack((A, B), axis=1)

print(twoarray_axis1)

The individual arrays A and B each become one column of the result, placed side-by-side left-to-right like "side rack stacked" plates in the cupboard.

In linear algebra terms, if each transposed row is now an n by 1 column array, then m of them will yield an n by m matrix.
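The two axis choices can be compared directly by printing the resulting shapes. A minimal sketch reusing the same four-element arrays; rows and cols are illustrative names.

```python
import numpy as np

X = np.array([1, 2, 3, 4])
Y = np.array([5, 6, 7, 8])

rows = np.stack((X, Y), axis=0)   # each input becomes a row
cols = np.stack((X, Y), axis=1)   # each input becomes a column

print(rows.shape)   # (2, 4)
print(cols.shape)   # (4, 2)
print(np.array_equal(cols, rows.T))   # True: one is the transpose of the other
```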



------------------------


.hstack() stitches arrays together horizontally, in the order they appear in the argument list, going left-to-right.

.hstack() increases COLUMNS

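arrays passed into .hstack() must have the same shape except along the axis being joined: 2-D arrays need the same number of rows but may differ in columns, and 1-D arrays may have any length.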

INSIGHT:

.hstack() on 1-D arrays does the same thing as .concatenate() with axis=0

.hstack() on 2-D arrays does the same thing as .concatenate() with axis=1


Example:

import numpy as np

C = np.array([[1, 2, 3], [1, 2, 3]] ) # 2 rows and 3 columns

D = np.array([[4, 5, 6], [4, 5, 6]])  # 2 rows and 3 columns

CDarray = np.hstack((C, D))

print(CDarray)

[[1 2 3 4 5 6]                        # 2 rows and 6 columns
 [1 2 3 4 5 6]]
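The two equivalences listed under INSIGHT can be verified with a short comparison. This is a sketch on made-up arrays; same_1d and same_2d are illustrative names.

```python
import numpy as np

one_a = np.array([1, 2, 3])
one_b = np.array([4, 5])              # 1-D arrays may differ in length

two_a = np.array([[1, 2], [3, 4]])    # 2 rows, 2 columns
two_b = np.array([[5], [6]])          # 2 rows, 1 column

same_1d = np.array_equal(np.hstack((one_a, one_b)),
                         np.concatenate((one_a, one_b), axis=0))
same_2d = np.array_equal(np.hstack((two_a, two_b)),
                         np.concatenate((two_a, two_b), axis=1))

print(same_1d, same_2d)             # True True
print(np.hstack((one_a, one_b)))    # [1 2 3 4 5]
print(np.hstack((two_a, two_b)))
# [[1 2 5]
#  [3 4 6]]
```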

--------------------------------------

.vstack() stitches arrays together vertically, in the order they appear in the argument list, going top-to-bottom.

.vstack() increases ROWS

INSIGHT:   .vstack() of 1-D row arrays is the same as .stack() of 1-D row arrays with axis=0

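arrays passed into .vstack() must have the same shape except along the first axis: 2-D arrays need the same number of columns but may differ in rows, and 1-D arrays must have the same length.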

Example:

import numpy as np

K = np.array([[1, 2, 3], [7, 8 ,9]]) # 2 rows and 3 columns

L = np.array([[4, 5, 6], [7, 8 ,9]]) # 2 rows and 3 columns

KLarray = np.vstack((K, L))

print(KLarray)                       # 4 rows and 3 columns

[[1 2 3]
 [7 8 9]
 [4 5 6]
 [7 8 9]]
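The matching equivalences for .vstack() can be checked the same way. A sketch with made-up arrays, where M is a hypothetical single-row array added to show that row counts may differ:

```python
import numpy as np

R = np.array([1, 2, 3])
S = np.array([4, 5, 6])

K = np.array([[1, 2, 3], [7, 8, 9]])   # 2 rows and 3 columns
M = np.array([[4, 5, 6]])              # 1 row and 3 columns

# 1-D inputs: vstack behaves like stack with axis=0.
same_1d = np.array_equal(np.vstack((R, S)), np.stack((R, S), axis=0))

# 2-D inputs: vstack behaves like concatenate with axis=0.
same_2d = np.array_equal(np.vstack((K, M)), np.concatenate((K, M), axis=0))

print(same_1d, same_2d)          # True True
print(np.vstack((R, S)).shape)   # (2, 3)
print(np.vstack((K, M)).shape)   # (3, 3)
```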



-----------------------------------

.dstack() takes 1-D row arrays, transposes them into 1-D column arrays, and then stitches them side-by-side left-to-right as they appear in the argument list passed into .dstack()

import numpy as np

R = np.array([1, 2, 3])

S = np.array([4, 5, 6])

RSarray = np.dstack((R, S))

print(RSarray)

[[[1 4]
  [2 5]
  [3 6]]]

notice the 3rd set of square brackets; an extra dimension of length 1 has been added at the front, so the result is a 3-D array of shape (1, 3, 2)

------------------------------------

For 2-D arrays, .dstack() keeps the rows and columns where they are and stacks the arrays along a new third axis: the entry at row i, column j of the result holds the values found at that position in each input, in the order the arrays appear in the argument list passed into .dstack()


Example:

import numpy as np

P = np.array([[1, 2, 3], [1, 2, 3]])

Q = np.array([[4, 5, 6], [11, 22, 33]])

PQarray = np.dstack((P, Q))

print(PQarray)

[[[ 1  4]
  [ 2  5]
  [ 3  6]]

 [[ 1 11]
  [ 2 22]
  [ 3 33]]]

notice the 3rd set of square brackets; the result is a 3-D array of shape (2, 3, 2), where the new last dimension has length 2, one entry per input array
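Printing the shapes makes the behaviour of .dstack() easier to follow than counting brackets. A minimal sketch reusing the arrays from the two examples above:

```python
import numpy as np

# 1-D inputs
R = np.array([1, 2, 3])
S = np.array([4, 5, 6])
print(np.dstack((R, S)).shape)   # (1, 3, 2)

# 2-D inputs
P = np.array([[1, 2, 3], [1, 2, 3]])
Q = np.array([[4, 5, 6], [11, 22, 33]])
PQarray = np.dstack((P, Q))

print(PQarray.shape)   # (2, 3, 2)
print(PQarray[1, 2])   # [ 3 33]  -> P[1, 2] and Q[1, 2]

# For 2-D inputs this matches stacking along a new third axis.
print(np.array_equal(PQarray, np.stack((P, Q), axis=2)))   # True
```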