Tuesday, August 4, 2020

Array Splitting


.array_split() is more forgiving than .split() when there is a remainder, that is, when the number of sections you ask for does not divide evenly into the number of rows (axis=0) or the number of columns (axis=1). In that case .split() raises an error, while .array_split() returns sub-arrays of unequal size.

1-dimensional examples (axis=1 is not valid for a 1-D array):

# 3 divides evenly into the 6 elements

import numpy as np

oneD = np.array([1, 2, 3, 4, 5, 6]) # 1-D array

split_arr = np.array_split(oneD, 3, axis=0)

print(split_arr)

# results in:

[array([1, 2]), array([3, 4]), array([5, 6])]

# 3 divides evenly into the 6 elements

import numpy as np

oneD = np.array([1, 2, 3, 4, 5, 6]) # 1-D array

split_arr = np.split(oneD, 3, axis=0)

print(split_arr)

# same result as with .array_split():

[array([1, 2]), array([3, 4]), array([5, 6])]


# column array (6 rows, 1 column), 3 divides evenly into #rows=6

import numpy as np

oneD = np.array([[1], [2], [3], [4], [5], [6]])

split_arr = np.array_split(oneD, 3, axis=0)

print(split_arr)

[array([[1],
       [2]]), array([[3],
       [4]]), array([[5],
       [6]])]


# column array (6 rows, 1 column), 3 divides evenly into #rows=6

import numpy as np

oneD = np.array([[1], [2], [3], [4], [5], [6]])

split_arr = np.split(oneD, 3, axis=0)

print(split_arr)

[array([[1],
       [2]]), array([[3],
       [4]]), array([[5],
       [6]])]


However, watch what happens when the number of columns (or rows) is not evenly divisible by the number of sections requested.

# column array, 4 does NOT divide evenly into #rows=6

import numpy as np

oneD = np.array([[1], [2], [3], [4], [5], [6]])

split_arr = np.split(oneD, 4, axis=0)

print(split_arr)

ValueError: array split does not result in an equal division

By contrast, .array_split() handles this case without raising an error.

# column array, 4 does NOT divide evenly into #rows=6

import numpy as np

oneD = np.array([[1], [2], [3], [4], [5], [6]])

split_arr = np.array_split(oneD, 4, axis=0)

print(split_arr)

# 2 sub-arrays with 2 elements and 2 sub-arrays with 1 element
# give 6 elements in total.

[array([[1],
       [2]]), array([[3],
       [4]]), array([[5]]), array([[6]])]

--------------------------------------

2 dimensional examples:

# columns = 6 evenly divisible by 2 (axis=1 splits along the columns)

import numpy as np

twoD = np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66]])

split_arr = np.array_split(twoD, 2, axis = 1)

print(split_arr)


[array([[ 1,  2,  3],
       [11, 22, 33]]), array([[ 4,  5,  6],
       [44, 55, 66]])]



# rows = 3 NOT evenly divisible by 2
import numpy as np

twoD = np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66], [101, 202, 303, 404, 505, 606]])

split_arr = np.array_split(twoD, 2, axis = 0)

print(split_arr)


[array([[ 1,  2,  3,  4,  5,  6],
       [11, 22, 33, 44, 55, 66]]), array([[101, 202, 303, 404, 505, 606]])]



# columns = 3 NOT evenly divisible by 2
import numpy as np

twoD = np.array([[1, 2, 3], [11, 22, 33], [101, 202, 303]])

split_arr = np.array_split(twoD, 2, axis = 1)

print(split_arr)


[array([[  1,   2],
       [ 11,  22],
       [101, 202]]), array([[  3],
       [ 33],
       [303]])]


As you can see from the examples above, when there is a remainder, .array_split() hands the extra elements to the leading sub-arrays, working from left to right, so the larger sub-arrays come first. If you use .split() instead, you'll get an error message that says "array split does not result in an equal division".
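The left-to-right rule can be checked with a little arithmetic. The sketch below is only an illustration: expected_sizes is a made-up helper name, not part of NumPy. It follows the rule given in the NumPy documentation, where for a length l and n sections the first l % n sub-arrays get l//n + 1 elements and the rest get l//n.

```python
import numpy as np

def expected_sizes(length, sections):
    """Hypothetical helper: sizes of the pieces array_split should return."""
    base, extra = divmod(length, sections)
    # The first `extra` pieces each get one additional element.
    return [base + 1] * extra + [base] * (sections - extra)

arr = np.arange(1, 7)             # six elements: 1..6
pieces = np.array_split(arr, 4)   # 4 does not divide 6 evenly
actual_sizes = [len(p) for p in pieces]

print(expected_sizes(6, 4))  # [2, 2, 1, 1]
print(actual_sizes)          # [2, 2, 1, 1]
```

This matches the 6-row, 4-section example earlier: two sub-arrays of two elements first, then two of one element.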

.array_split() returns a list of arrays. It will also work with what's called a "ragged" matrix or ragged array, where at least two rows (or at least two columns) have unequal lengths. Pass the optional argument dtype=object when you create the array to get this to work.

import numpy as np
# The 3rd row has 4 columns
ragged_array = np.array([[7, 8, 9], [10, 11, 12], [13, 14, 15, 14]], dtype=object)

fixed_arr = np.array_split(ragged_array, 3, axis=0)

print(fixed_arr)

[array([[7, 8, 9]], dtype=object), array([[10, 11, 12]], dtype=object), array([[13, 14, 15, 14]], dtype=object)]


When you first create the array, before you split it, the optional keyword dtype=object makes NumPy store each element as a reference to a Python object rather than as a raw number. This allows you to do certain things, for example, to hold rows of unequal length in a single array as shown above.
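To see what dtype=object actually stores, you can inspect the ragged array before splitting it. This is just a quick look, reusing the same values as the example above; the variable names are my own.

```python
import numpy as np

ragged = np.array([[7, 8, 9], [10, 11, 12], [13, 14, 15, 14]], dtype=object)

# NumPy cannot make a rectangular 3x3 or 3x4 grid out of this,
# so it keeps a 1-D array whose three elements are Python lists.
print(ragged.shape)       # (3,)
print(ragged.dtype)       # object
print(type(ragged[2]))    # <class 'list'>

row_lengths = [len(row) for row in ragged]
print(row_lengths)        # [3, 3, 4]
```

So when .array_split() cuts this array into 3 pieces along axis=0, each piece is a one-element object array holding one of the original rows.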

The axis keyword is optional and defaults to axis=0. axis=1 is only valid for arrays with two or more dimensions.
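Here is a small sketch of the axis behavior, using a made-up 3x4 array. It assumes that asking for axis=1 on a 1-D array raises an IndexError or ValueError; the exact exception type may differ between NumPy versions, so the code catches both.

```python
import numpy as np

twoD = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

default_split = np.array_split(twoD, 2)        # no axis given: axis=0
row_split = np.array_split(twoD, 2, axis=0)    # split along the rows
col_split = np.array_split(twoD, 2, axis=1)    # split along the columns

print([p.shape for p in row_split])  # [(2, 4), (1, 4)]
print([p.shape for p in col_split])  # [(3, 2), (3, 2)]

oneD = np.arange(6)
try:
    np.array_split(oneD, 3, axis=1)  # a 1-D array has no axis 1
    axis1_failed = False
except (IndexError, ValueError):     # exception type is an assumption
    axis1_failed = True
print(axis1_failed)
```

Leaving out axis and passing axis=0 give the same pieces, which is why the earlier examples work with or without it.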

Analogous to the joining commands, NumPy has the functions .hsplit(), .vsplit() and .dsplit(). .hsplit() is sort of like .dsplit(), except that the latter splits along an extra (third) dimension. A ragged array created with dtype=object has one dimension fewer than its nesting suggests, because each ragged row is stored as a single object. Therefore you can coax .hsplit(), .vsplit() or .dsplit() into working on ragged arrays by adding an extra pair of square brackets around the entire array argument. These three functions do not take an axis parameter.


# Normal Arrays

.hsplit()

Think about #columns vs. divisor


import numpy as np

hsplit_arr = np.array([[1, 2, 3], [4, 5, 6],  [13, 14, 15], [16, 17, 18]])

newarray = np.hsplit(hsplit_arr, 3)

print(newarray)


[array([[ 1],
       [ 4],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [15],
       [18]])]


.vsplit()

Think about #rows vs. divisor

import numpy as np

vsplit_arr = np.array([[1, 2, 3], [4, 5, 6],  [13, 14, 15], [16, 17, 18]])

newarray = np.vsplit(vsplit_arr, 2)

print(newarray)


[array([[1, 2, 3],
       [4, 5, 6]]), array([[13, 14, 15],
       [16, 17, 18]])]


.dsplit()

Think about the size of the third axis (depth) vs. divisor

# notice the extra pair of [] around the entire argument

import numpy as np

arr = np.array([[[1, 2, 3], [44, 55, 66], [77, 88, 999], [10, 11, 120]]])

newarr = np.dsplit(arr, 3)

print(newarr)

[array([[[ 1],
        [44],
        [77],
        [10]]]), array([[[ 2],
        [55],
        [88],
        [11]]]), array([[[  3],
        [ 66],
        [999],
        [120]]])]


#####################  RAGGED ARRAY  #############

import numpy as np

dsplit_arr = np.array([[[1, 2, 3], [4, 5, 6],  [1, 13, 14, 15], [16, 17, 18]]], dtype=object)

newarray = np.dsplit(dsplit_arr, 4)

print(newarray)

ValueError: dsplit only works on arrays of 3 or more dimensions


# Now add extra [] to coax

import numpy as np

# the 3rd row has 4 columns vs. 3 columns for the other rows
dsplit_arr = np.array([[[[1, 2, 3], [4, 5, 6],  [1, 13, 14, 15], [16, 17, 18]]]], dtype=object)

newarray = np.dsplit(dsplit_arr, 4)

print(newarray)

[array([[[[1, 2, 3]]]], dtype=object), array([[[[4, 5, 6]]]], dtype=object), array([[[[1, 13, 14, 15]]]], dtype=object), array([[[[16, 17, 18]]]], dtype=object)]
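If you want to see why the extra [] makes the difference, compare the number of dimensions before and after wrapping. This is a sketch with the same ragged rows as above; one_wrap and two_wraps are names I made up for the two versions.

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6], [1, 13, 14, 15], [16, 17, 18]]

one_wrap = np.array([rows], dtype=object)     # same nesting as the failing call
two_wraps = np.array([[rows]], dtype=object)  # one more pair of []

# Each ragged row is stored as a single list object, so it does not
# count as a dimension of its own.
print(one_wrap.ndim, one_wrap.shape)     # 2 (1, 4)
print(two_wraps.ndim, two_wraps.shape)   # 3 (1, 1, 4)

pieces = np.dsplit(two_wraps, 4)         # third axis has length 4
print([p.shape for p in pieces])         # [(1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)]
```

Note that the divisor here counts rows along the third axis, not elements inside a row, which is why 4 works even though the rows have different lengths.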










