Array Splitting
.array_split() is more forgiving than .split() when there is a remainder, that is, when the requested number of sections does not divide evenly into the number of rows (axis=0) or the number of columns (axis=1).
1 dimensional examples (axis=1 undefined):
# 3 evenly divisible into #columns=6
import numpy as np
oneD = np.array([1, 2, 3, 4, 5, 6]) # row array
split_arr = np.array_split(oneD, 3, axis=0)
print(split_arr)
# results in:
[array([1, 2]), array([3, 4]), array([5, 6])]
# 3 evenly divisible into #columns=6
import numpy as np
oneD = np.array([1, 2, 3, 4, 5, 6]) # row array
split_arr = np.split(oneD, 3, axis=0)
print(split_arr)
# results same as for .array_split() in:
[array([1, 2]), array([3, 4]), array([5, 6])]
# column arrays, 3 evenly divisible into #rows=6
import numpy as np
oneD = np.array([[1], [2], [3], [4], [5], [6]])
split_arr = np.array_split(oneD, 3, axis=0)
print(split_arr)
[array([[1],
       [2]]), array([[3],
       [4]]), array([[5],
       [6]])]
# column arrays, 3 evenly divisible into #rows=6
import numpy as np
oneD = np.array([[1], [2], [3], [4], [5], [6]])
split_arr = np.split(oneD, 3, axis=0)
print(split_arr)
[array([[1],
       [2]]), array([[3],
       [4]]), array([[5],
       [6]])]
However, watch what happens when the number of columns (or rows) is not evenly divisible by the split factor parameter.
# column arrays, 4 NOT evenly divisible into #rows=6
import numpy as np
oneD = np.array([[1], [2], [3], [4], [5], [6]])
split_arr = np.split(oneD, 4, axis=0)
print(split_arr)
ValueError: array split does not result in an equal division
By contrast, .array_split() handles the same call without raising an error.
# column arrays, 4 NOT evenly divisible into #rows=6
import numpy as np
oneD = np.array([[1], [2], [3], [4], [5], [6]])
split_arr = np.array_split(oneD, 4, axis=0)
print(split_arr)
# 2 sub-arrays with 2 elements and 2 sub-arrays with 1 element
# gives 6 elements total.
[array([[1],
       [2]]), array([[3],
       [4]]), array([[5]]), array([[6]])]
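The contrast between the two calls can be checked in a single run. The following is a minimal sketch; the variable names (col, split_failed, sizes) are only illustrative and are not from the examples above.

```python
import numpy as np

col = np.array([[1], [2], [3], [4], [5], [6]])  # column array, shape (6, 1)

# np.split insists on equal sections, so 6 rows into 4 sections fails.
try:
    np.split(col, 4, axis=0)
    split_failed = False
except ValueError as err:
    split_failed = True
    print("np.split raised:", err)

# np.array_split accepts the same arguments and returns uneven pieces.
parts = np.array_split(col, 4, axis=0)
sizes = [len(p) for p in parts]
print(sizes)  # [2, 2, 1, 1]
```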
--------------------------------------
2 dimensional examples:
# columns = 6 evenly divisible by 2 (axis=1 splits the columns)
import numpy as np
twoD = np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66]])
split_arr = np.array_split(twoD, 2, axis = 1)
print(split_arr)
[array([[ 1,  2,  3],
       [11, 22, 33]]), array([[ 4,  5,  6],
       [44, 55, 66]])]
# rows = 3 NOT evenly divisible by 2
import numpy as np
twoD = np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66], [101, 202, 303, 404, 505, 606]])
split_arr = np.array_split(twoD, 2, axis = 0)
print(split_arr)
[array([[ 1,  2,  3,  4,  5,  6],
       [11, 22, 33, 44, 55, 66]]), array([[101, 202, 303, 404, 505, 606]])]
# columns = 3 NOT evenly divisible by 2
import numpy as np
twoD = np.array([[1, 2, 3], [11, 22, 33], [101, 202, 303]])
split_arr = np.array_split(twoD, 2, axis = 1)
print(split_arr)
[array([[  1,   2],
       [ 11,  22],
       [101, 202]]), array([[  3],
       [ 33],
       [303]])]
As you can see from the examples above, when there is a remainder .array_split() fills the larger sub-arrays first, from left to right: the leading sub-arrays each get one extra element and the remaining ones are one element shorter. With .split(), by contrast, you'll get an error message that says "array split does not result in an equal division".
.array_split() returns a list of arrays. It will also work with what's called a "ragged" matrix or ragged array, where at least two rows have unequal lengths compared to each other. Pass the optional dtype=object argument when creating the array to get it to work; recent NumPy versions raise an error for ragged input without it.
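The left-to-right rule can be written down as a small size calculation and compared against what .array_split() actually returns. This is a sketch only: expected_sizes is a hypothetical helper introduced here for illustration, not a NumPy function.

```python
import numpy as np

def expected_sizes(length, sections):
    """Hypothetical helper: the sub-array sizes array_split should produce."""
    base, extra = divmod(length, sections)
    # The first `extra` pieces get one more element than the rest.
    return [base + 1] * extra + [base] * (sections - extra)

results = {}
for length, sections in [(6, 4), (3, 2), (7, 3)]:
    pieces = np.array_split(np.arange(length), sections)
    results[(length, sections)] = [len(p) for p in pieces]
    print(length, sections, results[(length, sections)])
```

For 6 elements in 4 sections this gives sizes 2, 2, 1, 1, matching the column-array example earlier.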
import numpy as np
# The 3rd row has 4 columns
ragged_array = np.array([[7, 8, 9], [10, 11, 12], [13, 14, 15, 14]], dtype=object)
fixed_arr = np.array_split(ragged_array, 3, axis=0)
print(fixed_arr)
[array([[7, 8, 9]], dtype=object), array([[10, 11, 12]], dtype=object), array([[13, 14, 15, 14]], dtype=object)]
When you first create the array, before you split it, the optional keyword dtype=object makes NumPy store references to Python objects instead of numbers. The ragged array above is therefore a 1-D array holding three Python lists, which is why rows of different lengths are allowed and why it can only be split along axis=0. Newer NumPy versions print each element as list([...]) rather than as a nested array.
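Inspecting the shape makes it clearer what dtype=object actually builds. The sketch below reuses the ragged data from the example above; the names ragged and row_lengths are only illustrative.

```python
import numpy as np

# The 3rd row has 4 elements, so NumPy cannot build a regular 2-D array.
ragged = np.array([[7, 8, 9], [10, 11, 12], [13, 14, 15, 14]], dtype=object)

print(ragged.shape)     # (3,)  one dimension, three elements
print(type(ragged[0]))  # <class 'list'>  each element is a Python list

# Splitting along axis=0 hands out the list objects one per piece.
parts = np.array_split(ragged, 3, axis=0)
row_lengths = [len(p[0]) for p in parts]
print(row_lengths)      # [3, 3, 4]
```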
The optional axis keyword is allowed; the default is axis=0. axis=1 is only valid for arrays of 2-D and higher.
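A quick check of the axis behaviour, as a sketch with made-up arrays (one_d and two_d are illustrative names): the default matches axis=0, axis=1 splits the columns of a 2-D array, and axis=1 on a 1-D array raises an error.

```python
import numpy as np

one_d = np.array([1, 2, 3, 4, 5, 6])
two_d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Leaving axis out is the same as passing axis=0.
default_parts = np.array_split(two_d, 2)
explicit_parts = np.array_split(two_d, 2, axis=0)
same = all(np.array_equal(a, b) for a, b in zip(default_parts, explicit_parts))

# axis=1 splits the columns of a 2-D array.
col_shapes = [p.shape for p in np.array_split(two_d, 2, axis=1)]
print(same, col_shapes)

# A 1-D array has no axis 1, so the call fails.
try:
    np.array_split(one_d, 3, axis=1)
    axis1_ok = True
except (IndexError, ValueError) as err:
    axis1_ok = False
    print("axis=1 on a 1-D array raised:", type(err).__name__)
```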
Analogous to the joining commands, NumPy has .hsplit(), .vsplit() and .dsplit() functions. For arrays of 2-D and higher, .hsplit() splits along axis=1 (columns), .vsplit() splits along axis=0 (rows), and .dsplit() splits along axis=2 (depth), so .dsplit() needs an array with at least three dimensions. A ragged array created with dtype=object has one dimension fewer than its nesting suggests, because each ragged row is stored as a single list object. Therefore you can coax .hsplit(), .vsplit() or .dsplit() into working on ragged arrays by adding an extra pair of square brackets around the entire array argument. None of these three functions takes an axis parameter.
# Normal Arrays
.hsplit()
Think about #columns vs divisor
import numpy as np
hsplit_arr = np.array([[1, 2, 3], [4, 5, 6], [13, 14, 15], [16, 17, 18]])
newarray = np.hsplit(hsplit_arr, 3)
print(newarray)
[array([[ 1],
       [ 4],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [15],
       [18]])]
.vsplit()
Think about #rows vs. divisor
import numpy as np
vsplit_arr = np.array([[1, 2, 3], [4, 5, 6], [13, 14, 15], [16, 17, 18]])
newarray = np.vsplit(vsplit_arr, 2)
print(newarray)
[array([[1, 2, 3],
       [4, 5, 6]]), array([[13, 14, 15],
       [16, 17, 18]])]
.dsplit()
Think about depth (the length of the third axis) vs divisor
# notice the extra pair of [] around the entire argument
import numpy as np
arr = np.array([[[1, 2, 3], [44, 55, 66], [77, 88, 999], [10, 11, 120]]])
newarr = np.dsplit(arr, 3)
print(newarr)
[array([[[ 1],
        [44],
        [77],
        [10]]]), array([[[ 2],
        [55],
        [88],
        [11]]]), array([[[  3],
        [ 66],
        [999],
        [120]]])]
##################### RAGGED ARRAY #############
import numpy as np
dsplit_arr = np.array([[[1, 2, 3], [4, 5, 6], [1, 13, 14, 15], [16, 17, 18]]], dtype=object)
newarray = np.dsplit(dsplit_arr, 4)
print(newarray)
ValueError: dsplit only works on arrays of 3 or more dimensions
# Now add extra [] to coax
import numpy as np
# the 3rd row has 4 columns vs. 3 columns for the other rows
dsplit_arr = np.array([[[[1, 2, 3], [4, 5, 6], [1, 13, 14, 15], [16, 17, 18]]]], dtype=object)
newarray = np.dsplit(dsplit_arr, 4)
print(newarray)
[array([[[[1, 2, 3]]]], dtype=object), array([[[[4, 5, 6]]]], dtype=object), array([[[[1, 13, 14, 15]]]], dtype=object), array([[[[16, 17, 18]]]], dtype=object)]
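Printing the shapes shows why the extra pair of [] matters. This is a sketch using the same ragged rows as above; the names rows, two_d and three_d are illustrative.

```python
import numpy as np

# The 3rd row has 4 elements, so each row is stored as one list object.
rows = [[1, 2, 3], [4, 5, 6], [1, 13, 14, 15], [16, 17, 18]]

two_d = np.array([rows], dtype=object)      # same nesting as the failing call
three_d = np.array([[rows]], dtype=object)  # one extra pair of []
print(two_d.shape)    # (1, 4)     only 2 dimensions, too few for dsplit
print(three_d.shape)  # (1, 1, 4)  3 dimensions, dsplit can use axis 2

pieces = np.dsplit(three_d, 4)
print(len(pieces), pieces[0].shape)  # 4 (1, 1, 1)
```

Each piece holds exactly one of the original row lists, so the split works only because the divisor equals the number of rows.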