Friday, July 24, 2020

Array copy versus Array View




When you .copy() an array into another array, the copy gets its own distinct space in memory and owns its own copy of the data, so changing one never affects the other. However, if you call the .view() member function on an array, the view shares the original's data: any change to the view changes the original and vice versa. Think of .view() as having a memory address reference to an established variable instead of a new variable.

If you're uncertain whether a variable is a copy or a view of the original, check the .base member property of the array class. It returns None if the array owns its data (a copy), and the original array if it's a view.
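A short sketch of the difference, using a made-up five-element array (the names original, duplicate and window are hypothetical, just for illustration):

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])   # hypothetical example array

duplicate = original.copy()   # owns its own data
window = original.view()      # shares the original's data

original[0] = 42              # change the original...
print(duplicate)              # ...the copy still starts with 1
print(window)                 # ...the view now starts with 42

window[1] = 99                # changing the view...
print(original)               # ...changes the original too

print(duplicate.base)            # None: the copy owns its data
print(window.base is original)   # True: the view's base is the original
```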

Use the .shape member property of the array class to return a tuple of the length of each dimension.

A somewhat complex and feature-rich member method of the array object class is .reshape(). To reshape an array you cannot leave any voids or unused elements, which means the product of the dimensions of the original array must equal the product of the dimensions of the reshaped array. After a reshape, checking the .base member property will usually return the original array. That means .reshape() gave you a view: the reshaped array shares the original's data instead of using extra memory for a second copy (NumPy only falls back to copying when a view isn't possible).
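Here is a minimal sketch of .shape and of reshape handing back a view, assuming a small hypothetical array called grid; np.shares_memory() is used only to confirm that no data was copied:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])   # hypothetical 2 by 3 array
print(grid.shape)              # (2, 3): 2 rows, 3 columns

column = grid.reshape(6, 1)    # 2 * 3 == 6 * 1, so this is allowed
print(column.shape)            # (6, 1)

# reshape returned a view: .base points back at grid and no data was copied
print(column.base is grid)               # True
print(np.shares_memory(column, grid))    # True

column[0, 0] = 100             # writing through the view...
print(grid[0, 0])              # ...changes the original: 100
```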


For example:

 [[1, 2, 3], [7,8,9]]


is a 2 (rows) by 3 (columns) array. It can be reshaped into a 3 by 2, a 1 by 6 or a 6 by 1 array, because 2*3 = 3*2 = 1*6 = 6*1 = 6. It can NOT be reshaped into a 3 by 3 array, because 3*3 = 9, so there would be 3 void/empty positions; NumPy raises an error if you try this:

ValueError: cannot reshape array of size 6 into shape (3,3)
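To see which target shapes are legal, a sketch like this one tries each reshape from the paragraph above on a hypothetical array called small and catches the ValueError:

```python
import numpy as np

small = np.array([[1, 2, 3], [7, 8, 9]])   # hypothetical 2 by 3 array, 6 elements

worked = []
failed = []
for new_shape in [(3, 2), (1, 6), (6, 1), (3, 3)]:
    try:
        small.reshape(new_shape)
        worked.append(new_shape)
    except ValueError as err:      # 3 * 3 = 9 does not equal 6
        failed.append(new_shape)
        print(new_shape, "fails:", err)

print("works:", worked)
```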


import numpy as np

hopscotch = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]) # 2 rows by 6 columns

print(hopscotch)

new_hopscotch = hopscotch.reshape(4, 3)

print(new_hopscotch)

# results in: the original 2 by 6 array, then a second array with 4 rows and 3 columns

[[ 1  2  3  4  5  6]

 [ 7  8  9 10 11 12]]

[[ 1  2  3]

 [ 4  5  6]

 [ 7  8  9]

 [10 11 12]]


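You can also reshape an array into another array of the same dimensionality: for example, a 3 by 5 2-D array can become a 5 by 3 2-D array, since both hold 15 elements with no voids or unused elements. Note that this is not the same as transposing: .reshape() keeps the elements in their original row-by-row order and just regroups them, while .T (or np.transpose()) turns rows into columns.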
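A sketch contrasting the two, using np.arange() to build a hypothetical 3 by 5 array holding 1 to 15:

```python
import numpy as np

block = np.arange(1, 16).reshape(3, 5)   # hypothetical 3 by 5 array of 1..15

regrouped = block.reshape(5, 3)   # same 15 values, regrouped in row order
flipped = block.T                 # true transpose: rows become columns

print(regrouped.shape, flipped.shape)   # (5, 3) (5, 3)
print(regrouped[0])   # [1 2 3]: the first three values in row order
print(flipped[0])     # [ 1  6 11]: the first column of block
```

Both results are 5 by 3, but only .T gives the transpose.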

NumPy allows you to have one unknown dimension in a .reshape() function call. The unknown dimension is marked by -1, and it can go in any position of the .reshape() argument list; NumPy works out its length from the total number of elements.


If you do reshape with a single parameter of -1, that will flatten any N-dimensional array into a 1D array.

import numpy as np

hopscotch = np.array([1, 2, 3, 4, 5, 6, 7, 8])

new_hopscotch = hopscotch.reshape(2, -1, 2)

# .reshape() figures out on its own that the 2nd dimension length must be 8 / (2 * 2) = 2

print(new_hopscotch)

print(new_hopscotch.base)

# results in:

[[[1 2]

  [3 4]]


 [[5 6]

  [7 8]]]

[1 2 3 4 5 6 7 8]
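A few more placements of -1, on a hypothetical 24-element array called cube; note that only one -1 is allowed per call:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)   # hypothetical 3-D array holding 0..23

flat = cube.reshape(-1)          # a lone -1 flattens to 1-D
wide = cube.reshape(4, -1)       # 24 / 4 = 6 columns
tall = cube.reshape(-1, 2, 3)    # 24 / (2 * 3) = 4 in the first position

print(flat.shape, wide.shape, tall.shape)   # (24,) (4, 6) (4, 2, 3)

try:
    cube.reshape(-1, -1)         # two unknowns cannot be worked out
except ValueError as err:
    print("error:", err)
```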


--------------------------------



import numpy as np


original = np.array([[1, 2, 3], [4, 5, 6]])


for x in original:

  for y in x:

    print(y)


print("------ The next version uses a single loop over a flattened 1-D array ------------")


newone = np.array([[1, 2, 3], [4, 5, 6]]).reshape(-1)


for x in newone:

    print(x)


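A more powerful and flexible tool for visiting every element is the np.nditer() function. Instead of reshaping, it iterates over every element of an array of any dimension in a single loop, so you don't need to flatten first. It also takes optional arguments, for example op_dtypes to change the data type of the elements as they are iterated over; that needs flags=['buffered'], because NumPy converts the elements into extra buffer space rather than in place.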


import numpy as np

numbers = np.array([1, 2, 3])

for x in np.nditer(numbers, flags=['buffered'], op_dtypes=['S']):

  print(x)

# results in:
# the b prefix means each element is now a byte string (NumPy type 'S')

b'1'
b'2'
b'3'
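Because np.nditer() walks every element no matter how many dimensions there are, a 3-D array needs only one loop. A minimal sketch with a hypothetical 2 by 2 by 2 array called stack:

```python
import numpy as np

stack = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # hypothetical 3-D array

visited = []
for x in np.nditer(stack):   # one loop, however many dimensions there are
    visited.append(int(x))   # each x is a 0-D array; int() unwraps the value

print(visited)   # [1, 2, 3, 4, 5, 6, 7, 8]
```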

Finally, the np.ndenumerate() function returns each element value together with the index of that element. If the array is multi-dimensional, it gives you multi-dimensional indices. If you're going to write a for loop to access each element separately, use two dummy variables: one for the index and one for the element value. Remember that arrays start counting at 0, so np.ndenumerate() returns indices starting at 0.


import numpy as np

square1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]) # 2D

for iy, y in np.ndenumerate(square1):

  print(iy, y)

print("------------------")

square2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).reshape(-1) # 1D

for iz, z in np.ndenumerate(square2):

  print(iz, z)


# results in: - the numbers in parentheses are array indices


(0, 0) 1

(0, 1) 2

(0, 2) 3

(0, 3) 4

(1, 0) 5

(1, 1) 6

(1, 2) 7

(1, 3) 8

------------------

(0,) 1

(1,) 2

(2,) 3

(3,) 4

(4,) 5

(5,) 6

(6,) 7

(7,) 8



Both np.nditer() and np.ndenumerate() respect array slicing syntax: pass them a slice such as square1[:, ::2] and they iterate over only that part of the array.
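A small sketch of passing slices to both functions, using a hypothetical array called square with the same values as square1 in the example above:

```python
import numpy as np

square = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])   # hypothetical 2-D array

# nditer over every other column of both rows, visited row by row
picked = [int(x) for x in np.nditer(square[:, ::2], order='C')]
print(picked)   # [1, 3, 5, 7]

# ndenumerate over part of the 2nd row; indices count from 0 within the slice
pairs = [(idx, int(value)) for idx, value in np.ndenumerate(square[1, 1:3])]
print(pairs)    # [((0,), 6), ((1,), 7)]
```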


















Monday, July 20, 2020

Array Indexing and Array Slicing

Shape and Reshape
Iterating



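In a multidimensional array, the comma-separated indices read left to right: the first index picks along the first (outermost) dimension, which for a 2-D array is the row, and the last index picks the element within it. Negative indexing is allowed; when you index from the end you start at -1, not 0, so -1 is the last element.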


Remember that slicing does not include the element at the rightmost (stop) index.
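A quick sketch of indexing order, negative indexing and the excluded stop index, using the same baskets array that the 2-D example in this post uses:

```python
import numpy as np

baskets = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(baskets[0, 1])     # 2: first index picks row 0, second picks column 1
print(baskets[1, -1])    # 10: -1 is the last element of row 1
print(baskets[-1, -2])   # 9: last row, second-to-last element
print(baskets[0, 1:4])   # [2 3 4]: stop index 4 is left out
```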


NumPy lets you do stepping through arrays. Make sure to use two colons if using the optional step parameter.

A shortcut for grabbing every other element from the entire array:

[::2]

More on this stepping at the end of this post.


2-D arrays

import numpy as np

baskets = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print('4th element of 2nd row:', baskets[1, 3])

# result is:

4th element of 2nd row: 9


Some special things about two-dimensional arrays:

You can slice both dimensions at once: the part before the comma slices the rows, and the part after the comma then slices the elements within each of those rows. When both parts are slices, the result comes back as a smaller but still 2-dimensional array; when one part is a single index, that dimension is dropped.
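A sketch of slicing rows and columns in one step, again with the baskets array from the example above:

```python
import numpy as np

baskets = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

both_rows = baskets[0:2, 1:4]   # slice the rows, then the columns within each row
print(both_rows)                # still 2-D: rows [2 3 4] and [7 8 9]
print(both_rows.shape)          # (2, 3)

one_column = baskets[0:2, 2]    # a single index in one position drops that dimension
print(one_column)               # [3 8], now 1-D
```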


N-dimensional array

If you print an N-dimensional array you'll see that each level of nesting has its own pair of [], and all the inner lists sit inside one outer "mother" list.

Example 3-D array


[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

The 1st dimension is given by what is inside the outermost brackets, the 2nd dimension by what is inside the middle brackets and the 3rd dimension by what is inside the innermost brackets. There are 2 pairs of middle brackets inside the outermost pair, so the 1st dimension is 2. There are 2 pairs of innermost brackets inside each middle pair, so the 2nd dimension is 2. There are 3 elements inside each innermost pair, so the 3rd dimension is 3.

In general, the total number of elements returned is the product of the sizes of the slices in each dimension.

In the example above, that would be 2 * 2 * 3 = 12, count them.
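The same 3-D array can be built and checked in code; .shape and .size confirm the 2 * 2 * 3 = 12 count (the variable name boxes is hypothetical):

```python
import numpy as np

boxes = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(boxes.shape)      # (2, 2, 3)
print(boxes.ndim)       # 3
print(boxes.size)       # 12 = 2 * 2 * 3
print(boxes[1, 0, 2])   # 9: 2nd outer block, its 1st row, 3rd element
```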

Another Example

print(some_array[1:3, 3:6, 2:8])

will print 2 * 3 * 6 = 36 elements (assuming some_array is large enough in each dimension), remembering that the stop index, the right-hand one, on each dimension is NOT included.
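A sketch that confirms the count, assuming some_array is a hypothetical 4 by 7 by 10 array (any array at least that large in each dimension would do):

```python
import numpy as np

# hypothetical array, large enough for the slice: 4 * 7 * 10 = 280 elements
some_array = np.arange(280).reshape(4, 7, 10)

piece = some_array[1:3, 3:6, 2:8]   # 2 rows, 3 rows, 6 elements per dimension
print(piece.shape)   # (2, 3, 6)
print(piece.size)    # 36
```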



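NumPy has additional data types on top of the standard Python data types, and each one is referred to by a single-character code. Note that the bare codes 'i' and 'f' mean 32-bit types (int32 and float32), while arrays built from Python ints and floats usually default to 64-bit (int64 and float64; the default integer size can depend on the platform). See below for the list of NumPy data types.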


i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - void


The np.array() function takes an optional dtype argument (for example a single character in quotes) to say that you want the new array to have a certain NumPy data type. NumPy can also distinguish between 32-bit and 64-bit sizes.


For i, u, f, S and U you can also add a size after the letter, for example 'i4' for a 4-byte (32-bit) integer or 'S3' for strings of up to 3 characters. To convert an existing array to a different data type, use the .astype() member function, which returns a converted copy and leaves the original alone; for a new array you can set the data type from the outset with dtype=' '. There is also a .dtype property which reveals the data type without changing it.


The .astype() member function accepts Python data types as well as NumPy data types. So, for example, you could write .astype(float) or .astype('f'), but they are not the same: float becomes float64, while 'f' is float32.
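A sketch of sized type codes and of the float versus 'f' difference (the array names are hypothetical):

```python
import numpy as np

counts = np.array([1, 2, 3], dtype='i4')     # hypothetical array of 4-byte integers
codes = np.array(['ab', 'cd'], dtype='S2')   # hypothetical byte strings, 2 characters max

print(counts.dtype)   # int32
print(codes.dtype)    # |S2

as_python_float = counts.astype(float)   # Python's float maps to float64
as_f = counts.astype('f')                # 'f' alone means float32

print(as_python_float.dtype)   # float64
print(as_f.dtype)              # float32
print(counts.dtype)            # still int32: .astype() returns a new array
```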


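Be careful when converting between float sizes: 'f' is only 32-bit, so converting a float64 array with .astype('f') rounds every value, and converting back to Python's float afterwards does not bring the lost precision back - see below:

import numpy as np

arr = np.array([1.1, 2.1, 3.1], dtype=float)   # Python float means float64

print(arr)
print(arr.dtype)

newarr = arr.astype('f')   # float64 -> float32: every value is rounded

print(newarr)
print(newarr.dtype)

backarr = newarr.astype(float)   # float32 -> float64: the rounding stays

print(backarr)
print(backarr.dtype)


# results in (spacing can differ slightly between NumPy versions; newer versions print
# float32 values with just enough digits to identify them, so the rounding only shows
# once the values are widened back to float64):

[1.1 2.1 3.1]
float64
[1.1 2.1 3.1]
float32
[1.10000002 2.0999999  3.0999999 ]
float64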


Using .astype() to go from 'f' to 'i' drops the fractional part, i.e. it truncates toward zero rather than doing a floor(): 3.7 becomes 3, but -3.7 becomes -3, not -4.
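A quick sketch comparing .astype('i') with np.floor() on some hypothetical readings, including negatives where the two differ:

```python
import numpy as np

readings = np.array([3.7, -3.7, 2.2, -2.2])   # hypothetical float values

truncated = readings.astype('i')   # drops the fraction, moving toward zero
floored = np.floor(readings)       # floor always rounds down

print(truncated.tolist())   # [3, -3, 2, -2]
print(floored.tolist())     # [3.0, -4.0, 2.0, -3.0]
```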

You can tell NumPy to walk through an array by skipping elements in a regular interval pattern, also known as stepping.


import numpy as np

bus_stops = np.array(["Main St", "Oak Street", "Pine Street", "Maple Lane", "Harbor side", "Elm Street", "Depot"])

print(bus_stops[1:5:2])        # start : stop : step

# results in:

['Oak Street' 'Maple Lane']

# because index 1 is Oak Street (index 0 is Main St) and the step is 2, so take index 1 (Oak Street), skip one, take index 3 (Maple Lane), skip one. The next pick would be index 5 (Elm Street), but the stop index 5 is never included, so the slice ends there. So the rule is: if the step is n, skip n-1 elements between picks.
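Two more stepping sketches on the same bus_stops array: the [::2] shortcut from the top of this post, and a stop index large enough to include Elm Street:

```python
import numpy as np

bus_stops = np.array(["Main St", "Oak Street", "Pine Street", "Maple Lane",
                      "Harbor side", "Elm Street", "Depot"])

every_other = bus_stops[::2]   # no start or stop: the whole array, step 2
longer = bus_stops[1:6:2]      # stop index 6 is excluded, so index 5 is still reached

print(every_other)   # ['Main St' 'Pine Street' 'Harbor side' 'Depot']
print(longer)        # ['Oak Street' 'Maple Lane' 'Elm Street']
```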




Friday, July 17, 2020

NumPy


NumPy isn't part of Python's standard library, so it has to be installed, for example with pip (pip install numpy).

Then just import like any other module.

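A name written with double underscores on each side, like __something__, is called a special (or "dunder") attribute. Modules usually have many attributes; for example, numpy.__version__ holds the installed NumPy version.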

Usually numpy is aliased as np.

import numpy as np

Then create an array with np.array(), passing it a scalar, or a list of values inside square brackets.

0-D array is a scalar.
1-D array is a vector [ ]
2-D array is an array of vectors, or a matrix. [[ ]]
3-D array is an array of matrices, or a rectangular box if you want to picture it that way. [[[ ]]]

The len() built-in function only gives you the length of the first (outermost) dimension, the same as .shape[0].

If you nest more pairs of square brackets than there are values to fill them, each of the extra (unpopulated) dimensions has length one.

The .ndim property of an array contains its total number of dimensions.
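A sketch that checks the dimension rules above with .ndim, .shape and len() (all array names are hypothetical):

```python
import numpy as np

scalar = np.array(42)                                   # 0-D
vector = np.array([1, 2, 3])                            # 1-D
matrix = np.array([[1, 2, 3], [4, 5, 6]])               # 2-D
box = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])    # 3-D
padded = np.array([[[1, 2, 3]]])                        # extra pairs of brackets

for arr in (scalar, vector, matrix, box, padded):
    print(arr.ndim, arr.shape)

print(len(matrix))    # 2: the length of the first (outermost) dimension only
print(padded.shape)   # (1, 1, 3): the unpopulated dimensions have length 1
```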

An array can have many dimensions, but not an unlimited number: NumPy sets a fixed maximum (32 in NumPy 1.x, raised to 64 in NumPy 2.0).

There are two ways to create an n-dimensional array.



1. Use the ndmin argument of the np.array() function.

import numpy as np


family = np.array(["Dad", "Mom", "Daughter", "Son"], ndmin=3)

print(family)

# result is:

[[['Dad' 'Mom' 'Daughter' 'Son']]]



2. Use as many pairs of square braces as there are dimensions.

import numpy as np

family = np.array([[["Dad", "Mom", "Daughter", "Son"]]])

print(family)

# result is:

[[['Dad' 'Mom' 'Daughter' 'Son']]]


Note that the 1st and 2nd dimensions are of length 1.
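A quick check, reusing the family list, that both approaches produce the same shape (via_ndmin and via_brackets are hypothetical names):

```python
import numpy as np

via_ndmin = np.array(["Dad", "Mom", "Daughter", "Son"], ndmin=3)
via_brackets = np.array([[["Dad", "Mom", "Daughter", "Son"]]])

print(via_ndmin.shape)                           # (1, 1, 4)
print(np.array_equal(via_ndmin, via_brackets))   # True
```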



Thursday, July 16, 2020

File Handling



The built-in Python function open() takes two main arguments and returns a file object. The first argument is the file name (or path). The second is a string that combines an action, either read, append, write or create (read is the default), with a mode, either text or binary (text is the default). For example, "r" reads text and "rb" reads binary.

read: "r"
append: "a"
write: "w"
create: "x"

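If the file does not already exist, append ("a") and write ("w") will create it, and so will create ("x"), which raises an error if the file already exists. Note that write overwrites any existing contents. Trying to read a non-existent file raises a FileNotFoundError.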


modes

text: "t"
binary: "b"
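A minimal sketch of how the actions behave, run against a hypothetical budget.txt in a temporary folder so nothing real is touched:

```python
import os
import tempfile

folder = tempfile.mkdtemp()                  # scratch folder for the demo
path = os.path.join(folder, "budget.txt")    # hypothetical file name
events = []

try:
    open(path, "r")                          # reading a missing file...
except FileNotFoundError:
    events.append("missing")                 # ...raises FileNotFoundError

with open(path, "x") as f:                   # "x" creates the file
    f.write("rent\n")

try:
    open(path, "x")                          # "x" refuses an existing file
except FileExistsError:
    events.append("exists")

with open(path, "a") as f:                   # "a" adds to the end
    f.write("food\n")

with open(path, "w") as f:                   # "w" erases the old contents first
    f.write("travel\n")

with open(path) as f:                        # no mode given means "rt"
    contents = f.read()

print(events)     # ['missing', 'exists']
print(contents)   # only the line written by "w" is left

os.remove(path)                              # tidy up
os.rmdir(folder)
```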

.read() and .readline() are frequently used on a file object.


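The action and the mode letters go together in one quoted string, e.g. "rt" or "wb". If you use a Windows path name, remember that a backslash starts an escape sequence in a Python string, so after the drive designator either double each backslash ("C:\\data\\budget.txt"), use a raw string (r"C:\data\budget.txt"), or use forward slashes ("C:/data/budget.txt").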

Conveniently, a file object can be looped over line by line, so you don't need .readline(): in a for loop, the dummy variable holds one line of the file at a time. Just print() it.

Assume that the file budget.txt exists and has three lines in it.

testfile = open("budget.txt", "r")
print(testfile.readline())
print(testfile.readline())
print(testfile.readline())
testfile.close()


testfile = open("budget.txt", "r")
for i in testfile:
  print(i)  # i holds one line of text (including its newline), not a number
testfile.close()

# The result in both cases is the same.


There is no built-in close() function in Python; instead, .close() is a member function of the file object, so call testfile.close() when you're done with a file.
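A common alternative, sketched below, is the with statement, which closes the file for you when the block ends. The file contents here are hypothetical:

```python
import os
import tempfile

# hypothetical three-line budget file, written to a temporary folder for the demo
path = os.path.join(tempfile.mkdtemp(), "budget.txt")
with open(path, "w") as f:
    f.write("rent\nfood\ntravel\n")

with open(path, "r") as testfile:   # closed automatically when the block ends
    lines = [line.rstrip("\n") for line in testfile]

print(lines)             # ['rent', 'food', 'travel']
print(testfile.closed)   # True
```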

If you want to delete a file, import the os module and use its os.remove() function. Before any of these operations, you may want to check that the file exists with os.path.exists().

Finally, to remove a folder use os.rmdir(); the folder must be empty first.
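A minimal sketch of deleting a file and then its (now empty) folder, using a hypothetical file in a temporary folder:

```python
import os
import tempfile

folder = tempfile.mkdtemp()                     # scratch folder for the demo
path = os.path.join(folder, "old_budget.txt")   # hypothetical file name
with open(path, "w") as f:
    f.write("draft\n")

if os.path.exists(path):       # check first, then delete the file
    os.remove(path)

print(os.path.exists(path))    # False

os.rmdir(folder)               # works only because the folder is now empty
print(os.path.exists(folder))  # False
```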


Properties and functions of the os module, as listed by dir(os) (the exact list depends on your Python version and operating system):


['CLD_CONTINUED', 'CLD_DUMPED', 'CLD_EXITED', 'CLD_TRAPPED', 'EX_CANTCREAT', 'EX_CONFIG', 'EX_DATAERR', 'EX_IOERR', 'EX_NOHOST', 'EX_NOINPUT', 'EX_NOPERM', 'EX_NOUSER', 'EX_OK', 'EX_OSERR', 'EX_OSFILE', 'EX_PROTOCOL', 'EX_SOFTWARE', 'EX_TEMPFAIL', 'EX_UNAVAILABLE', 'EX_USAGE', 'F_LOCK', 'F_OK', 'F_TEST', 'F_TLOCK', 'F_ULOCK', 'MutableMapping', 'NGROUPS_MAX', 'O_ACCMODE', 'O_APPEND', 'O_ASYNC', 'O_CLOEXEC', 'O_CREAT', 'O_DIRECT', 'O_DIRECTORY', 'O_DSYNC', 'O_EXCL', 'O_LARGEFILE', 'O_NDELAY', 'O_NOATIME', 'O_NOCTTY', 'O_NOFOLLOW', 'O_NONBLOCK', 'O_PATH', 'O_RDONLY', 'O_RDWR', 'O_RSYNC', 'O_SYNC', 'O_TMPFILE', 'O_TRUNC', 'O_WRONLY', 'POSIX_FADV_DONTNEED', 'POSIX_FADV_NOREUSE', 'POSIX_FADV_NORMAL', 'POSIX_FADV_RANDOM', 'POSIX_FADV_SEQUENTIAL', 'POSIX_FADV_WILLNEED', 'PRIO_PGRP', 'PRIO_PROCESS', 'PRIO_USER', 'P_ALL', 'P_NOWAIT', 'P_NOWAITO', 'P_PGID', 'P_PID', 'P_WAIT', 'RTLD_DEEPBIND', 'RTLD_GLOBAL', 'RTLD_LAZY', 'RTLD_LOCAL', 'RTLD_NODELETE', 'RTLD_NOLOAD', 'RTLD_NOW', 'R_OK', 'SCHED_BATCH', 'SCHED_FIFO', 'SCHED_IDLE', 'SCHED_OTHER', 'SCHED_RESET_ON_FORK', 'SCHED_RR', 'SEEK_CUR', 'SEEK_DATA', 'SEEK_END', 'SEEK_HOLE', 'SEEK_SET', 'ST_APPEND', 'ST_MANDLOCK', 'ST_NOATIME', 'ST_NODEV', 'ST_NODIRATIME', 'ST_NOEXEC', 'ST_NOSUID', 'ST_RDONLY', 'ST_RELATIME', 'ST_SYNCHRONOUS', 'ST_WRITE', 'TMP_MAX', 'WCONTINUED', 'WCOREDUMP', 'WEXITED', 'WEXITSTATUS', 'WIFCONTINUED', 'WIFEXITED', 'WIFSIGNALED', 'WIFSTOPPED', 'WNOHANG', 'WNOWAIT', 'WSTOPPED', 'WSTOPSIG', 'WTERMSIG', 'WUNTRACED', 'W_OK', 'XATTR_CREATE', 'XATTR_REPLACE', 'XATTR_SIZE_MAX', 'X_OK', '_DummyDirEntry', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_dummy_scandir', '_execvpe', '_exists', '_exit', '_fwalk', '_get_exports_list', '_putenv', '_spawnvef', '_unsetenv', '_wrap_close', 'abort', 'access', 'altsep', 'chdir', 'chmod', 'chown', 'chroot', 'close', 'closerange', 'confstr', 'confstr_names', 'cpu_count', 'ctermid', 'curdir', 
'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'environb', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fchdir', 'fchmod', 'fchown', 'fdatasync', 'fdopen', 'fork', 'forkpty', 'fpathconf', 'fsdecode', 'fsencode', 'fstat', 'fstatvfs', 'fsync', 'ftruncate', 'fwalk', 'get_blocking', 'get_exec_path', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getegid', 'getenv', 'getenvb', 'geteuid', 'getgid', 'getgrouplist', 'getgroups', 'getloadavg', 'getlogin', 'getpgid', 'getpgrp', 'getpid', 'getppid', 'getpriority', 'getresgid', 'getresuid', 'getsid', 'getuid', 'getxattr', 'initgroups', 'isatty', 'kill', 'killpg', 'lchown', 'linesep', 'link', 'listdir', 'listxattr', 'lockf', 'lseek', 'lstat', 'major', 'makedev', 'makedirs', 'minor', 'mkdir', 'mkfifo', 'mknod', 'name', 'nice', 'open', 'openpty', 'pardir', 'path', 'pathconf', 'pathconf_names', 'pathsep', 'pipe', 'pipe2', 'popen', 'posix_fadvise', 'posix_fallocate', 'pread', 'putenv', 'pwrite', 'read', 'readlink', 'readv', 'remove', 'removedirs', 'removexattr', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sched_get_priority_max', 'sched_get_priority_min', 'sched_getaffinity', 'sched_getparam', 'sched_getscheduler', 'sched_param', 'sched_rr_get_interval', 'sched_setaffinity', 'sched_setparam', 'sched_setscheduler', 'sched_yield', 'sendfile', 'sep', 'set_blocking', 'set_inheritable', 'setegid', 'seteuid', 'setgid', 'setgroups', 'setpgid', 'setpgrp', 'setpriority', 'setregid', 'setresgid', 'setresuid', 'setreuid', 'setsid', 'setuid', 'setxattr', 'spawnl', 'spawnle', 'spawnlp', 'spawnlpe', 'spawnv', 'spawnve', 'spawnvp', 'spawnvpe', 'st', 'stat', 'stat_float_times', 'stat_result', 'statvfs', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sync', 'sys', 'sysconf', 'sysconf_names', 'system', 'tcgetpgrp', 'tcsetpgrp', 
'terminal_size', 'times', 'times_result', 'truncate', 'ttyname', 'umask', 'uname', 'uname_result', 'unlink', 'unsetenv', 'urandom', 'utime', 'wait', 'wait3', 'wait4', 'waitid', 'waitid_result', 'waitpid', 'walk', 'write', 'writev']
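To produce a list like this yourself, call dir() on the module; this sketch just picks out a few names mentioned in this post:

```python
import os

names = dir(os)   # attribute names of the os module, as a sorted list
print(len(names))
print([name for name in names if name in ("remove", "rmdir", "path")])   # ['path', 'remove', 'rmdir']
```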



Sunday, July 5, 2020

Integrated Development Environment



I'm looking at a variety of integrated development environments (IDEs), starting with software development using Python. Basically, the IDEs that I've looked at fall into one of two general categories. They're either basic, stripped-down versions for beginners and novices, meant for getting started quickly and/or for educational purposes, or they're advanced, feature-rich, somewhat complicated IDEs for a specific subject-matter domain, for example engineering.

Since this is a new language and I want to get started quickly with some basic software development, I've decided to start with an IDE from the first category, something basic and quick to learn. More specifically, I've chosen Thonny as my IDE, because it has the added advantage of installing its own copy of Python at the same time, and it can optionally handshake with it via a shell or terminal that you launch from within the Thonny IDE. Thonny is free. https://thonny.org/

Thonny offers options, packages and plugins that can be installed and/or created by you.

It does handshake seamlessly with its Plotter, for making simple line graphs of your data, and also with Flask, which is a web development framework.

You have some flexibility in how involved you want the debugger to be. In fact, Thonny is designed to be educational, so the emphasis on debugging flexibility over other features makes sense.

 Thonny is a project on GitHub and it does have a Wiki.

I do recommend installing the friendly-traceback package. However, the birdseye plugin (which is supposed to be an advanced form of debugging but really amounts to profiling your code) did not work for me despite numerous attempts. If you really want birdseye running on Thonny, you'll either have to manually patch special files into the Thonny project directories or work with the developer release of Thonny. Even then I'm not guaranteeing that the birdseye profiler will actually run, but it increases your chances, since some people have had success with manual patches and with the developer version. Look at the Thonny GitHub project's Issues page for more details.