Tuesday, August 25, 2020

ufunc

 Ufunc


ufunc operates only on ndarrays. The purpose of ufunc is to vectorize loops. 


Let's say we want to add two ndarrays of the same dimensions. We can use python's built-in .zip() function and a for loop:


import numpy as np


# ndarrays must be the same size

no1 = [7 8 9 10 1 12 13 14]

no2 = [17 28 9 120 11 122 1300 314]

no3 = []


# i iterates over no1 and j iterates over no2

for i,j in zip(no1, no2)

    z.append(i+j)


print(no3)


# result is

[24, 36, 18, 130, 12, 134, 1313, 328]


Or we can use ndarrays and its .add() method.


import numpy as np


# ndarrays must be the same size

no1 = np.ndarray([7, 8, 9, 10, 1, 12, 13, 14])

no2 = np.ndarray([17, 28, 9, 120, 11, 122, 1300, 314])

no3 = np.add(no1, no2)

print(no3)


# result is same as before


[  24   36   18  130   12  134 1313  328]


What happens if the two arrays are not the same size:


import numpy as np


# ndarrays must be the same size

no1 = np.ndarray([7, 8, 9, 10, 1, 12, 13]) # 7 elements

no2 = np.ndarray([17, 28, 9, 120, 11, 122, 1300, 314]) # 8 elements

no3 = np.add(no1, no2)

print(no3)


ValueError: operands could not be broadcast together with shapes (8,) (7,) 


What happens if the two arrays are the same size, but with different dimensions:

import numpy as np


# ndarrays must be the same size

no1 = np.ndarray([7], [8], [9], [10], [1], [12], [13], [14]]) # 8  elements in one column

no2 = np.ndarray([17, 28, 9, 120, 11, 122, 1300, 314]) # 8 elements in one row

no3 = np.add(no1, no2)

print(no1)


# results in:


[[  24   35   16  127   18  129 1307  321]

 [  25   36   17  128   19  130 1308  322]

 [  26   37   18  129   20  131 1309  323]

 [  27   38   19  130   21  132 1310  324]

 [  18   29   10  121   12  123 1301  315]

 [  29   40   21  132   23  134 1312  326]

 [  30   41   22  133   24  135 1313  327]

 [  31   42   23  134   25  136 1314  328]]


Which is 8 by 8 square and all of the normally summed elements make a trace along the diagonal. The off-diagonal elements go something like this: take no2, transpose it into a column vector, then shift it from left to right, hold while the element of no1 is added to each of no2, then shift the original no2 again one column to the right, and repeat.


There is also a .multiply() method


import numpy as np


x = [1, 2, 3, 4]

y = [4, 5, 6, 7]

z = np.multiply(x,y)


print(z)


[ 4 10 18 28]


Finally, .zip() can produced some unusual results when used in non-standard ways


import numpy as np

x = [1, 2, 3, 4]

y = [4, 5, 6, 7]

z = []


for j in zip(y, x):

  z.append(j + j)

print(z)


[(4, 1, 4, 1), (5, 2, 5, 2), (6, 3, 6, 3), (7, 4, 7, 4)]


z = []


for i in zip(y, x):

  z.append(i + i)

print(z)


[(4, 1, 4, 1), (5, 2, 5, 2), (6, 3, 6, 3), (7, 4, 7, 4)]


z = []


for i in zip(y, x):

  z.append(i)

print(z)


[(4, 1), (5, 2), (6, 3), (7, 4)]


In these non-standard cases, there is no addition done by zip, but rather appending and repeating of elements.





Monday, August 24, 2020

Random Functions and Distributions Part 2

 


Uniform Distributions


random.uniform() returns a float between 0 and 1 but if you use matplotlib and seaborn to graph the distribution, the curve has tails above 1 and below zero. The higher the sample size parameter, the more the distribution becomes flat top but it still has the tails.


The seaborn.distplot() has a label="some term here" parameter.


A logistic distribution has fatter tails than a normal distribution. 


Multinomial Distribution


The multinomial function has 3 parameters:


1. The 1st argument is the number of samples.

2. 2nd argument is a list of probabilities expressed numerically or as fractions, or both. Oddly, the probabilities do not have to sum to one and can be less than one. If the probabilities sum to more than one, python will generate an error. You are permitted to insert zeros.

3. The size=(#,#,#) argument where # is an integer and an array is generated. If there is only one #, i.e. a 1-D array, then the return variable is a row vector.


Examples 


from numpy import random


x = random.multinomial(n=44, pvals=[0, 0, 0, 0, 0, 1/6])


print(x)


# result is:


[ 0  0  0  0  0 44]


from numpy import random


# probabilities sum to one but there are two zeros

x = random.multinomial(n=44, pvals=[0, 2/6, 1/6, 0, 2/6, 1/6])


print(x)


# result is:


[ 0 15  9  0 14  6]


from numpy import random


# probabilities sum to one but there are two zeros

x = random.multinomial(n=1, pvals=[0, 2/6, 1/6, 0, 2/6, 1/6])


print(x)


# result is:


[0 1 0 0 0 0]


from numpy import random


# probabilities sum to one but there are two zeros

x = random.multinomial(n=12, pvals=[0, 2/6, 1/6, 0, 2/6, 1/6], size=(3,4))


print(x)


# result is:

# Note that the 3rd dimension is the number of distinct probabilities in pvals. 

# So the total number of elements is 3 * 4 *6 = 72.

# Note that columns where pval =0 have a zero as the element returned.


[[[0 3 0 0 5 4]

  [0 6 1 0 5 0]

  [0 4 4 0 3 1]

  [0 3 0 0 7 2]]


 [[0 5 2 0 0 5]

  [0 3 2 0 5 2]

  [0 4 3 0 3 2]

  [0 4 1 0 4 3]]


 [[0 4 1 0 5 2]

  [0 2 1 0 5 4]

  [0 4 3 0 1 4]

  [0 9 0 0 2 1]]]


Friday, August 14, 2020

Random functions and distributions Part 1

 Random functions and distributions


You can create multidimensional arrays of random integers and floats with an optional size = (x,y,z) parameter, where x is the #rows and y is the #columns.


from numpy import random


The .choice() method of a numpy object can take as a parameter an array which can be a mix of datatypes. It also accepts an array of probabilities that must sum to 1.


from numpy import random


# works fine with floats, integers, and text

x = random.choice([13.7, 43, 7, 'test', 2, True], p=[0.1, 0.2, 0.5, 0.0, 0.1, 0.1], size=(300))

# notice that 'test' will never appear because it has prob = 0.0.

print(x)

# result is:

['2' '13.7' '43' '7' '13.7' 'True' '7' '2' '2' '7' '7' '7' '7' '7' '7'

 '43' '7' '43' '7' '7' '13.7' '7' '2' '7' '7' '7' '2' '7' '7' '7' '7' '7'

 '2' '7' '7' '7' '43' '7' '2' '13.7' '7' '7' '7' '43' '13.7' '7' '2'

 'True' '43' '7' '43' 'True' '43' '7' '43' '7' '43' 'True' '43' '7' '43'

 '2' '7' '13.7' '2' '7' '7' 'True' '7' '7' '43' '7' '7' '7' '13.7' '43'

 '2' '7' '7' '7' '7' '43' '43' '2' '7' '7' '7' '43' '7' '13.7' '2' '7' '2'

 '7' '43' 'True' '7' '13.7' '7' 'True' '43' '7' 'True' '7' 'True' '2' '7'

 '7' '7' 'True' '43' '7' '7' '7' '7' '43' '7' '7' '2' '2' '2' '7' '7' '7'

 '7' '7' '43' '7' '7' '7' 'True' 'True' '7' '13.7' '2' '7' '7' '43' '43'

 '7' '43' '7' '43' '7' '7' '7' 'True' '7' '7' '7' '7' 'True' '43' '2' '2'

 '7' '13.7' '7' '2' '7' '43' '7' '7' '7' '13.7' '13.7' '7' '7' '7' '2' '7'

 '7' '13.7' '7' '13.7' '43' '43' '2' '2' '7' '7' '7' '7' '2' '7' '7' '43'

 '7' '7' '7' 'True' '7' '43' '7' 'True' '13.7' '43' '2' '7' '43' 'True'

 '7' '7' '43' '43' '13.7' '13.7' '43' '2' '13.7' 'True' '7' '43' '43' '7'

 '7' '43' 'True' 'True' '43' '13.7' '43' '7' '13.7' '7' '13.7' '7' 'True'

 'True' '13.7' '2' '7' '43' '13.7' '43' '43' 'True' '7' '43' 'True' '43'

 '2' '7' '7' '7' '7' '43' 'True' '7' '7' '7' '43' '7' 'True' '7' '7'

 '13.7' '2' 'True' '7' '7' '7' '2' 'True' '13.7' '43' '7' '7' '7' '7' '7'

 '43' '7' '2' '7' '7' '43' '2' '2' '7' '7' '7' '43' '13.7' '7' '7' '7' '7'

 '7' 'True' '43' '7' '2' '7' '7' '7' 'True' '43' '7' '7']




.shuffle() changes the original array.


.permutation() returns a new array.


The seaborn module, along with the matlibplot module, can be utilized for plotting and making histograms of both continuous and discrete probability distributions.


import matplotlib.pyplot as plt

import seaborn as sns


Somehow the .show() member function of a matplotlib.pyplot object already knows what needs to be plotted from having run one or more seaborn .distplot() first!!??


Poisson and Binomial distributions converge when the number of trials is very large and the p is near zero, meaning much less than even 0.1.


random.randint(maxint) returns an integer no larger than maxint at random.


.random() returns a float out to 16 decimal places.



Normal distribution


from numpy import random

loc is the mean. scale is the standard deviation

x = random.normal(loc=3, scale=0.5, size=(4, 13))

# remember that .random() returns floats

print(x)


# result is:


[[2.80136473 3.54880004 2.89150748 2.14326926 3.03565892 3.41711576

  3.32692697 3.35579836 2.8661686  3.76114488 3.01858288 3.39992108

  2.62656407]

 [3.03767037 2.54722319 3.00753337 3.31534086 3.31006815 3.46443564

  3.36939275 3.24040843 2.65063788 3.59313176 2.84027165 2.21936811

  2.07943729]

 [2.35956149 3.70604286 2.91394761 2.00746178 3.45957505 3.49191263

  2.56781576 2.47446244 3.04135718 2.99737503 3.32111672 2.14935568

  3.21543959]

 [2.82624186 2.79458755 3.20454706 2.97537827 3.04814055 2.16203024

  2.87561735 3.03233987 2.32153526 3.14072811 3.62589382 3.16902181

  3.50404782]]



Binomial distributions are discrete.



from numpy import random


x = random.binomial(n=100, p=0.5, size=(20,5))


Note that the product of the size dimensions must equal n.


print(x)

# result and notice how values cluster around 50 which is n*p = 100*0.5 = 50

[[53 48 43 47 59]

 [46 44 52 52 59]

 [53 49 52 51 53]

 [46 51 56 62 52]

 [49 52 51 52 51]

 [42 48 52 53 49]

 [51 50 45 50 48]

 [47 49 51 54 47]

 [41 38 47 42 53]

 [53 50 45 57 42]

 [63 51 39 49 47]

 [53 43 42 49 48]

 [53 52 59 55 58]

 [54 48 51 57 51]

 [45 46 47 48 48]

 [51 42 44 53 48]

 [46 39 48 44 44]

 [52 53 46 45 46]

 [52 41 48 55 49]

 [51 42 47 46 57]]


when you see a kde parameter that is set to True in a .distplot() call, you will get a fitted curve drawn based on the histogram bars.



Normal and Binomial distributions converge when the number of Binomial trials times the probability is near the mean of the Normal distribution (and the scale factor goes as the square root of the number of trials times the previous scale factor).



Poisson distributions are discrete and deal with the number of times a specific event can recur.


from numpy import random

import matplotlib.pyplot as plt

import seaborn as sns


# lam is the targeted number of occurences.


# careful that hist and label and kde are parameters to distplot

# .poisson() is responsible for the size and lam values

x = random.poisson(lam=2, size=(9,5))

sns.distplot(x, hist=False, label='poisson')


plt.show()

print(x)


[[2 3 6 7 2]

 [1 0 2 1 3]

 [2 3 0 0 1]

 [2 4 3 2 2]

 [2 2 4 3 2]

 [4 4 3 0 0]

 [2 2 4 1 1]

 [2 5 1 1 0]

 [3 1 1 1 2]]


# notice that the largest discrete value generated is 6, but the curve

# plots beyond that to higher values.




----------------------------------


from numpy import random

import matplotlib.pyplot as plt

import seaborn as sns


# Notice that the result of the random module function serves as input for the seaborn function.


sns.distplot(random.normal(loc=20, scale=3, size=1000), hist=False, label='normal')

sns.distplot(random.poisson(lam=8, size=1000), hist=False, label='poisson')


plt.show()