Friday, August 14, 2020

Random functions and distributions Part 1


You can create multidimensional arrays of random integers and floats by passing the optional size parameter: size=(x, y) gives x rows and y columns, and size=(x, y, z) gives x blocks of y rows by z columns.

from numpy import random
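A minimal sketch of the size parameter, using numpy.random.randint() and numpy.random.random(). The bounds and shapes are arbitrary illustration values, not taken from the original post.

```python
from numpy import random

# 3 rows x 5 columns of integers drawn from 0..9 (the upper bound 10 is excluded)
ints = random.randint(10, size=(3, 5))

# 2 blocks, each 3 rows x 4 columns, of floats in [0.0, 1.0)
floats = random.random(size=(2, 3, 4))

print(ints.shape)    # (3, 5)
print(floats.shape)  # (2, 3, 4)
```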

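The random.choice() function picks elements at random from a 1-D array, and that array can mix data types (NumPy converts the mix to one common type, which is why every value in the result below is a string). It also accepts an optional p array giving the probability of each element; the probabilities must sum to 1.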

from numpy import random

# works fine with floats, integers, and text

x = random.choice([13.7, 43, 7, 'test', 2, True], p=[0.1, 0.2, 0.5, 0.0, 0.1, 0.1], size=300)

print(x)

# notice that 'test' will never appear because it has prob = 0.0.


# result (every element comes back as a string, because a NumPy array holds a single dtype):

['2' '13.7' '43' '7' '13.7' 'True' '7' '2' '2' '7' '7' '7' '7' '7' '7'
 '43' '7' '43' '7' '7' '13.7' '7' '2' '7' '7' '7' '2' '7' '7' '7' '7' '7'
 '2' '7' '7' '7' '43' '7' '2' '13.7' '7' '7' '7' '43' '13.7' '7' '2'
 'True' '43' '7' '43' 'True' '43' '7' '43' '7' '43' 'True' '43' '7' '43'
 '2' '7' '13.7' '2' '7' '7' 'True' '7' '7' '43' '7' '7' '7' '13.7' '43'
 '2' '7' '7' '7' '7' '43' '43' '2' '7' '7' '7' '43' '7' '13.7' '2' '7' '2'
 '7' '43' 'True' '7' '13.7' '7' 'True' '43' '7' 'True' '7' 'True' '2' '7'
 '7' '7' 'True' '43' '7' '7' '7' '7' '43' '7' '7' '2' '2' '2' '7' '7' '7'
 '7' '7' '43' '7' '7' '7' 'True' 'True' '7' '13.7' '2' '7' '7' '43' '43'
 '7' '43' '7' '43' '7' '7' '7' 'True' '7' '7' '7' '7' 'True' '43' '2' '2'
 '7' '13.7' '7' '2' '7' '43' '7' '7' '7' '13.7' '13.7' '7' '7' '7' '2' '7'
 '7' '13.7' '7' '13.7' '43' '43' '2' '2' '7' '7' '7' '7' '2' '7' '7' '43'
 '7' '7' '7' 'True' '7' '43' '7' 'True' '13.7' '43' '2' '7' '43' 'True'
 '7' '7' '43' '43' '13.7' '13.7' '43' '2' '13.7' 'True' '7' '43' '43' '7'
 '7' '43' 'True' 'True' '43' '13.7' '43' '7' '13.7' '7' '13.7' '7' 'True'
 'True' '13.7' '2' '7' '43' '13.7' '43' '43' 'True' '7' '43' 'True' '43'
 '2' '7' '7' '7' '7' '43' 'True' '7' '7' '7' '43' '7' 'True' '7' '7'
 '13.7' '2' 'True' '7' '7' '7' '2' 'True' '13.7' '43' '7' '7' '7' '7' '7'
 '43' '7' '2' '7' '7' '43' '2' '2' '7' '7' '7' '43' '13.7' '7' '7' '7' '7'
 '7' 'True' '43' '7' '2' '7' '7' '7' 'True' '43' '7' '7']
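The sketch below, using the same values and probabilities as above, checks two things: the mixed list is converted to a single string dtype, and a p that does not sum to 1 is rejected. NumPy's choice() raises ValueError when p is not a valid vector of probabilities; the variable names here are illustrative.

```python
from numpy import random

values = [13.7, 43, 7, 'test', 2, True]
probs = [0.1, 0.2, 0.5, 0.0, 0.1, 0.1]

x = random.choice(values, p=probs, size=300)

# a NumPy array holds one dtype, so the mixed list is converted to strings
print(x.dtype.kind)  # U  (Unicode string)

# 'test' has probability 0.0, so it never appears
print('test' in x)   # False

# probabilities that do not sum to 1 are rejected with a ValueError
rejected = False
try:
    random.choice(values, p=[0.5, 0.5, 0.5, 0.0, 0.1, 0.1])
except ValueError:
    rejected = True
print(rejected)      # True
```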

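random.shuffle() shuffles an array in place, changing the original array (it returns None).

random.permutation() returns a shuffled copy and leaves the original array unchanged.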
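A short sketch contrasting the two functions; the array contents are arbitrary.

```python
import numpy as np
from numpy import random

arr = np.array([1, 2, 3, 4, 5])
original = arr.copy()

# permutation() returns a shuffled copy and leaves arr untouched
shuffled_copy = random.permutation(arr)
unchanged = np.array_equal(arr, original)
print(unchanged)  # True

# shuffle() reorders arr itself and returns None
result = random.shuffle(arr)
print(result)     # None
print(sorted(arr.tolist()))  # [1, 2, 3, 4, 5]: same elements, possibly in a new order
```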

The seaborn module, together with the matplotlib module, can be used to plot histograms and density curves of both continuous and discrete probability distributions.

import matplotlib.pyplot as plt

import seaborn as sns

The plt.show() function of matplotlib.pyplot already knows what to plot after one or more seaborn .distplot() calls because seaborn draws onto pyplot's current figure and axes; plt.show() simply displays whatever has been drawn there.
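A minimal sketch of pyplot's "current axes" state, assuming matplotlib is installed. It uses the non-interactive Agg backend so it runs without a display, and plt.hist() stands in for a seaborn call. Note that seaborn 0.11 and later deprecate distplot() in favor of histplot(), kdeplot() and displot(); those functions also draw onto plt.gca() unless an ax is passed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
from numpy import random

data = random.normal(loc=0, scale=1, size=1000)

# pyplot keeps a "current" figure and axes; plotting calls draw onto them
plt.hist(data, bins=30)
ax = plt.gca()          # the axes the histogram was drawn on
print(len(ax.patches))  # 30 (one rectangle per bar)

# With an interactive backend, plt.show() would now display this histogram,
# because it shows whatever is on the current figure. Seaborn plots end up
# there the same way.
```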

Binomial and Poisson distributions converge when the number of trials n is very large and the success probability p is near zero (much less than 0.1); the matching Poisson rate is then lam = n*p.
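To make the convergence concrete, the sketch below compares the two probability mass functions directly with Python's math module, holding the mean n*p = 5 fixed while p shrinks. The helper names binomial_pmf, poisson_pmf and max_gap are hypothetical, written only for this illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    # probability of exactly k successes in n trials
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    # probability of exactly k events when the average count is lam
    return lam**k * exp(-lam) / factorial(k)


def max_gap(n, p, kmax=20):
    lam = n * p  # match the Poisson rate to the binomial mean
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(kmax + 1))


# same mean n*p = 5 in both cases, but a much smaller p in the second
print(max_gap(n=50, p=0.1))
print(max_gap(n=1000, p=0.005))
```

The second gap comes out much smaller than the first, which is the "large n, small p" condition at work.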

random.randint(high) returns a random integer from 0 up to, but not including, high. With two arguments, random.randint(low, high) draws from low up to high - 1.

random.random() returns a float in the half-open interval [0.0, 1.0), at full double precision (roughly 16 significant digits).
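A quick check of both claims; the seed, bounds and sample sizes are arbitrary.

```python
from numpy import random

random.seed(0)  # fixed seed so the run is repeatable

# randint(5) draws from 0, 1, 2, 3, 4; the value 5 itself is never returned
draws = random.randint(5, size=10_000)
print(draws.min(), draws.max())  # 0 4

# randint(low, high) draws from low to high - 1, e.g. dice rolls 1..6
dice = random.randint(1, 7, size=10)

# random() returns a single float in the half-open interval [0.0, 1.0)
f = random.random()
print(f)
```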

Normal distribution

from numpy import random

loc is the mean and scale is the standard deviation.

x = random.normal(loc=3, scale=0.5, size=(4, 13))

# like .random(), .normal() returns floats


# result (the 13th value and closing bracket of each row were lost when this output was copied):

[[2.80136473 3.54880004 2.89150748 2.14326926 3.03565892 3.41711576
  3.32692697 3.35579836 2.8661686  3.76114488 3.01858288 3.39992108
 [3.03767037 2.54722319 3.00753337 3.31534086 3.31006815 3.46443564
  3.36939275 3.24040843 2.65063788 3.59313176 2.84027165 2.21936811
 [2.35956149 3.70604286 2.91394761 2.00746178 3.45957505 3.49191263
  2.56781576 2.47446244 3.04135718 2.99737503 3.32111672 2.14935568
 [2.82624186 2.79458755 3.20454706 2.97537827 3.04814055 2.16203024
  2.87561735 3.03233987 2.32153526 3.14072811 3.62589382 3.16902181
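With only 4 x 13 = 52 draws the sample mean and spread are noisy. The sketch below draws many more values to show them settling near loc and scale; the seed and sample size are arbitrary choices.

```python
from numpy import random

random.seed(1)
x = random.normal(loc=3, scale=0.5, size=(4, 13))
print(x.shape)  # (4, 13)

# with many draws, the sample mean and standard deviation approach loc and scale
big = random.normal(loc=3, scale=0.5, size=100_000)
print(big.mean(), big.std())  # close to 3 and 0.5
```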


Binomial distributions are discrete.

from numpy import random

x = random.binomial(n=100, p=0.5, size=(20,5))

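Note that size is independent of n: here size=(20,5) asks for 20 x 5 = 100 draws, which only happens to equal n = 100. Each of those draws is the number of successes in n = 100 trials.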


# result: notice how the values cluster around n*p = 100*0.5 = 50

[[53 48 43 47 59]
 [46 44 52 52 59]
 [53 49 52 51 53]
 [46 51 56 62 52]
 [49 52 51 52 51]
 [42 48 52 53 49]
 [51 50 45 50 48]
 [47 49 51 54 47]
 [41 38 47 42 53]
 [53 50 45 57 42]
 [63 51 39 49 47]
 [53 43 42 49 48]
 [53 52 59 55 58]
 [54 48 51 57 51]
 [45 46 47 48 48]
 [51 42 44 53 48]
 [46 39 48 44 44]
 [52 53 46 45 46]
 [52 41 48 55 49]
 [51 42 47 46 57]]
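A sketch showing that size only controls the output shape, independently of n, and that the long-run average approaches n*p. Seeds and sizes are arbitrary.

```python
from numpy import random

random.seed(2)

# size only sets the shape of the output: 3 x 4 = 12 draws, even though n = 10
x = random.binomial(n=10, p=0.5, size=(3, 4))
print(x.shape)  # (3, 4)

# each entry counts successes in n trials, so it always lies between 0 and n
print(x.min() >= 0 and x.max() <= 10)  # True

# the average over many draws approaches n*p
many = random.binomial(n=100, p=0.5, size=50_000)
print(many.mean())  # close to 50
```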

When kde=True is passed to .distplot() (it is the default), seaborn overlays a kernel density estimate: a smooth curve estimated from the data points themselves, not fitted to the histogram bars.
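To show that the curve comes from the data rather than the bars, here is a hand-rolled Gaussian kernel density estimate in NumPy. It is a sketch, not seaborn's implementation: the Gaussian kernel and Scott's rule-of-thumb bandwidth are assumptions, and gaussian_kde_1d is a hypothetical helper name.

```python
import numpy as np
from numpy import random


def gaussian_kde_1d(data, grid):
    """Average of Gaussian bumps, one centered on each data point."""
    data = np.asarray(data, dtype=float)
    # Scott's rule of thumb for the bandwidth in one dimension
    bandwidth = data.std(ddof=1) * data.size ** (-1 / 5)
    z = (grid[:, None] - data[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)


random.seed(3)
sample = random.normal(loc=3, scale=0.5, size=500)

grid = np.linspace(0, 6, 601)
density = gaussian_kde_1d(sample, grid)

# the curve is a proper density: the area under it is about 1
step = grid[1] - grid[0]
area = density.sum() * step
print(area)
# and it peaks near the center of the sample
peak = grid[density.argmax()]
print(peak)
```

No histogram bins appear anywhere in the calculation, which is the point: changing the number of bars does not change this curve.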

Binomial and Normal distributions converge when the number of trials n is large (and p is not too close to 0 or 1): Binomial(n, p) then looks like a Normal distribution with loc = n*p and scale = sqrt(n*p*(1 - p)). So the mean grows in proportion to n, while the spread grows only with the square root of n.
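A sampling sketch of the normal approximation, assuming n = 1000 and p = 0.3 (arbitrary values): binomial draws and normal draws with loc = n*p and scale = sqrt(n*p*(1 - p)) should have nearly the same mean and standard deviation.

```python
import numpy as np
from numpy import random

n, p = 1000, 0.3
random.seed(4)

binom = random.binomial(n=n, p=p, size=100_000)
normal = random.normal(loc=n * p, scale=np.sqrt(n * p * (1 - p)), size=100_000)

# matching parameters: loc = n*p = 300, scale = sqrt(n*p*(1-p)), about 14.49
print(binom.mean(), normal.mean())
print(binom.std(), normal.std())
```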

Poisson distributions are discrete and count how many times an event occurs in a fixed interval, given its average rate lam.

from numpy import random

import matplotlib.pyplot as plt

import seaborn as sns

# lam is the expected (average) number of occurrences per interval.

# careful: hist, label and kde are parameters of sns.distplot(),

# while size and lam are parameters of random.poisson().

x = random.poisson(lam=2, size=(9,5))

print(x)

sns.distplot(x, hist=False, label='poisson')

plt.show()


# result:

[[2 3 6 7 2]
 [1 0 2 1 3]
 [2 3 0 0 1]
 [2 4 3 2 2]
 [2 2 4 3 2]
 [4 4 3 0 0]
 [2 2 4 1 1]
 [2 5 1 1 0]
 [3 1 1 1 2]]

# notice that the largest value generated is 7, but the curve

# extends beyond it, because the kernel density estimate places a smooth bump with tails on every data point.
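A larger sample makes the Poisson properties easy to check: the draws are non-negative integers, and both the sample mean and the sample variance approach lam. The seed and size are arbitrary.

```python
from numpy import random

random.seed(5)
x = random.poisson(lam=2, size=100_000)

# draws are non-negative integer counts with no fixed upper limit
print(x.min())  # 0

# for a Poisson distribution the mean and the variance both equal lam
print(x.mean(), x.var())  # both close to 2
```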


from numpy import random

import matplotlib.pyplot as plt

import seaborn as sns

# Notice that the result of the random module function serves as input for the seaborn function.

sns.distplot(random.normal(loc=20, scale=3, size=1000), hist=False, label='normal')

sns.distplot(random.poisson(lam=8, size=1000), hist=False, label='poisson')

plt.legend()  # show the 'normal' and 'poisson' labels

plt.show()
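In the plot above the two curves sit in different places because loc=20, scale=3 and lam=8 are unrelated. As a sketch of how to line them up: for a large lam (100 here, an arbitrary choice), a Poisson distribution is well approximated by a normal one with loc = lam and scale = sqrt(lam).

```python
import numpy as np
from numpy import random

random.seed(6)
lam = 100

poisson = random.poisson(lam=lam, size=100_000)
# a normal distribution with loc = lam and scale = sqrt(lam) has the same mean and spread
normal = random.normal(loc=lam, scale=np.sqrt(lam), size=100_000)

print(poisson.mean(), normal.mean())  # both close to 100
print(poisson.std(), normal.std())    # both close to 10
```

Passing these two arrays to the same plotting calls as above should produce curves that nearly overlap.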
