Wednesday, December 30, 2020

Tableau Gotchas and Workarounds


In my opinion, many aspects of creating worksheets, dashboards and, to a lesser extent, storybooks are non-intuitive and sometimes downright awkward in Tableau Public. Note that my comments are restricted to Tableau Public; it is possible that the higher-level paid versions behave better.


Here are some examples:


1. Plotting line graphs when the x-axis is a set of evenly spaced numbers. First, you have to make that metric a dimension, not a measure. Second, you have to put it on the Columns shelf, even though the values run horizontally along the x-axis. Third, you have to right-click the dimension on the shelf and change it from the default of discrete to continuous.


2. Bringing a second Excel file into the Data Records area after worksheets have already been created. You must first edit the new data in some way to trigger Tableau to make an extracted temp file behind the scenes. Otherwise, Tableau will report errors when you try to re-save the workbook to the Public area.


3. Highlighting one or more lines in a storybook point. Highlighting lines so they stand out from a busy accumulation of lines sharing common axes must be the LAST thing done before clicking Update. Otherwise, the highlighted selections will be lost.


4. Changing font sizes of axis and title labels. The font size control is buried in the drop-down menu for the font type control; there is no separate drop-down menu just for the font size. You can only change the size by opening the font type menu and using the font size control nested within it.

 

5. Clipped axis labels. Carefully find the edge of the axis object, then click and drag to the right to bring the label back in from off-screen left.

 

6. Storybook viewing size configuration. Do not trust any of the pre-configured vertical/horizontal pixel dimensions from the drop-down menu. Especially dangerous is the Automatic option. Testing on Chrome, Firefox, Edge and Internet Explorer produced widely varying experiences, with none of them looking anywhere near as good as when first drafted in Tableau Public. The best bet is to fiddle with various vertical/horizontal pixel dimensions under the Custom choice in the drop-down menu. This will at least give consistent results across all browsers. More than likely it will also be significantly better than the Automatic setting, with viewers at most needing to scroll or go full screen to get the complete/best view.

 

Tuesday, December 29, 2020

Autocorrelation and Standard Errors



I am analyzing the possible correlation between successive daily stamp listings with an Unspecified Grade.

Here is the equation for autocorrelation:
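In the usual sample-estimator form, for a series $x_1, \dots, x_N$ with mean $\bar{x}$ and lag $k$:

$$r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})\,(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}$$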


and the estimated standard errors are:
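Under the common large-sample (Bartlett) approximation,

$$\operatorname{SE}(r_k) \approx \sqrt{\frac{1 + 2\sum_{j=1}^{k-1} r_j^2}{N}}$$

where the $r_j$ are the sample autocorrelations at the lags below $k$.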


To that end, I have created a new analytic metric called the Stagger Forward Autocorrelation Matrix (SFAM).

The mathematics behind the SFAM is not new; the novelty arises from the way that I implement autocorrelation and the visual I choose to best represent the results.
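As a rough illustration, here is a minimal Octave sketch of the lag-k sample autocorrelation defined above. The function name and the series variable are hypothetical stand-ins; the actual SFAM implementation is in the MATLAB/Octave code linked below.

% Sample autocorrelation of series x at lag k (hypothetical helper;
% see the linked code for the real SFAM implementation).
function r = lag_autocorr (x, k)
  x = x(:);                 % force a column vector
  N = numel (x);
  d = x - mean (x);         % deviations from the mean
  r = sum (d(1:N-k) .* d(1+k:N)) / sum (d .^ 2);
endfunction

% e.g., r1 = lag_autocorr (daily_listings, 1);  % lag-1 autocorrelation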


A detailed presentation of SFAM can be found at my Tableau Public page:

 

 https://public.tableau.com/profile/john.quagliano#!/vizhome/Staggered_Autocorrelation_Story/StaggerAutocorrelation?publish=yes


and the MATLAB/Octave code can be found here:


https://drive.google.com/file/d/1WjS-j-lkl1MW60KzrwHNHfSiiiDK0Ytt/view




Tuesday, December 1, 2020

Vandermonde Prediction Intervals


 This post is in conjunction with my Tableau Public Viz on the analysis of graded stamps for sale.

 

Calculate the Vandermonde prediction band using polyfit and polyval in Octave or MATLAB:

[p, s] = polyfit (abscissa_values, ordinate_values, 1);  % linear fit (order 1)
yf = polyval (p, abscissa_values);                       % fitted model line

polyfit generates p and s:

s:

    'R'
          Triangular factor R from the QR decomposition.

    'X'
          The Vandermonde matrix used to compute the polynomial coefficients.

    'C'
          The unscaled covariance matrix, formally equal to the inverse of X'*X, but computed in a way minimizing roundoff error propagation.

    'df'
          The degrees of freedom.

    'normr'
          The norm of the residuals.

    'yf'
          The values of the polynomial for each value of x.


p:

    The polynomial coefficients in descending powers; for a linear fit, p(1) is the slope and p(2) is the y-intercept.

Compute the prediction intervals: the high and low bands.

Note that the mathematics will yield slightly different high and low offsets, depending on each abscissa value in turn.

z = numel (x);                                            % number of data points
A = (x(:) * ones (1, n+1)) .^ (ones (z, 1) * (n:-1:0));   % Vandermonde matrix

dy = sqrt (1 + sumsq (A/s.R, 2)) * s.normr / sqrt (s.df); % band half-width


(x(:) * ones (1, n+1))   -----> [z by (n+1)] matrix


This term is the column vector of abscissa values (z by 1) times a row vector of ones (1 by (n+1)), where n is the order of the polynomial (n is 1 for a linear fit).

.^ (ones (z, 1) * (n:-1:0))  ----> [z by (n+1)] matrix of 1s in the leftmost column and 0s in the rightmost column.

This term exponentiates each element of the z by (n+1) abscissa matrix, position by position, by either a 1 or a 0. The matrix stays [z by (n+1)], but the entire rightmost column becomes 1s, since anything raised to the power 0 is 1. The result is the Vandermonde matrix A of order 1.

In Octave and MATLAB, right matrix division x/y is equivalent to (inverse (y') * x')', where ' is the transpose operator.

If the system is not square, or if the coefficient matrix is singular, a minimum norm solution is computed.
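As a quick check of that identity, here is a toy example with invented 2 by 2 matrices:

B = [1 2; 3 4];
C = [4 1; 0 3];            % upper triangular, like s.R for a linear fit
D1 = B / C;                % right division
D2 = (inv (C') * B')';     % same result via the transpose identity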


sqrt (1 + sumsq (A/s.R, 2))

where:

R is an (n+1) by (n+1) upper triangular matrix, so its lower-left element is always zero. The 2 parameter tells sumsq to sum along dimension 2, i.e., within each row.

A is z by (n+1)


Expanding A/R as (inverse (R') * A')': an (n+1) by (n+1) matrix (2 by 2 for a linear fit) times an (n+1) by z matrix gives an (n+1) by z result, which the final transpose turns into z by (n+1).

With the 2 parameter, sumsq() squares each element in a given row and then sums those squares across that row. This reduces the matrix to a column vector of size z by 1, the same length as the input data.
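For example, with a toy matrix invented for illustration:

M = [3 4; 1 2];
sumsq (M, 2)       % -> [25; 5]: 3^2 + 4^2 = 25 and 1^2 + 2^2 = 5
                   % (MATLAB equivalent: sum (M.^2, 2))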

Note: In Octave you can add a scalar constant to a matrix with a plain +; no element-wise .+ operator is needed.



Finally, normr / sqrt (df) is the same scalar applied to all elements of the z by 1 vector:

dy = [z by 1] * normr / sqrt (df)

with normr the norm of the residuals of the least squares linear fit.

dy is then added to and subtracted from the model line (yf) to create the high and low bands, point by point.
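Putting it all together, here is a minimal end-to-end sketch in Octave; the synthetic x and y are hypothetical stand-ins for the real stamp data:

% Synthetic data standing in for the real abscissa/ordinate values
x = (1:20)';
y = 2*x + 5 + randn (20, 1);

n = 1;                            % polynomial order (linear)
[p, s] = polyfit (x, y, n);
yf = polyval (p, x);              % fitted model line

z = numel (x);
A = (x(:) * ones (1, n+1)) .^ (ones (z, 1) * (n:-1:0));   % Vandermonde matrix
dy = sqrt (1 + sumsq (A/s.R, 2)) * s.normr / sqrt (s.df); % band half-width
                                                          % (MATLAB: sum ((A/s.R).^2, 2))

high_band = yf + dy;              % upper prediction band
low_band  = yf - dy;              % lower prediction band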