Wednesday, December 30, 2020

Tableau Gotchas and Workarounds


 

In my opinion, there are many aspects of creating worksheets, dashboards, and (to a lesser extent) storybooks that are non-intuitive and sometimes downright awkward in Tableau Public. Note that my comments are restricted to Tableau Public; it is possible that the paid tiers behave better. 


Here are some examples:


1. Plotting line graphs when the x-axis is a set of evenly spaced numbers. First you have to make that metric a dimension, not a measure. Second, you have to put it on the column shelf, even though you have values running horizontally along the x-axis. Third, you have to right mouse click on the dimension in the shelf, and change it from the default of discrete to continuous.


2. Bringing a second Excel file into the Data Records area after worksheets have already been created. First you must edit the new data in some way to trigger Tableau to make an extracted temp file behind the scenes. Otherwise, Tableau will report errors when you try to re-save the workbook to the Public area.


3. Highlighting one or more lines in a storybook point. Highlighting any lines to stand out from a busy accumulation of lines sharing common axes must be the LAST thing done before clicking on Update. Otherwise, the highlighted selections will go away.


4. Changing font sizes of axis and title labels. The font size control is buried in the drop down menu for the font type control. There is no separate drop down menu just for the font size. You can only change it when the font type gets exposed and then you see the font size drop down subsumed within it.

 

5. Clipped axis labels. Carefully find the edge of the axis object, then click-drag to the right to get the label to appear from off left screen.

 

6. Storybook viewing size configuration. Do not trust any of the pre-configured vertical/horizontal pixel dimensions from the drop down menu. Especially dangerous is the Automatic option. Testing on Chrome, Firefox, Edge and Internet Explorer produced widely varying experiences, with none of them looking anywhere near as good as when first drafted in Tableau Public. The best bet is to fiddle with various vertical/horizontal pixel dimensions in the Custom choice from the drop down menu. This will at least give consistent results across all browsers. More than likely it will also be significantly better than the Automatic setting, with at most viewers needing to scroll or go full screen to get the complete/best view.

 











Tuesday, December 29, 2020

Autocorrelation and Standard Errors



I am analyzing the possible correlation between successive daily stamp listings with an Unspecified Grade.

Here is the equation for autocorrelation:


and the estimated standard errors are:
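The equation images did not survive here; the standard lag-k sample autocorrelation and its Bartlett-style standard error, which are presumably what the images showed, are:

```latex
% lag-k sample autocorrelation of x_1, ..., x_N with sample mean \bar{x}
r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})\,(x_{t+k} - \bar{x})}
           {\sum_{t=1}^{N} (x_t - \bar{x})^2}

% Bartlett's large-lag approximation to the standard error of r_k
\mathrm{SE}(r_k) \approx \sqrt{\frac{1}{N}\Bigl(1 + 2\sum_{j=1}^{k-1} r_j^2\Bigr)}
```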


To that end, I have created a new analytic metric called the Stagger Forward Autocorrelation Matrix (SFAM).

The mathematics behind the SFAM is not new; the novelty arises from the way that I implement autocorrelation and the visual I choose to best represent the results. 
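As a minimal sketch of the underlying lag-k autocorrelation (not the SFAM itself; its staggering scheme lives in the linked MATLAB/Octave code), with made-up daily listing counts standing in for the real series:

```python
def autocorr(x, k):
    """Lag-k sample autocorrelation of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Made-up daily listing counts standing in for the Unspecified Grade series
listings = [12, 15, 14, 18, 20, 17, 19, 22, 21, 24]

# r[0] is always 1; this gently trending toy series is positively
# correlated at lag 1
r = [autocorr(listings, k) for k in range(4)]
```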


A detailed presentation of SFAM can be found at my Tableau Public page:

 

 https://public.tableau.com/profile/john.quagliano#!/vizhome/Staggered_Autocorrelation_Story/StaggerAutocorrelation?publish=yes


and the MATLAB/Octave code can be found here:


https://drive.google.com/file/d/1WjS-j-lkl1MW60KzrwHNHfSiiiDK0Ytt/view




Tuesday, December 1, 2020

Vandermonde Prediction Intervals


 

 This post is in conjunction with my Tableau Public Viz on the analysis of graded stamps for sale.

 

Calculate the Vandermonde prediction band using polyfit and polyval in Octave or MATLAB


[p, s] = polyfit(abscissa_values, ordinate_values, 1);
f = polyval(p, abscissa_values);

polyfit generates p and s:

s:

    'R'
          Triangular factor R from the QR decomposition.

     'X'
          The Vandermonde matrix used to compute the polynomial coefficients.

     'C'
          The unscaled covariance matrix, formally equal to the inverse of X'*X, but computed in a way minimizing roundoff error propagation.

     'df'
          The degrees of freedom.

     'normr'
          The norm of the residuals.

     'yf'
          The values of the polynomial for each value of x.


p:

    polynomial coefficients, highest power first (the last element is the y-intercept)

Compute prediction intervals – high and low bands.

Note that the mathematics will yield slightly different high and low points that depend on each abscissa value in turn.

A = (x(:) * ones (1, n+1)) .^ (ones (z, 1) * (n:-1:0));

dy = sqrt (1 + sumsq (A/s.R, 2)) * s.normr / sqrt (s.df);


x(:) * ones (1, n+1)   -----> [z by (n+1)] matrix


This term above is the column vector of abscissa values (z by 1) times a row vector of ones (1 by n+1), where n is the order of the polynomial (linear n is 1).

.^ (ones (z, 1) * (n:-1:0))  ----> a [z by (n+1)] exponent matrix with 1s in the leftmost column and 0s in the rightmost column (for linear n = 1).

This term exponentiates each element in the z by (n+1) abscissa data matrix - at each ith-jth position - by either a 1 or a 0. This still keeps the matrix as [z by (n+1)] but makes the entire rightmost column 1s, since anything to the power 0 is 1. The result is the Vandermonde matrix A of order 1.

In Octave and MATLAB, right matrix division x/y equals (inverse (y') * x')', where ' is the transpose.

If the system is not square, or if the coefficient matrix is singular, a minimum norm solution is computed.


sqrt (1 + sumsq (A/s.R, 2))

where:

R is an (n+1) by (n+1) upper-triangular matrix, so its lower-left element is always zero. The 2 parameter means sum within each row.

A is z by (n+1)


For n = 1: the 2 by 2 inverse times the 2 by z transposed A gives a 2 by z matrix, which is then transposed into z by 2.

sumsq() squares each element in a given row and then sums those squares within that row. This reduces the matrix to a column vector: z by 1, the same length as the input data.

Note: In Octave you can add a constant to a matrix with just a plus, not .+



Finally, normr / sqrt(df) is the same scalar to be applied to all elements in the z by 1 vector:

dy = [z by 1] * normr / sqrt (df)

with normr the norm of the residuals of the least squares linear fit.

dy is then added/subtracted to/from the model line (yf) to create the high/low bands, point by point.
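The same computation can be sketched in Python with NumPy, reconstructing the pieces of Octave's s struct by hand; the data points here are invented for illustration:

```python
import numpy as np

# Invented sample data standing in for the stamp-price series
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

n = 1                      # polynomial order (linear fit)
p = np.polyfit(x, y, n)    # coefficients, highest power first
yf = np.polyval(p, x)      # fitted line

# Vandermonde matrix A, z by (n+1), columns x^n ... x^0
A = np.vander(x, n + 1)

# Pieces that Octave's polyfit returns in the s struct
R = np.linalg.qr(A, mode='r')       # triangular factor from the QR decomposition
normr = np.linalg.norm(y - yf)      # norm of the residuals
df = len(x) - (n + 1)               # degrees of freedom

# dy = sqrt(1 + sumsq(A/R, 2)) * normr / sqrt(df)
# Octave's right division A/R is computed here as (R' \ A')'
AR = np.linalg.solve(R.T, A.T).T    # z by (n+1)
dy = np.sqrt(1 + np.sum(AR**2, axis=1)) * normr / np.sqrt(df)

upper = yf + dy                     # high band, point by point
lower = yf - dy                     # low band, point by point
```

As the text says, dy varies with each abscissa value; the band is widest at the extreme x values, farthest from the mean.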




 

Monday, November 16, 2020

Tableau setting up


 

Just a few particulars as I get started using Tableau Public:

 

1. The data for the x-axis should be put in the Column shelf, while the data for the y-axis should go in the Rows shelf. Contrary to the tutorial videos, I prefer not to just drag and drop icons on the main canvas and trust it to do the right thing.

2. Ctrl- or Shift-select multiple measures to go in as Rows that share a common Column by dragging them onto the y-axis tick mark area until you see double short vertical bars. The legend is auto-generated and can be copied via the right-click context menu.

3.  After saving/uploading the visualization to the Tableau server, look for the "Metadata" link in the lower right corner of the webpage to see all the worksheets and dashboards listed by title. Click on the one you want. By default, Tableau always opens the Viz on the server to the last item.

4. When working in a Dashboard, look in the upper left hand corner for the "Range" and the drop down menu to the right of it. Choose "automatic" to fill your screen, which will hopefully also scale the viz to any other screen when people look at it on their own.

5. Again in the Dashboard, click into the blank white space subsection representing your row measure, then click on the "Color" mark button to get a simple menu of solid color choices for all of the bars in a bar chart. Otherwise, if you drag the row measure onto the button, the only color choices you'll see are gradients.

6. To modify y-axis viewing parameters, right-click on the axis and then choose "Format ...". The label size and font are changed simultaneously with the tick mark size and font. Look for the "Default" font with a drop down menu.



 

 

 

 

 

 

 

 

 

 

Monday, November 2, 2020

SciPy Highlights

 The scipy module is built on the numpy array handlers and provides statistical routines, parameter-space searches for minima, special treatment of sparse matrices, interpolation, and hypothesis testing.


Finding roots of an equation:


from scipy.optimize import root
from math import cos
from scipy import constants

# Note that the equation can remain generalized with no definite array for x
def eqn(x):
  return x**3 + cos(x)

myroot = root(eqn, (constants.pi/2)) # pi/2 is a reasonable initial guess at the root

print(myroot)


# results in:

# the solved for root is x. nfev is the number of evaluations of the function.

    fjac: array([[-1.]])
     fun: array([1.11022302e-16])
 message: 'The solution converged.'
    nfev: 16
     qtf: array([-7.64299735e-12])
       r: array([-3.00853835])
  status: 1
 success: True
       x: array([-0.86547403])



We can also search for the minimum of a function:

minimize(fun, x0, method=..., callback=..., options=...)

fun: the objective function
x0: best initial guess
method: choice of evaluation method
callback: optional post-iteration function to use
options: dictionary of optional parameters


from scipy.optimize import minimize

def eqn(x):
  return x**2 + x - 6

themin = minimize(eqn, 10, method='BFGS')

print(themin)


# results in:


      fun: -6.249999999999978
 hess_inv: array([[0.49999999]])
      jac: array([2.98023224e-07])
  message: 'Optimization terminated successfully.'
     nfev: 8
      nit: 2
     njev: 4
   status: 0
  success: True
        x: array([-0.49999985])


If we try a cubic function, though, the results are not usable: x**3 + x - 6 is unbounded below, so there is no minimum for BFGS to converge to.


def eqn(x):
  return x**3 + x - 6

themin = minimize(eqn, 750, method='BFGS')

print(themin)


# results in:

fun: array([-6.94927495e+08])
 hess_inv: array([[-0.00037628]])
      jac: array([2353680.])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 276
      nit: 18
     njev: 132
   status: 2
  success: False
        x: array([-885.75370841])


GitHub Repositories

 

Python source file, MySQL Workbench model, EER Diagram and sample flat files and database dump file for the eBay collectible Stamp Grades Aspect Container ensemble statistics and Stamp World Region Category Container statistics.


https://github.com/DancingGuy/Stamp-Geographic-Statistics


https://github.com/DancingGuy/Stamp-Grade-Statistics



Tuesday, October 20, 2020

More SQL Commands



SELECT DISTINCT column_name_1, column_name_2


is useful for grabbing all the unique field values and is often used with a


WHERE BETWEEN/LIKE/IN with operators >=, <=, <> and connectors OR/AND/NOT.


BETWEEN: inclusive of the start and endpoints

LIKE: pattern matching such as '%s' to find fields that end in the letter s.

IN: find members of a set. Think of it as multiple ORs.


SELECT * FROM table_name ORDER BY  column1, column2

first orders ASC by default on column1, then any ties are broken by sorting on column2. Note: you need to put the ASC or DESC after each column name.


INSERT INTO on less than all columns of a table will put a NULL into any column not specified with the command. So if you have a table with 10 columns, but only insert data into 4 of the columns, the 6 unassigned columns would automatically be filled with NULL (assuming the column is declared to be nullable in the design).
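A quick sketch of that NULL-filling behavior using Python's sqlite3 (the post is MySQL-oriented, but the behavior is the same; table and column names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE stamps (id INTEGER, region TEXT, grade TEXT, price REAL)")

# Insert into only 2 of the 4 columns; the unassigned columns fill with NULL
cur.execute("INSERT INTO stamps (id, region) VALUES (1, 'Europe')")

row = cur.execute("SELECT id, region, grade, price FROM stamps").fetchone()
# row == (1, 'Europe', None, None) -- NULL comes back to Python as None
```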


Test for NULL with IS NULL or IS NOT NULL.


UPDATE table_name SET column1=value1, column2=value2 WHERE conditions


This is a good way to change the timestamps that cross over the midnight boundary because your timezone is different than UTC.


SELECT MIN/MAX(column_name) AS variable_name FROM table_name

produces a new variable. In MySQL Workbench, functions like MIN() and MAX() will appear in gray.


COUNT(column)/AVG()/SUM() do not include any NULLs (though COUNT(*) counts every row).


LIKE can take wildcards:


% -> zero or more characters in that position of the pattern

_ -> a single placeholder for one character

% ... % -> the ... in any position

Those wildcards also work against an INT.


ESCAPE: allows you to use special characters as literals in the pattern. For example, LIKE '543#%' ESCAPE '#' means match 543% exactly, not wildcarded.


A powerful function in MySQL is REGEXP_LIKE() because it permits traditional regular expression syntax and pattern matching.


NOT IN is useful when followed by (SELECT ...)


(NOT) BETWEEN accepts numbers, text or dates and is often used with IN. If you are using it for dates, use a pair of # characters to bracket the date (an MS Access convention; MySQL uses quoted date strings).


ALIAS only exists for the duration of the query.


UNION combines two or more SELECTs.


1. Each SELECT needs the same number of columns

2. Columns in corresponding positions must have similar types.


UNION ALL will preserve multiplicity.


Resultant column names will inherit from the names of the columns from the first SELECT clause.


GROUP BY partitions the rows into N groups, one for each of the N distinct values in the grouped column, so that aggregate functions operate within each group.


HAVING is a WHERE-like filter that is applied after aggregation, for use with GROUP BY.
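A small sqlite3 sketch of GROUP BY plus HAVING (invented table and data; the syntax carries over to MySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE listings (region TEXT, price REAL)")
cur.executemany("INSERT INTO listings VALUES (?, ?)",
                [("Europe", 5.0), ("Europe", 7.0), ("Asia", 3.0),
                 ("Asia", 4.0), ("Asia", 2.0), ("Africa", 9.0)])

# GROUP BY makes one group per distinct region; HAVING then filters the
# groups themselves (here, dropping regions with only a single listing)
rows = cur.execute(
    "SELECT region, COUNT(*) FROM listings "
    "GROUP BY region HAVING COUNT(*) >= 2 ORDER BY region").fetchall()
# rows == [('Asia', 3), ('Europe', 2)]  -- Africa's lone row is filtered out
```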


EXISTS returns a boolean if any row survives the joins and conditions.


ANY and ALL are restricted to use with WHERE or HAVING only. ANY or ALL must be preceded by a comparison operator such as > or <>.


SELECT INTO copies data into a new table. Supplementing with AS will provide new column names.


SELECT * INTO backupTable FROM old_table creates a copy in the same schema.


SELECT * INTO backupTable IN 'BackupSchema.mdb' FROM old_table creates a copy in a different schema.


TRICK: Create a new empty table with the same column design:


add WHERE 1=0 to force no matching.


INSERT INTO table_name(col names) SELECT another_table's columns FROM another_table


CASE

WHEN cond1 THEN res1

..........

ELSE default_res

END AS new column

Important: If all WHENs fail and there is no ELSE, then a NULL is returned. Otherwise evaluation stops at the first true condition.

If there is no AS clause after the END keyword, the column is named in a very verbose, ugly way, with all of the WHEN result strings concatenated together.

IFNULL() or COALESCE() allow you to substitute any non-null values for NULL in math calculations.
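Here is a sqlite3 sketch of both behaviors, CASE with no ELSE and COALESCE (invented table and data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE grades (name TEXT, score INTEGER)")
cur.executemany("INSERT INTO grades VALUES (?, ?)",
                [("a", 95), ("b", 70), ("c", None)])

rows = cur.execute("""
    SELECT name,
           CASE WHEN score >= 90 THEN 'high'
                WHEN score >= 60 THEN 'mid'
           END AS band,                  -- no ELSE: NULL when no WHEN is true
           COALESCE(score, 0) AS safe_score
    FROM grades ORDER BY name""").fetchall()
# rows == [('a', 'high', 95), ('b', 'mid', 70), ('c', None, 0)]
```

The NULL score fails both WHENs, so band comes back NULL; COALESCE substitutes 0 so math on safe_score stays well-defined.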


Stored Procedures:

CREATE PROCEDURE proc_name

AS

sql-commands

GO


Run it with:


EXEC proc_name


Procedures can receive parameters too.


use @param_name data_type


EXEC proc_name @param_name='...'


and add more parameters with commas


Comments are either -- or /* */


See the full list of SQL operator symbols at w3schools.com/sql/sql_operators.asp























Monday, October 19, 2020

eBay Finding API issues



1. eBay's on-line docs say that the string length for the primary and secondary category names maxes out at 30 characters, but I found several names longer than 30.

2. A watchCount field is populated in the response object under the listingInfo grouping. This is not documented by eBay.

3. eBay has replaced the erstwhile CurrentPriceLowest Sort By Order argument with PricePlusShippingLowest.

4. Sometimes the API call hangs even after 3 retries. Change the timeout setting to be longer than the 20 second default in the Connection argument list.

5. A primary category number may have more than one primary category name. For example, primary category number 7921 is both "Collections, Lots" and "Collections/Mixtures".



Thursday, October 1, 2020

Database enhancements and updates

 



1. In the flat Excel CSV files, I'm now saving the command line options on the metadata header line.


2. I also added the primary and secondary category name descriptors to both the flat files and the database files, as well as saving the Sort By Order enum values to the database files and the metadata.


3. In Task Scheduler I set up runs spaced 20 minutes apart to do current price highest, watch count decreasing, or price plus shipping lowest; all three Sort By Order choices will be in the same flat file and in the same database and metadata table.


WARNING: Database writing errors to the MySQL server will often hang the table with a lock and also leave a hanging external connection to the DB server from your python script. To remedy this, go to MySQL Workbench, choose "Server" from the main menu, and then choose "Client Connections". Look for a message like "waiting for metadata lock" and then, from the right-click context menu, do either a "Kill Query" or "Kill Connection". Do the same (kill connection) for any Windows users that show a connection. If you don't do this, any SELECT/DELETE/TRUNCATE SQL commands will hang or time out in the Query tab in MySQL Workbench. Issuing commit and rollback commands won't help either.


WARNING: Using the unsigned property for a FLOAT or DECIMAL column datatype in MySQL is deprecated as of version 8.0. When you create a decimal database field that can carry a plus or minus sign, allow an extra digit; otherwise, expect a Data Type error from Python saying "Out of Range" if you did not allocate enough space.

decimal(x,y) where x is ALL of the digits along with any expected sign (+ or -) and y is only the digits to the RIGHT of the decimal point.


example: -123.45 would be decimal(6,2)


WARNING: If you get an error in the python, while using the MySQL connector module functions, that goes something like "your column name" not in "your field name", it likely means you are trying to write values to the wrong table.

4. I ended up changing bestOfferEnabled values as false or true with 0 or 1 in the database.


5. It turns out the JSON object does return a watch count for all sorting types; it's under the item's listingInfo dictionary field. I save watch counts on all my queries. However, if no one is watching an item, eBay's SDK API will not give you a zero in the listingInfo dictionary; instead, the watchCount field will be completely missing. You have to accommodate that by making sure you define the watchCount column in MySQL to be nullable and have the python script pass in None.


CAUTION: when using ALTER to change enum values, make sure to kill all connections and queries first, even for the host. Then after you do your ALTER command, truncate everything in the modified table or otherwise your python script may raise an exception for a hung table or database.


Here is the newest EER showing the latest database design




Monday, September 14, 2020

More About MySQL Workbench



I have revisited my python script to add in a lot of error handling functions and consistency checks against standards. I decided to leave the 16-member list describing the 16 World Regions out of the SQL code and instead put it into the python code. This keeps the database to a minimum number of columns and SQL statements. Also, when talking to the DB server, I have more flexibility using the Python-language eBaySDK. If eBay changes, for example, "British Colonies and Territories" into two separate columns of "British Colonies" and "British Territories", I can catch that and raise an exception BEFORE piling values into the database.

I also dropped down to one primary key because I really don't need "year month day hour minute second" as a primary key; a bigint primary key starting at 1 incrementing by 1 is fine. If I really need to do an exact date match I can do a select-where-like date in the date column without it being a primary key.


SQL workbench usability tips



There is a pair of downward-pointing arrowheads next to the schema name in the model tabsheet. Click those double arrowheads to expand and allow editing comments on the database.

If you want to change the left-to-right order of columns, highlight the column to be moved, then right-click and use the Move Up or Move Down option from the pop-up context menu.

I found that trying to mess with Edit → Preferences … led to Workbench crashing. I don't have an answer for this; there are other places besides Preferences where I can set up my configurations.

If you want to see all of your databases in MySQL workbench, launch the MyInstance80 or whatever you've named your instance of the MySQL Server and you'll see all your databases alphabetically on the left margin.

Forward Engineering is a safe tool to export your database model design onto the live DB server as a working schema. Some options that will be useful as you forward engineer are the following:

Generate a drop before creating the table --- this will give you a clean slate.

Skip anything related to foreign keys; for now I don't need them.

Check the box to show warnings.

Check the box that you do not want to create users when you forward engineer the schema.

Check drop schema when making modifications.


Another tip I have for you is to save frequently to different file names while working on all models because as I mentioned before the system is a little bit unstable.


When I talked about error trapping in the python code, here are some examples of what I mean:


First of all, very early in the code, ensure that you can make a connection to the live DB server and that the tables exist.

As I stated earlier, before populating columns with data, check the number and names of the world regions against the master list.

Warn if eBaySDK has changed versions but continue anyhow.

Check the category ID and subcategory ID match versus a standard list.

Make sure your JSON is valid.


Some potential gotchas when using either my SQL or python:

You must use a pair of single quotes to wrap a JSON text as a python string that you feed to the .loads() JSON module function; .loads() automatically checks that the JSON is valid and will raise an exception if not.

Because JSON itself uses double quotes for keys and strings, wrapping the literal in double quotes (or building it with the str() constructor) will not wrap the JSON properly for .loads() unless the inner quotes are escaped.
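Concretely, the quoting issue looks like this (the JSON payload is invented for illustration):

```python
import json

# Single quotes around the literal leave the JSON's double quotes intact
ok = json.loads('{"region": "Europe", "count": 42}')
# ok == {'region': 'Europe', 'count': 42}

# The same text in a double-quoted literal needs escaped inner quotes
also_ok = json.loads("{\"region\": \"Europe\", \"count\": 42}")

# .loads() raises json.JSONDecodeError (a ValueError) on malformed input
try:
    json.loads("{'region': 'Europe'}")   # single quotes are not valid JSON
except json.JSONDecodeError:
    pass
```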


I should get in the habit of visually scanning for these common errors before submitting the script to the interpreter:

Look for colons at the end of def and if statements.

Check your indentation on commands that are heavily nested inside a conditional.

Dictionary variables nested inside a list are safest read with the .get() method; chained [][] indexing raises an exception when a key is missing.

A global declaration has to be on a line by itself; use a second line to actually assign a value to the variable.

The element types inside a set must match the way they will be processed later - meaning if you're going to be processing them as ints, set them up as ints now.


More Gotchas:


To avoid SQL injection when using a connector module function that takes a SQL command, use %s as the placeholder. It does not matter if the value is a float, a bool, an int or any other SQL-supported datatype. Behind the scenes, the connector module functions are smart enough to convert it into the appropriate format expected in the handshake to the MySQL database server … pretty impressive, I think ... but not obvious or well-documented.


Again, some quirks in MySQL Workbench:


If you have to edit the fundamental nature of your database, it's best to drop the entire database in the instance tabsheet, close the instance, save your work, close Workbench and reopen. Do NOT forward engineer on top of an existing schema, even if you set the option to drop the schema on modification; you will time out. Another clue that you've got a problem with a database is that, in the instance tabsheet view, the tables will keep fetching and then time out.


Don't use two primary keys because from python's point-of-view, it won't like the second one, especially if it's a date string; you'll get error messages that don't have anything to do with the problem.


Make sure you are auto-incrementing in the Workbench model design with some type of int, and the left-most column will populate on its own. The default is to start at number one. 

TRUNCATE TABLE table_name will remove all rows faster than DELETE and restart the auto-increment counter at 1.



Use the Query tab in the instance tab of Workbench to run ad hoc SQL commands against the database. You don't need trailing semicolons if you are issuing only one command. First double-check that the schema you want to use is engaged on the left-hand side of the Navigator panel by double-clicking until it shows in bold; that way you do not have to specify the schema in the SQL query.

The New Script tool you see in the middle of a model tab creation page is only for attaching a script to a model before it is forward engineered; it will not run in the Workbench GUI.

Make sure the auto-commit button is depressed so that statements in the Query box take effect.

To change the column name in a table in an already established schema, use:

ALTER TABLE table_name CHANGE old_column new_column datatype

NOTE: the datatype is required, even if it is the same as for the old_column.



Thursday, September 10, 2020

MySQL Workbench


I've taken a useful detour to install and configure MySQL so that I can model and forward engineer a model to a live DB server. That task is now complete for two data tables and two metadata tables for storing the results of parsing eBay data using the ebaysdk and Finding API for python. In particular, I am mining right now for world region and stamp quality data to understand trends over time in the eBay sales of stamps and to improve the performance of my eBay Stamps store.

This MySQL database is where we'll store the results once per day of the number of stamp Lots offered for sale broken down by regions and also separately in a different table broken down by grades.

The Regions table and the Grades table will be populated with big integers, both alpha and incremental numeric timestamps, and a descriptor that tells you how all of the columns relate to each other.

 I will be looking for chronological trends in each of the columns in both the world regions and grades tables, and these will be compared against databases - yet to be modeled - for statistics on the items and on the sellers themselves.

 

 Advantages of model building with MySQL workbench:


 1. Extensive comments and notes can be attached to the database, tables ... even to individual columns.


 2. Column defaults and keys are readily set.


 3. An Enhanced Entity Relationship (EER) diagram can be created in one click from model tables, and it is GUI-editable right within the diagram.


4.  Only available in Commercial version: schema validation plugins - dozens of them - in one report to check your design.


 5. Warnings if you configure columns with invalid or missing parameters.


6. Once a model is complete you can forward engineer to a live connection to create a live database.

 

 7. There is a tool to autogenerate SQL code mapped to your model.


 8. Server connections are easy to set up, test and are persistent across Workbench program launches.


Here is my EER Diagram for the World Regions Database:






Monday, August 31, 2020

SQL JOIN



If two tables already exist, and the 2 tables have a dependency column between them, this permits you to do a JOIN operation.


The format looks like this:


SELECT column1, column2 FROM dependent table INNER JOIN independent table ON dependent table's column3 = independent table's column4


 Columns 1 2 3 and 4 can all be different.


Column1 --> Column3 --> Column4 --> Column2


will return Column1, Column2 pairs


There is an optional AS keyword variable for aliasing a database x.y variable.


A LEFT JOIN shows all named rows of Column1 even if no match. A None is returned for Column2 in the cases of no matches; this may omit some Column2 values from being displayed.


A RIGHT JOIN guarantees to return all unique Column2 values, even if no matches to Column1 exist. This may omit some Column1 values and will return None for Column1 if there is no match.


Example


players table

columns: ID name sport


Sport table

columns: ID name


players table - the dependent table

1 John 3

2 Sam 2

3 Mary 


Mary has no sport


Sport table - the INdependent table

1 baseball

2 golf

3 basketball


SELECT players.name, sport.name FROM players INNER JOIN sport ON players.sport = sport.id


will return


('John', 'basketball')

('Sam', 'golf')


If you don't want to keep typing players.name, you can shorten it to ply:


players.name AS ply


LEFT JOIN


SELECT players.name, sport.name FROM players LEFT JOIN sport ON players.sport = sport.id


will return all players names no matter what


('John', 'basketball')

('Sam', 'golf')

('Mary', None)


Note that baseball is omitted.


RIGHT JOIN


SELECT players.name, sport.name FROM players RIGHT JOIN sport ON players.sport = sport.id


will return all sport names no matter what


('John', 'basketball')

('Sam', 'golf')

(None, 'baseball')


Note that Mary is omitted.


Another thing to remember about LEFT and RIGHT JOIN is that multiplicity of rows may occur, one-for-one with multiple matches on the crossover column.
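The players/sport example above can be reproduced with Python's sqlite3. One caveat: RIGHT JOIN only landed in recent SQLite versions, but swapping the table order in a LEFT JOIN is equivalent, so that is what this sketch does:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE players (id INTEGER, name TEXT, sport INTEGER)")
cur.execute("CREATE TABLE sport (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO players VALUES (?, ?, ?)",
                [(1, "John", 3), (2, "Sam", 2), (3, "Mary", None)])
cur.executemany("INSERT INTO sport VALUES (?, ?)",
                [(1, "baseball"), (2, "golf"), (3, "basketball")])

# INNER JOIN: only matched pairs; Mary and baseball both drop out
inner = cur.execute(
    "SELECT players.name, sport.name FROM players "
    "INNER JOIN sport ON players.sport = sport.id "
    "ORDER BY players.id").fetchall()
# inner == [('John', 'basketball'), ('Sam', 'golf')]

# LEFT JOIN: every player survives; Mary pairs with None
left = cur.execute(
    "SELECT players.name, sport.name FROM players "
    "LEFT JOIN sport ON players.sport = sport.id "
    "ORDER BY players.id").fetchall()

# RIGHT JOIN emulated by swapping the tables in a LEFT JOIN:
# every sport survives; unmatched baseball pairs with None
right = cur.execute(
    "SELECT players.name, sport.name FROM sport "
    "LEFT JOIN players ON players.sport = sport.id "
    "ORDER BY sport.id").fetchall()
```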


SELECT A.a, B.b

FROM A

RIGHT JOIN B ON A.c = B.c


results in all rows of B being guaranteed to be returned. If a row of B has no match with A on the mutually shared column c, then the A.a field is NULL in the result for that row of B. If a row of B has 1 or more matches with A on column c, then the A.a field is populated one-for-one with each matching row of A. The result therefore contains one row per match of B.c against A.c, plus one row for each unmatched row of B.

-----------------------------------


SELECT A.a, B.b

FROM A

LEFT JOIN B ON A.c = B.c


results in all rows of A being guaranteed to be returned. If a row of A has no match with B on the mutually shared column c, then the B.b field is NULL in the result for that row of A. If a row of A has 1 or more matches with B on column c, then the B.b field is populated one-for-one with each matching row of B. The result therefore contains one row per match of A.c against B.c, plus one row for each unmatched row of A.



Friday, August 28, 2020

SQL: ORDER BY - DELETE FROM - DROP TABLE - UPDATE - LIMIT/OFFSET



ORDER BY is ascending by default but can be descending with the DESC additional keyword.


"SELECT * FROM players ORDER BY name"


"SELECT * FROM players ORDER BY name DESC"


name is a column in the values part of the table players.


-----------------------


DELETE FROM table_name WHERE condition


deletes records only.

forgetting to use WHERE will delete all records


A .commit() is required.


Use %s in the WHERE qualifier to prevent SQL injections.


sql = "DELETE FROM players WHERE name = %s"

nam = ('Smith',)



mycursor.execute(sql, nam)

mydb.commit()


--------------------------


sql = "DROP TABLE IF EXISTS table_name"


removes the table and all of its records. No .commit() is needed. The optional IF EXISTS will prevent an error message if the table does not exist.


------------------------------


UPDATE is a more complex SQL statement.


The format is:


sql_comm = "UPDATE table_name SET column_name='user_input' WHERE column_name='user_value'"


If you forget the WHERE conditional, then all records will be updated to user_input.


A .commit() call is required and it is best to use %s holder to prevent SQL injection.


Note that when 2 or more placeholders are used, a trailing comma is not needed in the tuple; it is only required for one-element tuples.


sql_command = "UPDATE players SET name = %s WHERE name = %s"


person = ('Jonathan', 'John')


mycursor.execute(sql_command, person)


mydatab.commit()


print(mycursor.rowcount, "record(s) affected")


.rowcount (an attribute, not a method call) gives the number of rows affected.
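Here is a runnable sketch of UPDATE plus rowcount, again with the built-in sqlite3 module standing in for MySQL (? placeholders instead of %s, made-up rows):

```python
# UPDATE with two placeholders; rowcount reports how many rows changed.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE players (name TEXT)")
cur.executemany("INSERT INTO players VALUES (?)",
                [("John",), ("John",), ("Mary",)])

cur.execute("UPDATE players SET name = ? WHERE name = ?", ("Jonathan", "John"))
con.commit()

affected = cur.rowcount  # attribute access, no parentheses
print(affected, "record(s) affected")  # 2 record(s) affected
```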


-----------------------


Use LIMIT at the end of the SELECT statement to restrict the results to the first # matches.


sql_command = "SELECT * FROM players WHERE name = %s LIMIT 3"


nam = ('Steve',)


mycursor.execute(sql_command, nam)


returns only the first 3 rows matching Steve, even if there are 4 or more matches.


The optional OFFSET keyword staggers down the list, skipping the first OFFSET matches. Note that without an ORDER BY clause, SQL does not guarantee any particular row order, so add one if the skipped rows need to be deterministic.


sql_command = "SELECT * FROM players WHERE name = %s LIMIT 3 OFFSET 3"


makes the first returned match be the 4th actual match overall in the table.
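LIMIT and OFFSET can be demonstrated the same way; the explicit ORDER BY makes the skipped rows deterministic. This sketch again uses the built-in sqlite3 module with made-up data.

```python
# LIMIT 3 OFFSET 3: skip the first 3 matches, then return up to 3 more.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO players (name) VALUES (?)", [("Steve",)] * 5)

cur.execute(
    "SELECT id FROM players WHERE name = ? ORDER BY id LIMIT 3 OFFSET 3",
    ("Steve",),
)
rows = cur.fetchall()
print(rows)  # [(4,), (5,)] -- only 2 rows remain after skipping 3
```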



Thursday, August 27, 2020

SQL: INSERT - SELECT - WHERE LIKE



After inserting a row, make sure to issue a .commit() call to get the row officially into the database. The .commit() is done on the connection object returned from the connector call, not on the cursor object.


Use .executemany() to insert 2 or more rows in one call. The rows must be set up as tuples, and the tuples are then placed in a list. It is OK to mix single and double quotes to mark the values in each row.


Both .execute() and .executemany() take the actual SQL command as a string argument, in single or double quotes.

import mysql.connector

mydatab = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

mycursor = mydatab.cursor()


sql = 'INSERT INTO players (name, age) VALUES (%s, %s)'

val = [
  ("Peter", "25"),
  ('Sam', '31'),
  ('Josie', "28")
]
# It is OK to mix and match single and double quotes in the same call


mycursor.executemany(sql, val)


mydatab.commit()


-------------------------------


SELECT * returns every column of every row.

SELECT with column names separated by commas returns only those columns, but still for all rows.


Either way, you still need a .fetchall() and a for loop to view all matches.


mycursor.execute("SELECT name, address FROM players")

mysqlresult = mycursor.fetchall()

for x in mysqlresult:
  print(x)


--------------------------------


WHERE is used for searching.


WHERE column_name=value

or

WHERE column_name LIKE '%value%' 

with % as wildcard.


To prevent SQL injections, use a %s as a placeholder for the end-user input value to be passed in via a variable. This adds an extra step to capture the end-user input as a string.

sql = "SELECT * FROM players WHERE age = %s"
age = ("31",)  # trailing comma makes this a one-element tuple
mycursor.execute(sql, age)


with age containing the end-user input from a prompt.


----------------------------------


Here is a really clever example that permits an end-user to provide an ad hoc search query with an arbitrary keyword wildcarded. However, this construct IS vulnerable to SQL injection, because the user input is concatenated directly into the SQL string.


import mysql.connector

mydatab = mysql.connector.connect(
  host="localhost",
  user="your username",
  password="fill in",
  database="mydatabase"
)

# search query phrase
user_input = "Jo"

mycursor = mydatab.cursor()

name_stem = '\'%' + user_input + '%\''

sql = "SELECT * FROM players WHERE name LIKE " + name_stem
print(sql)

mycursor.execute(sql)

mysqlresult = mycursor.fetchall()

for x in mysqlresult:
    print(x)

  

This will find all players with Jo somewhere in their name.


To help combat SQL injection, the following version of the same code is preferred.


import mysql.connector

mydatab = mysql.connector.connect(
  host="localhost",
  user="your username",
  password="fill in",
  database="mydatabase"
)

user_input = "Jo"
# captured from a GUI or command line prompt

mycursor = mydatab.cursor()

name_stem = "%" + str(user_input) + "%"
# need a pair of % to wildcard

print(name_stem)
print(type(name_stem))

sql = "SELECT * FROM players WHERE name LIKE %s"
# parameter substitution
print(sql)

nm = (name_stem,)

print(nm)
print(type(nm))

# execute() expects a sequence of parameters, so a single parameter
# must be wrapped in a one-element tuple with a trailing comma

mycursor.execute(sql, nm) # nm goes in for %s

mysqlresult = mycursor.fetchall()

for x in mysqlresult:
    print(x) # matches John Josie Joseph Jo

  




Wednesday, August 26, 2020

Installing a MySQL Database



Get a free MySQL database package for Windows OS from:


https://dev.mysql.com/downloads/installer/


and choose either the offline or web-enabled installer download. You will have the option to create a free Oracle account, which I did.

Note the TCP/IP connection port.

Create a root user and a DB Admin user, the latter running as a standard system account.

I created all 4 possible error logs.

I chose the Full installation when prompted. Along the way you may be prompted to install the Microsoft C++ redistributables, which are free; the MySQL installer takes care of that for you. The only thing you would have to install manually and separately is MS Visual Studio C++ (unless you already have it), which I don't need, so I skipped that.

The full installation has a lot of programs, so be patient; you may have to retry some of them a 2nd or 3rd time to get them to download properly, all within the installer (there is a Retry button).


Next install the mysql-connector-python driver using PIP. Do NOT install the mysql-connector package as that is something else.


Here is how to check that everything is installed properly and that the various components can talk to each other:


################ test 1 ######################

import mysql.connector

print(dir(mysql.connector.connect))

# If the two statements above run without errors, then the MySQL connector driver is working properly.


################ test 2 ######################

mydb = mysql.connector.connect(
  host="localhost",
  user="your standard system username",
  password="you fill in"
)

print(dir(mydb))
print(mydb)

# if the connect function and the two print statements run without errors, then you can properly make a connection from a python script to the MySQL database.

However, if you get: 

"mysql.connector.errors.NotSupportedError: Authentication plugin 'caching_sha2_password' is not supported" 

then you probably installed both mysql-connector and mysql-connector-python. Uninstall mysql-connector and try again.


################ test 3 ######################

mycursor = mydb.cursor()

mycursor.execute("CREATE DATABASE mydatabase")


# if the 2 statements above run without error, then a new database is created.


################ test 4 ######################

# mycursor = mydb.cursor()

# mycursor.execute("CREATE DATABASE mydatabase")

mycursor.execute("SHOW DATABASES")

for x in mycursor:
  print(x)


Once you have made a database, check that it really is there with the SHOW DATABASES command and a loop; MySQL installs with some default databases already created, and you want to be able to see them all.

It is not enough to just issue the SHOW command; you still have to loop over the cursor to see the results. If you try to loop without first running SHOW, the loop exits immediately because the cursor has no result set to iterate.
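The execute-then-loop pattern can be seen in miniature with Python's built-in sqlite3 module, where a query against sqlite_master plays the role of SHOW TABLES (a stand-in sketch, not MySQL itself):

```python
# A cursor only yields rows after a query has been executed on it.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE customers (name TEXT, address TEXT)")

cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = [row[0] for row in cur]  # loop over the cursor to see results
print(tables)  # ['customers']
```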


################ test 5 ######################

import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="standard system username",
  password="fill in",
  database="mydatabase"
) # Once the database has been successfully created and checked for existence, just put the database name in the connector call as above.

mycursor = mydb.cursor()  # a new cursor is needed for the new connection

mycursor.execute("CREATE TABLE customers (name VARCHAR(255), address VARCHAR(255))")


If the above runs without error, then you can create a table within a database.


################ test 6 ######################

##### mycursor.execute("CREATE TABLE customers (name VARCHAR(255), address VARCHAR(255))")

mycursor.execute("SHOW TABLES")

for x in mycursor:
  print(x)

If the loop prints the table name without errors, the table still exists.


Once you've completed those 6 tests, it is time to assign a Primary Key to the already existing table.


mycursor.execute("ALTER TABLE customers ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY")


Confirm that SHOW TABLES still works after adding the primary key:


mycursor.execute("SHOW TABLES")


for y in mycursor:
    print(y)