Code-Searching: September 2017

Monday, 25 September 2017

How to use the Hierarchical algorithm under the Connectivity model of clustering?

Linkage methods of the hierarchical clustering algorithm
1. Min (single linkage)
2. Max (complete linkage)
3. Ward's method
4. Group average (average linkage)
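To see how the Min, Max and Group average criteria differ, here is a minimal pure-Python sketch that derives the distance between two clusters from their pairwise point distances. It assumes 1-D points with absolute difference as the point distance, and the helper names (`point_distance`, `cluster_distance`) are hypothetical. Ward's method is left out because it scores a merge by the increase in within-cluster variance rather than by a pairwise distance.

```python
from itertools import product
from statistics import mean


def point_distance(a, b):
    # Illustrative point distance for 1-D points: absolute difference
    return abs(a - b)


def cluster_distance(c1, c2, method="single"):
    """Distance between clusters c1 and c2 under a linkage criterion.

    single   -> Min: distance of the closest cross-cluster pair
    complete -> Max: distance of the farthest cross-cluster pair
    average  -> Group average: mean distance over all cross-cluster pairs
    """
    pair_distances = [point_distance(a, b) for a, b in product(c1, c2)]
    if method == "single":
        return min(pair_distances)
    if method == "complete":
        return max(pair_distances)
    if method == "average":
        return mean(pair_distances)
    raise ValueError(f"unknown linkage method: {method}")


left = [1.0, 2.0]
right = [4.0, 7.0]
for method in ("single", "complete", "average"):
    print(method, cluster_distance(left, right, method))
```

With these two clusters the criteria give 2.0, 6.0 and 4.0, so the chosen criterion can change which clusters get merged first.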

USE CASES
1. Text mining in NLP (Natural Language Processing)
For example: using Stanford NLP
2. Social network linking (LinkedIn, Facebook)

EXAMPLE
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt
import numpy as np

# Sample data: a condensed distance matrix, i.e. the 15 pairwise distances
# between 6 observations, listed as (0,1), (0,2), ..., (0,5), (1,2), ..., (4,5)
ytdist = np.array([662., 877., 255., 412., 996., 295., 468., 268., 400., 754., 564., 138., 219., 869., 669.])

# Build the hierarchical linkage with single linkage (the Min method)
# Other possible methods: 'complete' (Max), 'average' (Group average), 'ward' (Ward's method)
Z = hierarchy.linkage(ytdist, 'single')

# Plot the dendrogram showing the order in which clusters are merged
dn = hierarchy.dendrogram(Z)

plt.show()
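What `hierarchy.linkage(ytdist, 'single')` computes can be approximated with a naive agglomerative loop: start with every observation in its own cluster and repeatedly merge the two clusters whose closest pair of points is nearest. The sketch below uses only the standard library and hypothetical helper names; it is slow for large inputs, whereas SciPy uses a much faster algorithm. Its merge heights should match the third column of `Z`.

```python
from itertools import combinations

# Same condensed distance matrix as above: pairwise distances of 6 points,
# in the order (0,1), (0,2), ..., (0,5), (1,2), ..., (4,5)
ytdist = [662., 877., 255., 412., 996., 295., 468., 268., 400., 754.,
          564., 138., 219., 869., 669.]
n = 6

# Expand the condensed list into a lookup keyed by (i, j) with i < j
dist = {pair: d for pair, d in zip(combinations(range(n), 2), ytdist)}


def single_link(c1, c2):
    """Min criterion: distance of the closest pair across the two clusters."""
    return min(dist[min(i, j), max(i, j)] for i in c1 for j in c2)


def agglomerate(n_points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [frozenset([i]) for i in range(n_points)]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: single_link(*p))
        history.append((sorted(a), sorted(b), single_link(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return history


for left, right, height in agglomerate(n):
    print(left, "+", right, "at", height)
```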

How to create a normalized histogram?

Histogram: a graphical representation of the distribution of numerical data.
It shows how the data is distributed across the bins (intervals).

Normal distribution: a distribution in which most values cluster around the middle of the range and the rest taper off symmetrically towards either extreme.
Its graphical representation is called the BELL CURVE.
The mean, mode and median are all the same.

Normalized histogram: a histogram whose bar heights are scaled so that the total area of the bars is 1, so it shows a probability density instead of raw counts. The name refers to this scaling, not to the normal distribution.
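A normalized histogram divides each bin's count by the total count times the bin width, so the bar areas add up to 1; this is the formula matplotlib documents for `density=True`. A minimal pure-Python sketch of that calculation, assuming equal-width bins (the function name `normalized_histogram`, the seed and the sample data are hypothetical):

```python
import random


def normalized_histogram(values, bins, low, high):
    """Heights of an equal-width histogram whose bar areas add up to 1.

    height = count / (total_count * bin_width), the formula matplotlib
    documents for hist(..., density=True).
    """
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # Clamp so that v == high falls into the last bin
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    total = sum(counts)
    return [c / (total * width) for c in counts]


random.seed(0)
samples = [random.uniform(-3, 3) for _ in range(1000)]
heights = normalized_histogram(samples, bins=20, low=-3, high=3)

# Total area = sum of height * bin width, which should be 1 up to rounding
bin_width = (3 - (-3)) / 20
area = sum(h * bin_width for h in heights)
print(round(area, 6))  # 1.0
```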

EXAMPLE
import matplotlib.pyplot as plt
from numpy.random import normal, uniform

# Take some random datasets
# 1000 samples from the standard normal (Gaussian) distribution: mean 0, std dev 1
gaussian_numbers = normal(size=1000)
# 1000 samples spread uniformly over the range -3 to 3
uniform_numbers = uniform(low=-3, high=3, size=1000)

# Plot the histogram: 20 bins, filled steps, in blue
# density=True normalizes it (total bar area = 1); older matplotlib versions called this normed=True
plt.hist(gaussian_numbers, bins=20, histtype='stepfilled', density=True, color='b', label='Gaussian')

# Plot the second histogram: 20 bins, filled steps, in red and half transparent
plt.hist(uniform_numbers, bins=20, histtype='stepfilled', density=True, color='r', alpha=0.5, label='Uniform')

# Set properties
plt.title("Gaussian/Uniform Histogram")
plt.xlabel("Value")
plt.ylabel("Probability density")
plt.legend()
plt.show()

Sunday, 24 September 2017

Basics of statistics

MEAN
Average (Mean) = Sum of measurements / Number of measurements

MEDIAN
Middle value after sorting the items (for an even number of items, the average of the two middle values)

MODE
Most frequently occurring value in the data set

RANGE
Highest value - Lowest value
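The four definitions above translate directly into code. A minimal sketch using only the standard library; the helper names and the sample data are illustrative, and Python's `statistics` module already provides `mean`, `median` and `mode`.

```python
from collections import Counter


def mean(values):
    # Sum of measurements / number of measurements
    return sum(values) / len(values)


def median(values):
    # Middle value of the sorted data; average of the two middle values
    # when the number of items is even
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    # Most frequently occurring value (on a tie, the value seen first wins)
    return Counter(values).most_common(1)[0][0]


def value_range(values):
    # Highest value - lowest value
    return max(values) - min(values)


data = [2, 3, 3, 5, 7, 10]
print(mean(data), median(data), mode(data), value_range(data))  # 5.0 4.0 3 8
```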

STANDARD DEVIATION
Square root of the variance: sqrt(average of the squared deviations of the items from their mean value)

VARIANCE
Average of the squared differences from the mean value
Example: {600, 470, 170, 430, 300}
   Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
   Squared differences = (600-394)^2 + (470-394)^2 + (170-394)^2 + (430-394)^2 + (300-394)^2
                       = 206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2
                       = 42,436 + 5,776 + 50,176 + 1,296 + 8,836 = 108,520
   Variance = 108,520 / 5 = 21,704
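The worked example can be checked in Python. It divides by the number of items (5), which gives the population variance; `statistics.pvariance` and `statistics.pstdev` use the same convention, while `statistics.variance` divides by n - 1 (the sample variance). A minimal sketch:

```python
import math
import statistics

data = [600, 470, 170, 430, 300]

mean = sum(data) / len(data)                      # 394.0
squared_diffs = [(x - mean) ** 2 for x in data]   # 42436.0, 5776.0, ...
variance = sum(squared_diffs) / len(data)         # 21704.0
std_dev = math.sqrt(variance)                     # about 147.32

print(mean, variance, round(std_dev, 2))

# Cross-check with the standard library (population versions divide by n)
print(statistics.pvariance(data), statistics.pstdev(data))
# The sample version divides by n - 1 instead
print(statistics.variance(data))
```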

NORMAL DISTRIBUTION
A distribution in which most values cluster around the middle of the range and the rest taper off symmetrically towards either extreme.
Its graphical representation is called the BELL CURVE.
The mean, mode and median are all the same.
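The claim that mean, median and mode coincide can be checked on simulated data. This sketch draws samples with `random.gauss` and approximates the mode by the fullest histogram bin; the sample size, seed and bin width are arbitrary choices, so the three values are only approximately equal (all close to 0).

```python
import math
import random
import statistics
from collections import Counter

random.seed(42)
# 10,000 draws from a normal distribution with mean 0 and standard deviation 1
samples = [random.gauss(0, 1) for _ in range(10_000)]

sample_mean = statistics.mean(samples)
sample_median = statistics.median(samples)

# Continuous values rarely repeat, so approximate the mode by the centre
# of the fullest bin of width 0.5
bin_width = 0.5
bin_counts = Counter(math.floor(x / bin_width) for x in samples)
fullest_bin = max(bin_counts, key=bin_counts.get)
approx_mode = (fullest_bin + 0.5) * bin_width

print(round(sample_mean, 3), round(sample_median, 3), approx_mode)
```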

HISTOGRAM
Graphical representation of the distribution of numerical data.

How to read CSV files using the pandas module?

USE pandas FOR READING CSV FILE

EXAMPLE
import pandas as pd

# Sample CSV data (no header row, so the column names are given with `names`)
# 5.1,0.222222222,3.5,0.625,setosa
# 4.9,0.166666667,3,0.416666667,setosa
data = pd.read_csv('myData.csv',
                   names=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Class"])

# Print the first 5 rows
print(data.head())

# Print the data dimensions: (rows, columns)
print(data.shape)
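To try `read_csv` without creating `myData.csv`, the two sample rows can be read from an in-memory buffer. A minimal sketch, assuming pandas is installed; `io.StringIO` stands in for the file.

```python
import io

import pandas as pd

# The two sample rows from above, held in memory instead of myData.csv
csv_text = """5.1,0.222222222,3.5,0.625,setosa
4.9,0.166666667,3,0.416666667,setosa
"""

# The data has no header row, so the column names come from `names`
data = pd.read_csv(
    io.StringIO(csv_text),
    names=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Class"],
)

print(data.head())
print(data.shape)   # (rows, columns)
print(data.dtypes)  # numeric columns are parsed as float64
```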

How to create a histogram?

Histogram: a graphical representation of the distribution of numerical data.
It shows how the data is distributed across the bins (intervals).

Normal distribution: a distribution in which most values cluster around the middle of the range and the rest taper off symmetrically towards either extreme.
Its graphical representation is called the BELL CURVE.
The mean, mode and median are all the same.

Normalized histogram: a histogram whose bar heights are scaled so that the total area of the bars is 1, so it shows a probability density instead of raw counts. The name refers to this scaling, not to the normal distribution.

USE matplotlib TO PLOT THE HISTOGRAM (CHART)

EXAMPLE 1
import matplotlib.pyplot as plt
# Plot the histogram of fixed data; bins=[0, 1, 2, 3] gives the bin edges,
# so the counts are 0 in [0, 1), 2 in [1, 2) and 1 in [2, 3]
plt.hist([1, 2, 1], bins=[0, 1, 2, 3])
plt.show()
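The counts that `plt.hist` draws come from `numpy.histogram`, whose bins are half-open `[left, right)` except the last one, which also includes its right edge. A pure-Python sketch of that counting rule (the function name `histogram_counts` is hypothetical), applied to the data of EXAMPLE 1:

```python
from bisect import bisect_right


def histogram_counts(values, edges):
    """Count values per bin; bins are [left, right) except the last, [left, right]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:
            # The right edge of the last bin is included
            counts[-1] += 1
        elif edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
        # Values outside the edges are not counted
    return counts


print(histogram_counts([1, 2, 1], [0, 1, 2, 3]))  # [0, 2, 1]
```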

EXAMPLE 2
import matplotlib.pyplot as plt
import numpy as np

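# Generate 1000 samples from the standard normal distribution
x = np.random.normal(size=1000)

# Plot the normalized histogram (density=True; older matplotlib versions used normed=True)
plt.hist(x, density=True, bins=30)
plt.ylabel('Probability density')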
plt.show()

EXAMPLE 3
import matplotlib.pyplot as plt
import numpy as np

# Generate 1000 samples from the standard normal distribution
x = np.random.normal(size=1000)

# Cumulative normalized histogram: the bars grow until the last bin reaches 1
plt.hist(x,
         bins=100,
         density=True,
         cumulative=True)
plt.show()

# Non-cumulative normalized histogram of the same data, for comparison
plt.hist(x,
         bins=100,
         density=True)
plt.show()
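With `density=True` and `cumulative=True`, matplotlib normalizes the histogram so that the last bin equals 1, which makes it an estimate of the cumulative distribution function. A minimal pure-Python sketch of the same calculation; the helper name `cumulative_density`, the seed and the bin edges are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def cumulative_density(values, edges):
    """Running share of the in-range values that fall in each bin or an earlier one."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:
            counts[-1] += 1
        elif edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


random.seed(1)
x = [random.gauss(0, 1) for _ in range(5000)]
edges = [-4 + 0.5 * i for i in range(17)]   # -4.0, -3.5, ..., 4.0
cdf = cumulative_density(x, edges)

print(cdf[7])    # share of values below 0.0, close to 0.5
print(cdf[-1])   # 1.0
```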

Saturday, 23 September 2017

How to create plot charts from TSV data?

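USE numpy FOR READING TSV FILE (older SciPy versions re-exported this function as sp.genfromtxt)
USE matplotlib.pyplot FOR PLOTTING CHART
USE A CUSTOM UTILITY MODULE (BUILT WITH MODULE os) TO SET UP THE DATA AND CHART FOLDERS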

UTILITY MODULE : utils.py
import os

# Paths of 2 sub-directories, "data" and "charts", next to this file
BASE_DIR = os.path.dirname(os.path.realpath(__file__))
DATA_DIR = os.path.join(BASE_DIR, "data")
CHART_DIR = os.path.join(BASE_DIR, "charts")

# Create the directories if they do not exist yet
for d in [DATA_DIR, CHART_DIR]:
    os.makedirs(d, exist_ok=True)

EXAMPLE : webtraffic.py
import os
import numpy as np
import matplotlib.pyplot as plt
from utils import CHART_DIR, DATA_DIR

# The scipy aliases used originally (sp.sum, sp.genfromtxt, sp.polyfit, ...) were
# plain numpy functions and have been removed from newer SciPy releases,
# so numpy is called directly here.

# Error function: sum of squared differences between model f and the data
def error(f, x, y):
    return np.sum((f(x) - y) ** 2)

# Sample data in TSV: hour number, hits in that hour (nan = missing value)
# 1   2272
# 2   nan
# 3   1386
# 4   1365
# 5   1488

# Read data from the TSV file (assumed to be stored in the "data" dir from utils.py)
data = np.genfromtxt(os.path.join(DATA_DIR, "web_traffic.tsv"), delimiter="\t")
print(data[:2])
print(data.shape)

# Extract the first column to x (hours) and the second column to y (hits/hour)
x = data[:, 0]
y = data[:, 1]

# Count the non-numeric (nan) values in the second column
print(np.sum(np.isnan(y)))

# Keep only the rows whose second column is a valid number
valid = ~np.isnan(y)
x = x[valid]
y = y[valid]

# Set chart data and properties - Title, X/Y axis labels
plt.scatter(x, y)
plt.title("Web Traffic over the last month")
plt.xlabel("Time")
plt.ylabel("Hits/hour")

# Set chart properties - Ticks, Auto scale, Grid
# x is in hours, so one tick every 7*24 hours => weekly
plt.xticks([w * 7 * 24 for w in range(10)],
           ['week %i' % w for w in range(10)])
# Fit the axes tightly around the available data
plt.autoscale(tight=True)
plt.grid()

# Save the chart as a PNG picture
plt.savefig(os.path.join(CHART_DIR, "img.png"))

'''
p = polyfit(x, y, n)
Given data x and y and the desired order n of the polynomial (a straight
line has order 1), polyfit finds the model function that minimizes the
error function defined earlier.
    fp1, residuals, rank, sv, rcond = np.polyfit(x, y, 1, full=True)
polyfit returns the parameters of the fitted model function, fp1; with
full=True it also returns extra information about the fitting process.
'''

# Linear fit = straight line, may deviate from the data
# fp1, residuals, rank, sv, rcond = np.polyfit(x, y, 1, full=True)

# Curved fit (degree 10) = curve that follows the data closely, at the risk of overfitting
fp1, residuals, rank, sv, rcond = np.polyfit(x, y, 10, full=True)

print("Model parameters: %s" % fp1)
print(residuals)

f1 = np.poly1d(fp1)
print(error(f1, x, y))

# Generate 1000 evenly spaced x-values and draw the fitted model over the data
fx = np.linspace(0, x[-1], 1000)
plt.plot(fx, f1(fx), 'C8', linewidth=4, label="d=%i" % f1.order)
plt.legend(loc="upper left")
plt.show()
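The effect of the polynomial degree on the `error` function can be seen on a small synthetic dataset. This sketch uses hypothetical data following an exact quadratic (not `web_traffic.tsv`) and fits degrees 1 and 2 with `np.polyfit`: the degree-2 fit reproduces the data almost exactly, while the straight line leaves a large error. On real data, a lower error on the fitted points alone does not prove that the higher degree is the better model.

```python
import numpy as np


def error(f, x, y):
    # Sum of squared differences between model predictions and the data
    return np.sum((f(x) - y) ** 2)


# Hypothetical data (not the web-traffic file): an exact quadratic trend
x = np.arange(20, dtype=float)
y = 3.0 + 0.5 * x + 0.25 * x ** 2

errors = {}
for degree in (1, 2):
    coefficients = np.polyfit(x, y, degree)   # highest power first
    model = np.poly1d(coefficients)
    errors[degree] = float(error(model, x, y))
    print(degree, coefficients.round(3), round(errors[degree], 3))
```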

How to create density charts from CSV data?

USE pandas FOR READING CSV FILE
USE seaborn FOR DENSITY CHART PLOTTING

EXAMPLE
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample CSV Data
# Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
# 6,148,72,35,0,33.6,0.627,50,1
# 1,85,66,29,0,26.6,0.351,31,0

# Read CSV data (read_csv already returns a DataFrame)
diabetics = pd.read_csv('diabetes.csv')
print(diabetics.keys())

# Print the top 5 rows (head) of the dataset
print(diabetics.head())

# Split into 2 subsets by outcome (1 = positive, 0 = negative)
positive = diabetics.loc[diabetics.Outcome == 1]
negative = diabetics.loc[diabetics.Outcome == 0]

# Plot 2-D density (KDE) charts of Pregnancies vs Glucose for each subset.
# seaborn >= 0.11 API: x=/y= keywords, fill instead of shade,
# thresh instead of shade_lowest (thresh > 0 leaves the lowest level unfilled)
ax = sns.kdeplot(x=positive.Pregnancies, y=positive.Glucose,
                 cmap="Greens", fill=True, thresh=0.05)
ax = sns.kdeplot(x=negative.Pregnancies, y=negative.Glucose,
                 cmap="Reds", fill=True, thresh=0.05)
plt.show()
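A density chart estimates the probability density by placing a smooth kernel on every observation and averaging them (kernel density estimation). `kdeplot` does this in 2-D for the two chosen columns; the sketch below shows the idea in 1-D with Gaussian kernels. The glucose values are made up for illustration, and the bandwidth is fixed by hand, whereas seaborn chooses one automatically.

```python
import math


def gaussian_kde_1d(samples, bandwidth):
    """Return a function estimating the density of `samples`.

    Each sample contributes a Gaussian bump with standard deviation
    `bandwidth`; the estimate is the average of those bumps.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical glucose readings, for illustration only
glucose = [85, 89, 99, 110, 115, 126, 137, 148, 168, 183]
estimate = gaussian_kde_1d(glucose, bandwidth=15.0)

# A density integrates to about 1: check with a simple Riemann sum over 0..300
step = 0.5
grid = [i * step for i in range(601)]
area = sum(estimate(g) * step for g in grid)
print(round(area, 3))
```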

How to handle exceptions in Python?

SYNTAX
try:
    ...
except:
    ...
finally:
    ...

EXAMPLE
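import sys

randomList = ['a', 0, 2]

for entry in randomList:
    try:
        print(entry)
        r = 1/int(entry)
        break
    except Exception:
        # sys.exc_info()[0] is the type of the exception being handled
        print('Oops', sys.exc_info()[0], "occurred")
        print('next entry')
    finally:
        # Runs after every attempt, including the one that breaks out of the loop
        print("finally !!!")

print('The reciprocal of', entry, "is", r)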
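A bare `except:` also catches exceptions such as `KeyboardInterrupt`, so it is usually safer to name the exceptions you expect. A minimal sketch of the same loop with specific handlers, collecting a log so the order in which `except` and `finally` run is visible (the function name `reciprocal_of_first_valid` is hypothetical):

```python
def reciprocal_of_first_valid(entries):
    """Return ((entry, reciprocal), log) for the first entry that works, or (None, log)."""
    log = []
    result = None
    for entry in entries:
        try:
            result = (entry, 1 / int(entry))
            break
        except ValueError:          # int() could not parse the entry
            log.append(f"{entry!r}: ValueError")
        except ZeroDivisionError:   # the entry was 0
            log.append(f"{entry!r}: ZeroDivisionError")
        finally:
            # Runs after every attempt, including the one that hits `break`
            log.append("finally")
    return result, log


result, log = reciprocal_of_first_valid(['a', 0, 2])
print(result)  # (2, 0.5)
print(log)
```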

Friday, 22 September 2017

How to use Python modules in programs?

IMPORTING MODULE
from <pkg> import <module>
import <module>

# Import a module (here the subpackage openpyxl.workbook) from a package
from openpyxl import workbook

# Import a module directly
import calendar

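PRINTING THE NAMES DEFINED IN A PACKAGE OR MODULE
# Prints all names (imported submodules, classes, functions and special variables such as __name__) of package openpyxl
import openpyxl
print(dir(openpyxl))

import calendar
print(dir(calendar))

# Prints all attributes of a string, mostly its methods (upper, split, ...)
name="Orange"
print(dir(name))

# Prints all attributes of an integer, mostly its methods (bit_length, to_bytes, ...)
val=50
print(dir(val))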
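Most entries that `dir()` prints are special names starting with an underscore. A small sketch that filters them out and keeps only the callable ones (the helper names `public_names` and `public_callables` are hypothetical):

```python
import calendar


def public_names(obj):
    """Names listed by dir(obj), minus the underscore-prefixed special/private ones."""
    return [name for name in dir(obj) if not name.startswith("_")]


def public_callables(obj):
    """Public names whose attribute can be called (functions, methods, classes)."""
    return [name for name in public_names(obj) if callable(getattr(obj, name))]


print(public_callables("Orange"))   # string methods such as upper, split, ...
print(public_callables(50))         # int methods such as bit_length, to_bytes, ...
print(public_callables(calendar))   # functions and classes of module calendar
```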

What are the ways to write and run Python programs and install Python modules?

PYTHON SETUP
1. Install Python (python.exe), add its folder to the PATH environment variable and use Eclipse
2. Install Python (python.exe), add its folder to the PATH environment variable and use VS Code
3. Install Anaconda (which bundles Python) and use its editor Spyder

Example Python folder path to add to the PATH variable:
C:\Users\Shaan\AppData\Local\Programs\Python\Python36-32\;.

INSTALLING MODULES
Using the pip installer (with a standard python.exe installation)
pip install pymysql
pip install python-docx
...
pip list

Using easy_install (the older setuptools installer, now deprecated in favour of pip)
easy_install pymysql
easy_install python-docx
...

Using Anaconda
Install from Anaconda Cloud (search for the module at anaconda.org to find the correct command)

conda install pymysql
conda install python-docx
...

Using Anaconda console
Open the Anaconda console and run the pip install command.
pip install pymysql
pip install python-docx
...
pip list

Using Anaconda Navigator
Go to Environments and search for the package name
Install the package

How to open a workbook and write contents to its sheets?

USE MODULE openpyxl
Only supports the xlsx format (not the older xls format)

# Import load_workbook from package openpyxl, and module random to create values to fill in the sheet
from openpyxl import load_workbook
import random
# Open the workbook
filePath = "Hello.xlsx"
fileRef = load_workbook(filePath, read_only=False)

# Get sheet names (sheetnames replaces the deprecated get_sheet_names())
sheetNames = fileRef.sheetnames
print(sheetNames)

# Pick a sheet (indexing by name replaces the deprecated get_sheet_by_name())
# and fill random values in cells from row 50 to 100 and columns A to F (1 to 6)
sheet = fileRef['January_2017']
for row in range(50, 101):
    for col in range(1, 7):
        # Store the number itself; formatting it with "%d" would store text
        sheet.cell(row=row, column=col, value=random.randint(1, 10000))

# Save the file after writing
fileRef.save(filePath)

How to create an Excel workbook with Python?

USE MODULE openpyxl
Only supports the xlsx format (not the older xls format)

# Import class Workbook from package openpyxl, and import module calendar directly
from openpyxl import Workbook
import calendar
# Create a workbook object (it starts with one default sheet named "Sheet")
wb = Workbook()

# Create 12 sheets, one per month; month_name[0] is an empty string, so skip it
for i, month in enumerate(calendar.month_name[1:]):
    print(month)
    wb.create_sheet(month + "_2017", i)

# Write to file
filePath = "Hello.xlsx"
wb.save(filePath)

How to create and use a database connection?

FOR MYSQL, USE MODULE pymysql
FOR ORACLE, USE MODULE cx_Oracle

import pymysql

# Create a connection to the MySQL DB (host and credentials are sample values)
def createConn():
    return pymysql.connect(host="10.238.207.56", user="root", password="Mypass@123", database="idmgt")

# Method to fetch all records
def fetchRecords():
    conn = createConn()
    try:
        with conn.cursor() as cursor:
            cursor.execute("select * from roles")
            return cursor.fetchall()
    finally:
        # Always release the connection, even if the query fails
        conn.close()

# Method to fetch records by ID
def fetchByRoleId(roleId):
    conn = createConn()
    try:
        with conn.cursor() as cursor:
            # Pass the value as a query parameter (%s placeholder) instead of
            # formatting it into the SQL text, which avoids SQL injection
            cursor.execute("select * from roles where id = %s", (roleId,))
            return cursor.fetchall()
    finally:
        conn.close()

# First statement: call the method above to fetch the records
rows = fetchRecords()

# Then iterate over the records and their columns
for row in rows:
    for col in row:
        print(col)
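Passing values as query parameters is what prevents SQL injection. The sketch below demonstrates this with the standard-library `sqlite3` driver, swapped in only so it runs without a MySQL server; its placeholder is `?`, whereas pymysql uses `%s`. The `roles` table, its rows and the function `fetch_by_role_id` are hypothetical.

```python
import sqlite3


def fetch_by_role_id(conn, role_id):
    # "?" is sqlite3's placeholder; role_id is sent as a value, never spliced into the SQL
    cursor = conn.execute("select * from roles where id = ?", (role_id,))
    return cursor.fetchall()


# In-memory database with a hypothetical roles table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("create table roles (id integer primary key, name text)")
conn.executemany("insert into roles values (?, ?)", [(1, "admin"), (2, "viewer")])

found = fetch_by_role_id(conn, 2)
injected = fetch_by_role_id(conn, "2 or 1=1")  # compared as a plain string, so no row matches
conn.close()

print(found)     # [(2, 'viewer')]
print(injected)  # []
```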

Tuesday, 19 September 2017

How to work with lists in Python?

EXAMPLE
fish = ['f1', 'f2', 'f3']
print(type(fish))
print(fish)
# <class 'list'>
# ['f1', 'f2', 'f3']

# Add an item at the end
fish.append('f4')
print(fish)
# ['f1', 'f2', 'f3', 'f4']

# Insert an item at index 0
fish.insert(0, 'f0')
print(fish)
# ['f0', 'f1', 'f2', 'f3', 'f4']

# Append all items of another list, then its first 2 items again
more_fish = ['f6', 'f7', 'f8', 'f9', 'f10']
fish.extend(more_fish)
fish.extend(more_fish[0:2])

# Make a (shallow) copy of the list
fish2 = fish.copy()
print(fish2)
# ['f0', 'f1', 'f2', 'f3', 'f4', 'f6', 'f7', 'f8', 'f9', 'f10', 'f6', 'f7']

# Reverse the list in place
fish.reverse()
print(fish)
# ['f7', 'f6', 'f10', 'f9', 'f8', 'f7', 'f6', 'f4', 'f3', 'f2', 'f1', 'f0']

# Count how many times a value occurs
fish_ages = [1, 2, 5, 7, 8, 1]
print(fish_ages.count(1))
# 2

# Sort the list in place
fish_ages.sort()
print(fish_ages)
# [1, 1, 2, 5, 7, 8]

# Remove all items
fish.clear()
print(fish)
# []
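`fish.copy()` matters because plain assignment does not copy a list: it only creates a second name for the same object. A short sketch of the difference:

```python
fish = ['f1', 'f2', 'f3']

alias = fish            # second name for the same list object
snapshot = fish.copy()  # new list holding the same items (shallow copy)

fish.append('f4')

print(alias)     # ['f1', 'f2', 'f3', 'f4']
print(snapshot)  # ['f1', 'f2', 'f3']
print(alias is fish, snapshot is fish)  # True False
```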

How to use a while loop in Python?

EXAMPLE
The while loop below keeps asking for the password until you enter "pass".

pwd = ''

while pwd != 'pass':
    print('What is the password?')
    pwd = input()

print('Yes, the password is ' + pwd + '. You can enter.')
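To exercise the loop without typing at a prompt, the answers can come from a list instead of `input()`. A minimal sketch (the function `ask_password` is hypothetical); if the list runs out before "pass" appears, `next()` raises `StopIteration`.

```python
def ask_password(answers, expected="pass"):
    """Read answers until one equals `expected`; return how many were needed.

    `answers` stands in for repeated input() calls so the loop can run unattended.
    """
    answers = iter(answers)
    attempts = 0
    pwd = ''
    while pwd != expected:
        pwd = next(answers)
        attempts += 1
    return attempts


print(ask_password(["hello", "secret", "pass"]))  # 3
```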

How to use for, break and continue in Python?

The for statement ends with a colon.
The indented lines below it form the loop body.

EXAMPLES

for i in range(0,5):
    print(i)

for i in range(0,5):
    if i == 3:
        continue
    print(i)

for i in range(0,5):
    if i == 3:
        break
    print(i)

for i in range(0,10,3):
    print(i)

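In the second example, continue skips the rest of the body when i is 3; in the third, break leaves the loop at 3. The last example also passes a "step" argument (3) to range().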

OUTPUT
0
1
2
3
4

0
1
2
4

0
1
2

0
3
6
9