FastAI with Google Colab

January 1, 2019 Jordan Bishop

One of the biggest challenges with practicing deep learning is having a research environment in which to build, train, and test deep learning models. Building a deep-learning-capable personal computer or using a cloud-based development environment can be costly, time-consuming to set up, or both. This post is designed to help FastAI (https://course.fast.ai/about.html) learners use Google’s Colaboratory research tool.

Google Colaboratory (Colab) is an online research tool for machine learning. It is FREE and offers GPU/TPU hardware acceleration for training deep learning models.

Hardware acceleration can be changed in the Edit menu under Notebook Settings.
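After switching the hardware accelerator, it can be useful to confirm that the runtime actually sees a GPU. Below is a minimal sketch that uses only the Python standard library. It assumes the NVIDIA `nvidia-smi` utility is on the PATH when a GPU runtime is attached; the helper name `gpu_visible` is made up for this example.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is present and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not installed, so no NVIDIA GPU runtime is assumed
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU, e.g. "GPU 0: ..."
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

On a CPU-only runtime this should print `GPU visible: False`.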

Google Colab FAQ (https://research.google.com/colaboratory/faq.html)

The FastAI courses taught by Jeremy Howard and Rachel Thomas are great learning resources for anyone who is interested in deep learning. (https://course.fast.ai/index.html)

Below is a link to a Google Colab Jupyter notebook. This notebook will set up the Google Colab runtime with all the tools and libraries needed to build the deep learning models reviewed in the FastAI training lessons.

https://colab.research.google.com/drive/1ppP7qds7VJfzfISMFynbR-t40OuEPbUS

Jupyter Notebook on Windows 10 (Anaconda)

June 24, 2018 Jordan Bishop

This post is a step-by-step process for using Jupyter notebooks on a Windows 10 PC.

Download your desired Anaconda version and follow the steps pictured.

Anaconda download URL: https://www.anaconda.com/download/

2018-04-30 20_41_59-Downloads _ Anaconda ‎- Microsoft Edge.png

2018-04-30 20_43_09-Downloads _ Anaconda ‎- Microsoft Edge.png

2018-04-30 20_43_36-Open File - Security Warning.png

2018-04-30 20_44_04-Anaconda3 5.1.0 (64-bit) Setup.png

2018-04-30 20_44_28-Anaconda3 5.1.0 (64-bit) Setup.png

2018-04-30 20_44_47-Anaconda3 5.1.0 (64-bit) Setup.png

2018-04-30 20_46_48-Anaconda3 5.1.0 (64-bit) Setup.png

2018-04-30 20_47_47-Anaconda3 5.1.0 (64-bit) Setup.png

2018-04-30 21_01_38-Anaconda3 5.1.0 (64-bit) Setup.png

Microsoft VSCode is not an install requirement. If you do not wish to install Microsoft VSCode, click the Skip button.

2018-04-30 21_02_19-Anaconda3 5.1.0 (64-bit) Setup.png

2018-04-30 21_04_50-Anaconda3 5.1.0 (64-bit) Setup.png

2018-04-30 21_05_21-Anaconda3 5.1.0 (64-bit) Setup.png

Anaconda is now installed. To create a new Jupyter Notebook server instance, in your Start menu:

Type Anaconda
Right-click the Anaconda Prompt desktop app
Select "Run as administrator"

When prompted, click "Yes"

2018-04-30 22_06_54-Windows 10 [Running] - Oracle VM VirtualBox.png

Navigate to the directory where you would like your Jupyter notebooks to be saved. Below is an example command to navigate to the Documents directory.

cd C:\Users\TempOS\Documents

*Note* TempOS should be replaced with your Windows username

2018-04-30 21_10_10-Administrator_ Anaconda Prompt.png

Enter "jupyter notebook" into the command prompt and press Enter on your keyboard.

2018-04-30 21_19_52-Administrator_ Anaconda Prompt.png

Copy and paste the URL displayed in the command prompt into a web browser. Navigate to the URL.

In order to create a new Jupyter notebook, click New>Python 3 notebook.

2018-04-30 21_22_23-Administrator_ Anaconda Prompt - jupyter notebook.png

Pictured below is an example Jupyter notebook.

2018-04-30 21_26_11-Untitled and 1 more page ‎- Microsoft Edge.png

Jupyter Notebooks Nbextensions

March 18, 2018 Jordan Bishop

Using a Jupyter notebook for Python programming is a must. While Jupyter notebooks perform well with the standard installation, installing NbExtensions is absolutely worthwhile.

This post will review installing NbExtensions via Jupyter NbExtensions Configurator (https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator) and enabling some of the more useful extensions. NbExtensions Configurator will add a tab to your Jupyter Notebook homepage that will allow you to enable/disable extensions:

Pictured below is the Jupyter notebook from the blog post Linear Regression: Housing Prices prior to installing NbExtensions Configurator.

To install NbExtensions, open a terminal window and execute the following commands:

pip install jupyter_nbextensions_configurator jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user

After installing NbExtensions Configurator, start the Jupyter notebook server and open the Jupyter Notebook homepage. The Nbextensions tab is now available.

Pictured below is the notebook extensions default configuration.

Enable the following extensions:

Codefolding
Hide input
Notify
Codefolding in Editor
Scratchpad
Tree Filter
Collapsible Headings
Snippets
Table of Contents (2)
Variable Inspector

Detailed documentation on all available extensions can be found here.

Pictured below is the Jupyter notebook from the blog post Linear Regression: Housing Prices post installation of NbExtensions Configurator (with the extensions specified above enabled).

Splitting Data into Training and Test Data Sets

February 11, 2018 Jordan Bishop

When image matrices and their associated labels are stored in two separate matrices, it is often difficult to split the data into training and test sets randomly while keeping each image paired with its label. Scikit-learn makes this split relatively easy. For the example below, the data from the Statoil/C-CORE Iceberg Classifier Challenge Kaggle competition was used.

Initial setup:

Splitting Data into Training and Test Data Sets 1.png

Splitting data into a train and test set:

Splitting Data into Training and Test Data Sets 2.png

Splitting Data into Training and Test Data Sets 3.png
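A minimal sketch of this kind of split, using scikit-learn's `train_test_split`. The arrays here are made-up stand-ins, not the competition data: 100 placeholder "images" of shape 75x75x2 and one binary label per image. The 20% test size, the random seed, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 images (75x75 pixels, 2 bands) and 100 labels.
# The first pixel of each image stores its index so the pairing can be checked.
images = np.zeros((100, 75, 75, 2))
images[:, 0, 0, 0] = np.arange(100)
labels = np.arange(100) % 2  # label is 1 for odd-indexed images, 0 for even

# Both arrays are shuffled with the same random order, so image i stays with label i.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)

# Sanity check: every training image still carries its own label
print(np.all(X_train[:, 0, 0, 0].astype(int) % 2 == y_train))
```

Passing `stratify=labels` keeps the class balance the same in both sets, which can matter for small data sets.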

Linear Regression: Housing Prices (Andrew Ng - Stanford University)

November 27, 2017 Jordan Bishop

Below is an example of linear regression performed within a Jupyter notebook. This simple linear regression notebook was built to mirror a Matlab linear regression project in Andrew Ng's Stanford University Machine Learning course. The Python Jupyter notebook can be downloaded here and the data set used can be downloaded here.

Linear Regression: Housing Prices

Jupyter Notebook version of the Matlab programming assignment for Andrew Ng's (Stanford University) Machine Learning Course

# import libraries
import matplotlib.pyplot as plt
# display matplotlib graphs within the notebook
%matplotlib inline 
import numpy as np
import os

# specify path to training data
path = "./"
# import housing data set
data = np.genfromtxt(path + "housingData.csv", dtype=float, delimiter=',')

# set the numpy display preferences
np.set_printoptions(precision=3,suppress=True)
# display top 5 records (Square Feet, Bedrooms, Selling Price)
data[:5]

array([[   2104.,       3.,  399900.],
       [   1600.,       3.,  329900.],
       [   2400.,       3.,  369000.],
       [   1416.,       2.,  232000.],
       [   3000.,       4.,  539900.]])

Data Preprocessing

# set X data equal to Square Feet and Bedrooms
X = data[:,0:2]
# set y data (value to predict) equal to selling price
y = data[:, 2]
# get the number of training examples
m = len(y)
# Copy X into X_norm, which will hold the normalized values (a copy leaves X unchanged)
X_norm = X.copy()
# create arrays of zeros for mu, sigma, and theta
mu = np.zeros((1,np.size(X[:1])))
sigma = np.zeros((1,np.size(X[:1])))
theta = np.ndarray.flatten(np.zeros((3, 1)))

# Normalize X data
for i in range(np.size(mu)):
    # Identify mean value for each dimension/column
    mu[:,i] = np.mean(X[:,i])
    # Identify standard deviation value for each dimension/column
    sigma[:,i] = np.std(X[:,i])
    # Set X_norm equal to the X normalized value ((value-meanValue)/standardDeviation)
    X_norm[:,i] = (X[:,i]-mu[:,i])/sigma[:,i]

# Add a dimension/column of 1's and X_norm will be used instead of X
X_norm = np.append(np.ones((m,1)),X_norm, axis=1)

# Display normalized X data (appended 1's, normalized square footage, normalized bedrooms)
X_norm[:5]

array([[ 1.   ,  0.131, -0.226],
       [ 1.   , -0.51 , -0.226],
       [ 1.   ,  0.508, -0.226],
       [ 1.   , -0.744, -1.554],
       [ 1.   ,  1.271,  1.102]])
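The column-by-column loop above can also be written with NumPy broadcasting. The sketch below is an alternative form, not the notebook's own code. It uses only the five records displayed earlier, so its means and standard deviations differ from those of the full data set.

```python
import numpy as np

# The five records shown earlier (square feet, bedrooms)
X = np.array([
    [2104.0, 3.0],
    [1600.0, 3.0],
    [2400.0, 3.0],
    [1416.0, 2.0],
    [3000.0, 4.0],
])

mu = X.mean(axis=0)        # mean of each column
sigma = X.std(axis=0)      # population standard deviation of each column
X_norm = (X - mu) / sigma  # broadcasting applies mu and sigma to every row

# Prepend the column of ones used for the intercept term
X_norm = np.hstack([np.ones((X.shape[0], 1)), X_norm])

print(X_norm.shape)
```

Because `(X - mu) / sigma` builds a new array, the original `X` is left untouched.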

Perform Linear Regression

# set learning rate
alpha = 0.01
# set number of iterations
num_iters = 400

# create blank array to capture cost function value after each iteration
J_history = np.zeros((num_iters, 1))

# perform linear regression for specified number of iterations
for i in range(num_iters):
    theta = theta - np.dot(np.transpose(X_norm),np.ndarray.flatten(np.dot(X_norm,theta)) - y)*(alpha/m)
    # set cost function value to 0 for each iteration
    J_cost = 0
    # capture cost function value across data set
    for j in range(m):
        J_cost = J_cost + ((1/(2*m))*np.square(np.dot(np.transpose(theta),np.transpose(X_norm[j,:]))-y[j]))
    # store cost function value for each iteration
    J_history[i] = J_cost

# display cost function value for each iteration
plt.plot(J_history)

[<matplotlib.lines.Line2D at 0x7ff3d2410f28>]
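The update and cost loops above can be written more compactly in vectorized form. Below is a sketch of the same batch gradient descent, run on synthetic data rather than the housing data. The function names `compute_cost` and `gradient_descent`, the synthetic line y = 3 + 2x, and the demo step size of 0.1 are assumptions for illustration.

```python
import numpy as np


def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = sum((X @ theta - y)^2) / (2m)."""
    residuals = X @ theta - y
    return residuals @ residuals / (2 * len(y))


def gradient_descent(X, y, alpha=0.01, num_iters=400):
    """Batch gradient descent; returns theta and the cost after each update."""
    theta = np.zeros(X.shape[1])
    history = np.empty(num_iters)
    for i in range(num_iters):
        # Same update as the loop version: theta -= (alpha/m) * X^T (X theta - y)
        theta = theta - (alpha / len(y)) * (X.T @ (X @ theta - y))
        history[i] = compute_cost(X, y, theta)
    return theta, history


# Synthetic data (an assumption for illustration): y = 3 + 2*x plus small noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X_demo = np.column_stack([np.ones(50), x])
y_demo = 3 + 2 * x + rng.normal(scale=0.1, size=50)

# A larger step than the post's 0.01 so this small demo converges in 400 iterations
theta_demo, J_demo = gradient_descent(X_demo, y_demo, alpha=0.1)
print(theta_demo)
print(J_demo[0], J_demo[-1])
```

The recovered `theta_demo` should land close to the true values 3 and 2, and the cost history should fall on every iteration.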

Predict Selling Price

# set square footage and number of bedrooms to predict selling price
sqrFtPred = 1650
bedRoomPred = 3

# predict selling cost (y value)
predictValues = ([1,(sqrFtPred-mu[0,0])/sigma[0,0],(bedRoomPred-mu[0,1])/sigma[0,1]])
predictedSellingPrice = np.dot(predictValues,theta)

# display predicted selling price
predictedSellingPrice
print('${:,.2f}'.format(predictedSellingPrice))

$289,221.65
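New inputs must be scaled with the same mean and standard deviation that were computed from the training data before theta is applied. A small helper makes that step harder to forget. The function name `predict_price` and the parameter values below are hypothetical, picked only so the arithmetic is easy to follow; they are not the values learned above.

```python
import numpy as np


def predict_price(features, mu, sigma, theta):
    """Scale raw features with the training mean/std, then apply the linear model."""
    scaled = (np.asarray(features, dtype=float) - mu) / sigma
    # Prepend 1.0 for the intercept term before taking the dot product
    return float(np.dot(np.concatenate(([1.0], scaled)), theta))


# Hypothetical parameters, not the ones fitted in this post
mu_demo = np.array([2000.0, 3.0])
sigma_demo = np.array([500.0, 1.0])
theta_demo = np.array([300000.0, 100000.0, -5000.0])

# (2500 - 2000) / 500 = 1 and (4 - 3) / 1 = 1, so the price is
# 300000 + 100000 * 1 - 5000 * 1 = 395000
price = predict_price([2500, 4], mu_demo, sigma_demo, theta_demo)
print('${:,.2f}'.format(price))  # prints $395,000.00
```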

Kaggle - Digit Recognizer - CNN 99.4% Accuracy

October 9, 2017 Jordan Bishop

Pictured below is the basic Convolutional Neural Network (CNN) for the Kaggle - Digit Recognizer competition.

Digit Recognizer - Convolutional Neural Network

Below is an overview of ensembling the CNN model. Executing the fit_model function 5 times ran a total of 125 epochs.

Predictions were then run against the test set with each model, and the predictions were averaged for each image.
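The averaging step can be sketched with NumPy. The probabilities below are made up for illustration (3 models, 2 images, 3 classes) and are not output from the competition models; real digit predictions would have 10 classes, and the ensemble in this post used 5 models.

```python
import numpy as np

# Hypothetical class probabilities, shape (models, images, classes)
model_probs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],  # model 1
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],  # model 2
    [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]],  # model 3
])

# Average over the model axis, then pick the most likely class per image
avg_probs = model_probs.mean(axis=0)
predicted_labels = avg_probs.argmax(axis=1)

print(predicted_labels)  # prints [0 2]
```

Note that model 1 on its own would label the second image as class 1, but the averaged prediction is class 2. Smoothing out single-model mistakes like this is the point of the ensemble.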

After exporting and submitting results, the basic CNN achieved 99.4% accuracy.

Digit Recognizer _ Kaggle-Submission.png

Kaggle - Leaf Classification: Directory Structure and Moving Files

September 17, 2017 Jordan Bishop

A Jupyter notebook for setting up the directory structure for Kaggle's Leaf Classification competition has been published. The notebook walks through the process of:

Unpacking/Unzipping the competition files
Creating a directory structure based on the train.csv data set
Moving images to appropriate train, valid, and test directories.
- The train and valid directories contain directories specific to each leaf species

Directory Structure and Moving Files _ Kaggle.png
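A rough sketch of the directory setup using only the standard library. This is not the published notebook's code. It assumes train.csv has `id` and `species` columns, that images are stored as `<id>.jpg` in one folder, and that any image without a row in train.csv belongs to the test set. The function name, the 20% validation fraction, and the use of copying instead of moving (so the originals stay in place) are choices made for this example.

```python
import csv
import random
import shutil
from pathlib import Path


def build_species_dirs(csv_path, image_dir, out_dir, valid_fraction=0.2, seed=0):
    """Copy images into out_dir/train/<species>, out_dir/valid/<species>, out_dir/test."""
    rng = random.Random(seed)
    image_dir, out_dir = Path(image_dir), Path(out_dir)

    # Read the id -> species mapping from train.csv
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))

    # Labeled images go to train or valid, inside a folder named after the species
    for row in rows:
        split = "valid" if rng.random() < valid_fraction else "train"
        dest = out_dir / split / row["species"]
        dest.mkdir(parents=True, exist_ok=True)
        src = image_dir / f"{row['id']}.jpg"
        if src.exists():
            shutil.copy(src, dest / src.name)

    # Images with no label row are treated as test images (an assumption)
    labeled_ids = {row["id"] for row in rows}
    test_dir = out_dir / "test"
    test_dir.mkdir(parents=True, exist_ok=True)
    for src in image_dir.glob("*.jpg"):
        if src.stem not in labeled_ids:
            shutil.copy(src, test_dir / src.name)
```

It could be called as `build_species_dirs("train.csv", "images", "data")`, where all three paths are placeholders for wherever the competition files were unzipped.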