FastAI with Google Colab

One of the biggest challenges in practicing deep learning is having a research environment in which to build, train, and test deep learning models. Building a deep-learning-capable personal computer or using a cloud-based development environment can be costly and/or time-consuming to set up. This post is designed to help FastAI (https://course.fast.ai/about.html) learners utilize Google’s Colaboratory research tool.

Google Colaboratory (Colab) is an online research tool for machine learning. It is FREE and offers GPU/TPU hardware acceleration for training deep learning models.

Hardware acceleration can be changed in the Edit menu under Notebook Settings.

Google Colab FAQ (https://research.google.com/colaboratory/faq.html)

[Image: Google Colab notebook settings with GPU/TPU hardware accelerator options]

The FastAI courses taught by Jeremy Howard and Rachel Thomas are great learning resources for anyone who is interested in deep learning. (https://course.fast.ai/index.html)

Below is a link to a Google Colab Jupyter notebook. This notebook sets up the Google Colab runtime with all the tools and libraries necessary to build the deep learning models reviewed in the FastAI training lessons.

https://colab.research.google.com/drive/1ppP7qds7VJfzfISMFynbR-t40OuEPbUS

[Image: Google Colab FastAI setup notebook]
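
The notebook’s exact contents are not reproduced here, but a Colab setup cell generally looks something like the sketch below; the package list is an assumption, not the contents of the linked notebook.

# Minimal sketch of a Colab setup cell (package list is an assumption).
!pip install fastai

import torch

# Confirm the GPU runtime is active (requires a GPU selected under
# Edit > Notebook settings > Hardware accelerator).
print(torch.cuda.is_available())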

Splitting Data into Training and Test Data Sets

When image matrices and their associated labels are stored in two separate matrices, it is often difficult to split the data into training and test sets randomly. Scikit-learn makes splitting data into training and test sets relatively easy. For the example below, data from the Statoil/C-CORE Iceberg Classifier Challenge Kaggle competition was used.

Initial setup:

[Image: initial setup code]
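
The original screenshot is not reproduced here; below is a rough sketch of the kind of initial setup it showed. The file and column names follow the Statoil/C-CORE competition data (train.json with two flattened 75x75 radar bands and an is_iceberg label), but treat the details as assumptions.

import numpy as np
import pandas as pd

# Load the competition training data.
train = pd.read_json('train.json')

# Reshape each flattened 75x75 radar band into an image matrix and
# stack the two bands as channels.
band_1 = np.stack([np.array(b).reshape(75, 75) for b in train['band_1']])
band_2 = np.stack([np.array(b).reshape(75, 75) for b in train['band_2']])
X = np.stack([band_1, band_2], axis=-1)   # shape: (samples, 75, 75, 2)
y = train['is_iceberg'].values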

Splitting the data into train and test sets:

[Images: splitting the data into train and test sets]
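
The screenshots are not reproduced here; the split itself comes down to a single scikit-learn call along these lines (the split ratio and random seed are assumptions):

from sklearn.model_selection import train_test_split

# Randomly split the image matrices and their labels into training and
# test sets while keeping each image paired with its label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)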

Kaggle - Invasive Species Monitoring - Ensemble with ROC of 0.95532

Ensembling multiple models (a bucket of models) achieved an ROC of 0.95532.

[Image: Invasive Species Monitoring Kaggle leaderboard, 2017-09-25]

Below is an example of creating 4 models, each trained for 100 epochs. The last section of code saves the weights for each model in the set.

[Image: Invasive Species CNN-BN ensemble code]
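
The screenshot is not reproduced here; the sketch below shows the general pattern in the Keras 1.2.2 / Theano environment described later in this post. The architecture and hyperparameters are assumptions, not the exact models from the screenshot.

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense
from keras.layers import BatchNormalization

def build_model():
    # Small CNN with batch normalization ("CNN-BN"); with 'th' image
    # dim ordering the input shape is (channels, height, width).
    model = Sequential([
        Convolution2D(32, 3, 3, activation='relu', input_shape=(3, 224, 224)),
        BatchNormalization(axis=1),
        MaxPooling2D((2, 2)),
        Convolution2D(64, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Build 4 independent models, train each for 100 epochs, and save the
# weights for every model in the set.
models = [build_model() for _ in range(4)]
for i, model in enumerate(models):
    model.fit(X_train, y_train, batch_size=64, nb_epoch=100)
    model.save_weights('invasive_model_%d.h5' % i)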

Each model was then used to predict on the test image set, and the mean prediction for each image was used for the Kaggle Invasive Species Monitoring competition submission.
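
In code, the ensemble prediction is just the mean of the models’ outputs, roughly as follows (variable names are assumptions):

import numpy as np

# Average the per-model predictions for each test image (a simple
# bucket-of-models ensemble).
preds = np.mean([model.predict(X_test) for model in models], axis=0)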

Resolving "The cuda backend is deprecated..." Warning

When using the Theano backend, the following warning message was displayed in my Jupyter notebook:

The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend.

[Image: Theano backend warning in Jupyter notebook]

The warning message was resolved by completing the following two actions:

Action 1: Update the Theano config file

  1. In a Linux terminal, execute the command
    sudo nano ~/.theanorc
  2. Update "device = gpu" to
    device = cuda0
[Image: updated Theano config file]
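
After the edit, the file matches the .theanorc created in the Operating System Setup section below, with only the device line changed:

[global]
device = cuda0
floatX = float32
[cuda]
root = /usr/local/cuda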

Action 2: Update the Jupyter notebook to include "import theano.gpuarray"
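
For example, add the import near the top of the notebook, before Keras or anything else that loads Theano:

# Importing theano.gpuarray switches Theano to the new gpuarray backend.
import theano.gpuarray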

Note that the Cuda backend warning message is no longer displayed.

Installing Kaggle CLI and Competition Data Download

In the computer terminal, enter the following commands to install the unofficial Kaggle CLI (Command Line Interface) and download competition files:

  1. pip install kaggle-cli
  2. kg config -u [Kaggle username] -p [Kaggle password]
  3. cd Documents/nbs/
  4. kg download -c [Kaggle competition name]

The purpose of each command:

  1. Installs the Kaggle CLI
  2. Sets the username and password for the Kaggle CLI. This is why the username and password parameters do not need to be defined in step 4.
  3. Changes the directory to the location where the competition files will be stored
  4. Downloads the competition files. IMPORTANT! Before the competition files can be downloaded, the competition rules must be accepted. The accept option is at the end of the rules section of the competition on the Kaggle website

Example of installing Kaggle CLI and downloading Dogs vs. Cats competition files:

  1. pip install kaggle-cli
  2. kg config -u KaggleUser -p P@ssw0rd123
  3. cd Documents/nbs/
  4. kg download -c 'dogs-vs-cats'

Operating System Setup

After installing the Ubuntu operating system, the following commands were executed in the terminal:

If the operating system is an earlier release than 16.04 LTS, upgrade it:
sudo do-release-upgrade

--optional start
Install an SSH server in order to remote into the deep learning PC from another computer on the same network.
sudo apt-get install openssh-server -y

Verify SSH service is running.
sudo service ssh status

Identify the network IP address for the SSH client. A popular SSH client is PuTTY.
ifconfig
--optional end

Install Tmux. It is useful for operating multiple terminal windows from within the same SSH session. Google Tmux for more information.
sudo apt-get install tmux

Update and reboot Ubuntu operating system
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get dist-upgrade -y
sudo reboot


Installing Anaconda for Ubuntu
cd /tmp
curl -O https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
bash Anaconda3-4.4.0-Linux-x86_64.sh
[enter] and [yes] to all
sudo reboot


After Anaconda is installed, configure Jupyter notebooks.
jupyter notebook --generate-config
[note config location]
jupyter notebook password
[enter password]
sudo nano /home/cnnpc/.jupyter/jupyter_notebook_config.py
Change #c.NotebookApp.ip = 'localhost' to c.NotebookApp.ip = '[your ip]'
Change #c.NotebookApp.port = 8888 to c.NotebookApp.port = [your port]


Create a Jupyter notebook directory in your Documents directory
cd Documents/
mkdir nbs
cd nbs
jupyter notebook
[verify and connect]


Install Nvidia repos and Cuda
cd /tmp
wget "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.44-1_amd64.deb" -O "cuda-repo-ubuntu1604_8.0.44-1_amd64.deb"
sudo dpkg -i cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
sudo apt-get update
sudo apt-get -y install cuda
sudo reboot
sudo modprobe nvidia


Verify GPU is recognized
nvidia-smi

Install bcolz and pip.
conda install -y bcolz
sudo apt-get install python3-pip -y


Upgrade Anaconda modules
conda upgrade -y --all

Install Keras
pip install keras==1.2.2

Create Keras directory
mkdir ~/.keras

Create the Keras JSON configuration file. (Copy the echo .. ..keras.json block below, paste it into the terminal, and press Enter.)
echo '{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}' > ~/.keras/keras.json


Install Theano
pip3 install theano

Create the Theano configuration file. (Copy the echo .. ..theanorc block below, paste it into the terminal, and press Enter.)
echo "[global]
device = gpu
floatX = float32
[cuda]
root = /usr/local/cuda" > ~/.theanorc


Install Theano pygpu
conda install theano pygpu

Get the fast.ai cuDNN file, extract it, and copy the contents to the appropriate directories
wget "http://files.fast.ai/files/cudnn.tgz" -O "cudnn.tgz"
tar -zxf cudnn.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/


Install Glances. This is a great application for monitoring computer usage (including GPU usage).
curl -L https://bit.ly/glances | /bin/bash

Deep Learning PC Build

List of parts for a deep-learning-capable PC for around $1000 USD:

  • Intel Core i5-7500
  • Ballistix Sport LT 16GB Kit
  • EVGA SuperNOVA 650 G2
  • GIGABYTE GA-H270-HD3
  • Corsair Carbide Series SPEC-01
  • Samsung 850 EVO 250GB 2.5-Inch SATA III Internal SSD
  • GEFORCE GTX 1060 6GB

Parts not listed:

  • PC Monitor
  • Keyboard and Mouse
  • 4 GB or greater USB drive