FastAI with Google Colab

One of the biggest challenges in practicing deep learning is having a research environment in which to build, train, and test deep learning models. Building a deep-learning-capable personal computer or using a cloud-based development environment can be costly and/or time-consuming to set up. This post is designed to help FastAI (https://course.fast.ai/about.html) learners take advantage of Google's Colaboratory research tool.

Google Colaboratory (Colab) is an online research tool for machine learning. It is FREE and offers GPU/TPU hardware acceleration for training deep learning models.

Hardware acceleration can be changed in the Edit menu under Notebook Settings.

Google Colab FAQ (https://research.google.com/colaboratory/faq.html)

Google Colab GPU-TPU.png

The FastAI courses taught by Jeremy Howard and Rachel Thomas (https://course.fast.ai/index.html) are great learning resources for anyone who is interested in deep learning.

Below is a link to a Google Colab Jupyter notebook. This notebook sets up the Google Colab runtime with all the tools and libraries needed to build the deep learning models reviewed in the FastAI training lessons.

https://colab.research.google.com/drive/1ppP7qds7VJfzfISMFynbR-t40OuEPbUS

Google Colab FastAI Setup.png

Splitting Data into Training and Test Data Sets

When image matrices and their associated labels are stored in two separate matrices, it is often difficult to split the data randomly into training and test sets while keeping images and labels aligned. Scikit-learn makes this relatively easy. The example below uses data from the Statoil/C-CORE Iceberg Classifier Challenge Kaggle competition.

Initial setup:

Splitting Data into Training and Test Data Sets 1.png

Splitting the data into training and test sets:

Splitting Data into Training and Test Data Sets 2.png
Splitting Data into Training and Test Data Sets 3.png
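The split shown in the screenshots above can be sketched as follows. The array shapes and variable names here are illustrative stand-ins (the Statoil competition images are 75x75 radar bands, but `X` and `y` below are random dummy data), and `test_size` and `random_state` are example values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the image matrices and their labels:
# 100 "images" of 75x75 values, and 100 binary labels (ship vs. iceberg).
X = np.random.rand(100, 75, 75)
y = np.random.randint(0, 2, size=100)

# Randomly split into 80% train / 20% test; train_test_split shuffles
# X and y together, so images stay aligned with their labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (80, 75, 75) (20, 75, 75)
```

Passing both arrays to a single `train_test_split` call is what keeps the rows of `X` matched with the entries of `y` after shuffling.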

Kaggle - Leaf Classification: Directory Structure and Moving Files

A Jupyter notebook for setting up the directory structure for Kaggle's Leaf Classification competition has been published. The notebook walks through the process of:

  • Unpacking/Unzipping the competition files
  • Creating the directory structure based on the train.csv data set
  • Moving images to appropriate train, valid, and test directories.
    • The train and valid directories contain directories specific to each leaf species
Directory Structure and Moving Files _ Kaggle.png
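The directory-creation and file-moving steps above can be sketched as below. To keep the sketch self-contained it builds a tiny stand-in for the competition data (a fake train.csv with `id` and `species` columns and empty image files) in a temporary directory; in the notebook, the real unzipped competition files would be used instead:

```python
import os
import shutil
import tempfile
import pandas as pd

# Tiny stand-in for the unpacked competition files.
data_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(data_dir, 'images'))
train_df = pd.DataFrame({'id': [1, 2, 3],
                         'species': ['Acer_Opalus', 'Quercus_Rubra', 'Acer_Opalus']})
for i in train_df['id']:
    open(os.path.join(data_dir, 'images', f'{i}.jpg'), 'w').close()

# Create train/ and valid/ with one subdirectory per leaf species,
# plus an unlabeled test/ directory.
for subset in ('train', 'valid'):
    for species in train_df['species'].unique():
        os.makedirs(os.path.join(data_dir, subset, species), exist_ok=True)
os.makedirs(os.path.join(data_dir, 'test'), exist_ok=True)

# Move each labeled image into its species directory under train/.
for _, row in train_df.iterrows():
    src = os.path.join(data_dir, 'images', f"{row['id']}.jpg")
    dst = os.path.join(data_dir, 'train', row['species'], f"{row['id']}.jpg")
    shutil.move(src, dst)
```

A validation split would then be carved out by moving a sample of files from each train/ species directory into the matching valid/ directory.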

Resolving "The cuda backend is deprecated..." Warning

When using the Theano backend, the following warning message was displayed in my Jupyter notebook:

The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend.

TheanoBackend.jpg

The warning message was resolved by completing the following two actions:

Action 1: Update the Theano config file

  1. In a Linux terminal, run
    sudo nano ~/.theanorc
  2. Change "device = gpu" to
    device = cuda0
Theano Config.png
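After the edit, the device section of ~/.theanorc looks roughly like the following. Only the device line is part of the fix above; the [global] section header and any other settings (such as floatX) depend on your existing config:

```
[global]
device = cuda0
floatX = float32
```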

Action 2: Update the Jupyter notebook to include "import theano.gpuarray"

Note that the Cuda backend warning message is no longer displayed.