FastAI with Google Colab

One of the biggest challenges in practicing deep learning is having a research environment in which to build, train, and test deep learning models. Building a deep-learning-capable personal computer or using a cloud-based development environment can be costly and/or time-consuming to set up. This post is designed to help FastAI (https://course.fast.ai/about.html) learners take advantage of Google's Colaboratory research tool.

Google Colaboratory (Colab) is an online research tool for machine learning. It is FREE and offers GPU/TPU hardware acceleration for training deep learning models.

Hardware acceleration can be changed in the Edit menu under Notebook Settings.

Google Colab FAQ (https://research.google.com/colaboratory/faq.html)

Google Colab GPU-TPU.png

The FastAI courses taught by Jeremy Howard and Rachel Thomas (https://course.fast.ai/index.html) are great learning resources for anyone who is interested in deep learning.

Below is a link to a Google Colab Jupyter notebook. This notebook sets up the Google Colab runtime with all the tools and libraries needed to build the deep learning models reviewed in the FastAI training lessons.

https://colab.research.google.com/drive/1ppP7qds7VJfzfISMFynbR-t40OuEPbUS

Google Colab FastAI Setup.png

Splitting Data into Training and Test Data Sets

When image matrices and their associated labels are stored in two separate matrices, it is often difficult to split the data randomly into training and test sets while keeping images and labels aligned. Scikit-learn makes this relatively easy. The example below uses data from the Statoil/C-CORE Iceberg Classifier Challenge Kaggle competition.

Initial setup:

Splitting Data into Training and Test Data Sets 1.png

Splitting the data into training and test sets:

Splitting Data into Training and Test Data Sets 2.png
Splitting Data into Training and Test Data Sets 3.png
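The split shown in the screenshots above can be sketched as follows. The array shapes and variable names here are illustrative stand-ins (the Statoil competition images are 75x75 radar bands, but `X` and `y` below are random dummy data), and `test_size` and `random_state` are example values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the image matrices and their labels:
# 100 "images" of 75x75 values, and 100 binary labels (ship vs. iceberg).
X = np.random.rand(100, 75, 75)
y = np.random.randint(0, 2, size=100)

# Randomly split into 80% train / 20% test; train_test_split shuffles
# X and y together, so images stay aligned with their labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (80, 75, 75) (20, 75, 75)
```

Passing both arrays to a single `train_test_split` call is what keeps the rows of `X` matched with the entries of `y` after shuffling.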

Kaggle - Leaf Classification: Directory Structure and Moving Files

A Jupyter notebook for setting up the directory structure for Kaggle's Leaf Classification competition has been published. The notebook walks through the process of:

  • Unpacking/Unzipping the competition files
  • Creating the directory structure based on the train.csv data set
  • Moving images to appropriate train, valid, and test directories.
    • The train and valid directories contain directories specific to each leaf species
Directory Structure and Moving Files _ Kaggle.png
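The directory-creation and file-moving steps above can be sketched as below. To keep the sketch self-contained it builds a tiny stand-in for the competition data (a fake train.csv with `id` and `species` columns and empty image files) in a temporary directory; in the notebook, the real unzipped competition files would be used instead:

```python
import os
import shutil
import tempfile
import pandas as pd

# Tiny stand-in for the unpacked competition files.
data_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(data_dir, 'images'))
train_df = pd.DataFrame({'id': [1, 2, 3],
                         'species': ['Acer_Opalus', 'Quercus_Rubra', 'Acer_Opalus']})
for i in train_df['id']:
    open(os.path.join(data_dir, 'images', f'{i}.jpg'), 'w').close()

# Create train/ and valid/ with one subdirectory per leaf species,
# plus an unlabeled test/ directory.
for subset in ('train', 'valid'):
    for species in train_df['species'].unique():
        os.makedirs(os.path.join(data_dir, subset, species), exist_ok=True)
os.makedirs(os.path.join(data_dir, 'test'), exist_ok=True)

# Move each labeled image into its species directory under train/.
for _, row in train_df.iterrows():
    src = os.path.join(data_dir, 'images', f"{row['id']}.jpg")
    dst = os.path.join(data_dir, 'train', row['species'], f"{row['id']}.jpg")
    shutil.move(src, dst)
```

A validation split would then be carved out by moving a sample of files from each train/ species directory into the matching valid/ directory.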

Resolving "The cuda backend is deprecated..." Warning

When using the Theano backend, the following warning message was displayed in my Jupyter notebook:

The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend.

TheanoBackend.jpg

The warning message was resolved by completing the following two actions:

Action 1: Update the Theano config file

  1. In a Linux terminal, run
    sudo nano ~/.theanorc
  2. Change "device = gpu" to
    device = cuda0
Theano Config.png
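After the edit, the device section of ~/.theanorc looks roughly like the following. Only the device line is part of the fix above; the [global] section header and any other settings (such as floatX) depend on your existing config:

```
[global]
device = cuda0
floatX = float32
```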

Action 2: Update the Jupyter notebook to include "import theano.gpuarray"

Note that the Cuda backend warning message is no longer displayed.