Splitting Data into Training and Test Data Sets

When image matrices and their associated labels are stored in two separate matrices, it can be awkward to split the data randomly into training and test sets while keeping each image paired with its label. Scikit-learn makes this split relatively easy. The example below uses data from the Statoil/C-CORE Iceberg Classifier Challenge Kaggle competition.

Initial setup:

Splitting Data into Training and Test Data Sets 1.png

Splitting data into a train and test set:

Splitting Data into Training and Test Data Sets 2.png
Splitting Data into Training and Test Data Sets 3.png
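
A minimal sketch of this kind of split, using scikit-learn's train_test_split. The array names, the synthetic data, the 75x75 two-band image shape, and the 80/20 split are assumptions for illustration and are not taken from the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the competition data (assumed shapes):
# 100 two-band 75x75 images and one binary label per image.
rng = np.random.default_rng(0)
images = rng.normal(size=(100, 75, 75, 2))
labels = rng.integers(0, 2, size=100)

# Passing both arrays in one call applies the same random shuffle to each,
# so every image stays paired with its label.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Fixing random_state makes the split reproducible between runs; leaving it out gives a different random split each time.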

Overview of Neural Networks and Gradient Descent

Below are a couple of videos that provide great overviews of neural networks and gradient descent. Even for those familiar with both topics, these videos are well put together and worth watching.

Videos by 3Blue1Brown

But what *is* a Neural Network? | Deep learning, Part 1

Gradient descent, how neural networks learn | Deep learning, part 2

Kaggle - Digit Recognizer - CNN 99.4% Accuracy

Shown below is the basic Convolutional Neural Network (CNN) used for the Kaggle - Digit Recognizer competition.

Digit Recognizer - Convolutional Neural Network
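
For readers who want something concrete to experiment with, here is a minimal sketch of a small CNN for 28x28 grayscale digit images. It assumes Keras (via TensorFlow), and the layer sizes, the dropout rate, and the build_model name are illustrative choices rather than the architecture pictured in this post.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model():
    """Build a small CNN for 28x28 grayscale digits with 10 output classes."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        # Two convolution + pooling stages extract local image features.
        layers.Conv2D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(64, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        # Flatten the feature maps and classify into the ten digits.
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ])
    # categorical_crossentropy expects one-hot encoded labels.
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
model.summary()
```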

Below is an overview of ensembling the CNN model. Executing the fit_model function 5 times ran a total of 125 epochs.

Digit Recognizer - CNN-Ensemble.png

Predictions are run against the test set for each model, and the predictions for each image are averaged:

Digit Recognizer - CNN - Predict.png
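
The averaging step can be sketched with NumPy alone. This assumes the averaged quantity is each model's per-class probabilities. The random arrays stand in for the output of calling predict on each of the five trained models, and every name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_images, n_classes = 5, 8, 10

# Stand-in for the stacked predictions of the five models:
# one (n_images, n_classes) array of class probabilities per model.
logits = rng.normal(size=(n_models, n_images, n_classes))
per_model_probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Average across the model axis, then pick the most probable digit per image.
mean_probs = per_model_probs.mean(axis=0)
predicted_digits = mean_probs.argmax(axis=1)

print(predicted_digits)
```

Averaging probabilities rather than hard labels lets a confident model outweigh several uncertain ones, which is one common reason to ensemble this way.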

After exporting and submitting results, the basic CNN achieved 99.4% accuracy.

Digit Recognizer _ Kaggle-Submission.png