Splitting Data into Training and Test Data Sets

When image matrices and their associated labels are stored in two separate matrices, it is often difficult to split the data into training and test sets randomly. Scikit-learn allows for splitting data into training and test sets relatively easy. For the example below, the data from the Statoil/C-CORE Iceberg Classifier Challenge Kaggle competition was used.

Initial setup:

Splitting Data into Training and Test Data Sets 1.png

Splitting data into a train and test set:

Splitting Data into Training and Test Data Sets 2.png
Splitting Data into Training and Test Data Sets 3.png

Kaggle - Invasive Species Monitoring - Ensemble with ROC of 0.95532

Ensemble (Bucket of models) multiple models to achieve ROC of 0.95532.

2017-09-25 Invasive Species Monitoring _ Kaggle.png

Below is an example of creating 4 models that were trained for 100 epochs each. The last section of code saves the weights for each model in the set. 

Invasive Species - CNN-BN-Ensemble.png

Each model was then used to predict the test image set. The mean prediction for each image was then used for the Kaggle Invasive Species Monitoring competition submission.