Splitting Data into Training and Test Data Sets

February 11, 2018 Jordan Bishop

When image matrices and their associated labels are stored in two separate matrices, it can be difficult to split the data randomly into training and test sets while keeping each image paired with its label. Scikit-learn makes this split relatively easy. The example below uses data from the Statoil/C-CORE Iceberg Classifier Challenge Kaggle competition.

Initial setup:

Splitting Data into Training and Test Data Sets 1.png

Splitting data into a train and test set:

Splitting Data into Training and Test Data Sets 2.png

Splitting Data into Training and Test Data Sets 3.png
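
A minimal sketch of such a split, assuming scikit-learn's `train_test_split` is the function in use. The array names `X` and `y`, the synthetic data, the image shape, the 80/20 ratio and the fixed seed are all illustrative assumptions rather than values taken from the competition code:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumption): 100 two-channel 75x75 "images"
# in one array and their binary labels in a separate array.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 75, 75, 2)).astype(np.float32)
y = rng.integers(0, 2, size=100)

# Shuffle and split both arrays with the same random permutation,
# so each image stays paired with its label.
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,     # hold out 20% of the samples for testing
    random_state=42,   # fixed seed so the split is reproducible
    stratify=y,        # keep the class balance similar in both sets
)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

Passing both arrays in a single call is what keeps the rows aligned; splitting them in separate calls with different seeds would break the image-to-label pairing.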

Kaggle - Invasive Species Monitoring - Ensemble with ROC of 0.95532

September 25, 2017 Jordan Bishop

An ensemble (bucket of models) of multiple models was used to achieve a ROC score of 0.95532.

2017-09-25 Invasive Species Monitoring _ Kaggle.png

Below is an example of creating four models, each trained for 100 epochs. The last section of code saves the weights for each model in the set.

Each model was then used to predict the test image set. The mean prediction for each image was then used for the Kaggle Invasive Species Monitoring competition submission.
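
The averaging step can be sketched as follows. This is only an illustration: the random arrays stand in for the per-image probabilities that each of the four trained models would return for the test set, and the names `predictions` and `ensemble_prediction` are hypothetical:

```python
import numpy as np

n_models = 4   # number of models in the ensemble, as in the post
n_images = 6   # small illustrative test set size (assumption)

# Stand-in for each model's predicted probabilities on the test images
# (assumption: one probability per image, one array per model).
rng = np.random.default_rng(0)
predictions = [rng.uniform(size=n_images) for _ in range(n_models)]

# Stack into shape (n_models, n_images), then average across models
# to get one ensemble prediction per image.
stacked = np.stack(predictions, axis=0)
ensemble_prediction = stacked.mean(axis=0)

print(ensemble_prediction.shape)
```

Averaging probabilities rather than hard class labels preserves each model's confidence, which matters for a ranking-based metric such as ROC.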