Learning How To Classify Images Using Keras Part 3

Hunter Owen
4 min read · Feb 6, 2021

In part 3 we will go over extracting visual explanations from the model and look at the results. LIME is a Python library whose name stands for Local Interpretable Model-agnostic Explanations. Essentially, it approximates the model's decision boundary locally to show what the neural network is looking at when it classifies an image. When building a Convolutional Neural Network you want to make sure it is picking up on the correct features, but how can you trust it beyond looking at the accuracy score? That is where LIME comes in! Trusting your model is very important, especially if it is going to be deployed in a real-world application.
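Before running any of the code below, LIME and the plotting helpers need to be installed and imported. A minimal setup sketch (the `lime` package and these imports are the standard ones; the rest of this post assumes they are already in scope):

# Install once:  pip install lime scikit-image

import numpy as np
import matplotlib.pyplot as plt
from lime import lime_image                       # LIME's image explainer
from skimage.segmentation import mark_boundaries  # draws the superpixel outlines on top of the image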

Recall that our model from part two looked like this:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 174, 206, 10) 280
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 87, 103, 10) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 85, 101, 20) 1820
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 42, 50, 20) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 40, 48, 20) 3620
_________________________________________________________________
flatten (Flatten) (None, 38400) 0
_________________________________________________________________
dense (Dense) (None, 64) 2457664
_________________________________________________________________
dense_1 (Dense) (None, 16) 1040
_________________________________________________________________
dense_2 (Dense) (None, 4) 68
=================================================================
Total params: 2,464,492
Trainable params: 2,464,492
Non-trainable params: 0
_______________________________________________________________
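For reference, a Sequential definition consistent with that summary might look like the sketch below. The 3×3 kernels, ReLU activations, softmax output, and the 176×208×3 input shape are assumptions inferred from the parameter counts and output shapes above, not code copied from part two.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # (176, 208, 3) input -> (174, 206, 10); 3*3*3*10 + 10 = 280 params
    Conv2D(10, (3, 3), activation='relu', input_shape=(176, 208, 3)),
    MaxPooling2D((2, 2)),                   # -> (87, 103, 10)
    Conv2D(20, (3, 3), activation='relu'),  # -> (85, 101, 20); 1,820 params
    MaxPooling2D((2, 2)),                   # -> (42, 50, 20)
    Conv2D(20, (3, 3), activation='relu'),  # -> (40, 48, 20); 3,620 params
    Flatten(),                              # -> 38,400 features
    Dense(64, activation='relu'),           # 2,457,664 params
    Dense(16, activation='relu'),           # 1,040 params
    Dense(4, activation='softmax'),         # one output per class; 68 params
])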

On our final hold-out test set we averaged about 97% accuracy across the 4 classes, with Mild and Normal being predicted wrong the most. I'm just as interested in seeing what my model predicted wrong and what it was looking at when it made those mistakes. Typically, what doctors use to diagnose Alzheimer's is shrinkage of the parietal and temporal lobes.
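As a reminder of where the predictions used below come from, here is a sketch of how the hold-out set might be scored. It assumes `test_generator` was built with `shuffle=False` so that `test_generator.classes` lines up with the prediction order; `Y_pred` (class probabilities) and `y_pred` (predicted class indices) are the arrays referenced in the LIME code that follows.

from sklearn.metrics import classification_report, confusion_matrix

Y_pred = model.predict(test_generator)  # softmax probabilities, shape (n_images, 4)
y_pred = np.argmax(Y_pred, axis=1)      # predicted class index per image

print(confusion_matrix(test_generator.classes, y_pred))
print(classification_report(test_generator.classes, y_pred,
                            target_names=['Mild', 'Moderate', 'Normal', 'Very Mild']))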

First let’s look at images we classified correctly by running the code below.

explainer = lime_image.LimeImageExplainer(random_state=42)
ncols = 5
nrows = 4
a, b = test_generator[0]  # a is the array of test images (one full batch), b the labels
corrects = np.nonzero(y_pred == test_generator.classes)[0]  # indices our model got right
correct_ind = corrects[[1, 136, 200, 800]]  # hand-picked correct predictions, one per class
class_name = ['Mild', 'Moderate', 'Normal', 'Very Mild']
fig, ax = plt.subplots(nrows, ncols, sharex='col', sharey='row')
fig.set_figwidth(20)
fig.set_figheight(14)
for j in range(nrows):
    explanation = explainer.explain_instance(a[correct_ind][j],
                                             model.predict,
                                             top_labels=4, hide_color=0, num_samples=200,
                                             random_seed=42)
    ax[j, 0].imshow(a[correct_ind][j])
    ax[j, 0].set_title(class_name[test_generator.classes[correct_ind][j]])
    for i in range(ncols - 1):
        temp, mask = explanation.get_image_and_mask(i, positive_only=True,
                                                    num_features=3, hide_rest=False)
        ax[j, i + 1].imshow(mark_boundaries(temp / 2 + 0.5, mask))
        ax[j, i + 1].set_title('p({}) = {:.4f}'.format(class_name[i], Y_pred[correct_ind[j]][i]))
plt.savefig('../report/figures/Lime_correct_preds', dpi=100)

This results in the images below. On the far left is the actual classification of the image, and the 4 images to its right show the probability of it being each class; for example, the Normal brain image was given a 94% chance of being Normal and a 6% chance of being Very Mild. The yellow outlines mark the features the model found relevant. You can see that for the class it predicts correctly, it identifies the parietal and temporal lobes pretty well.

Next we will look at the wrong predictions.

ncols = 5
nrows = 4
incorrects = np.nonzero(y_pred != test_generator.classes)[0]  # indices our model got wrong
wrong_ind = np.random.choice(incorrects, nrows)  # randomly select nrows wrongly predicted images
fig, ax = plt.subplots(nrows, ncols, sharex='col', sharey='row')
fig.set_figwidth(20)
fig.set_figheight(14)
for j in range(nrows):
    explanation = explainer.explain_instance(a[wrong_ind][j],
                                             model.predict,
                                             top_labels=4, hide_color=0, num_samples=200,
                                             random_seed=42)
    ax[j, 0].imshow(a[wrong_ind][j])
    ax[j, 0].set_title(class_name[test_generator.classes[wrong_ind][j]])
    for i in range(ncols - 1):
        temp, mask = explanation.get_image_and_mask(i, positive_only=True,
                                                    num_features=3, hide_rest=False)
        ax[j, i + 1].imshow(mark_boundaries(temp / 2 + 0.5, mask))
        ax[j, i + 1].set_title('p({}) = {:.4f}'.format(class_name[i], Y_pred[wrong_ind[j]][i]))
plt.savefig('../report/figures/Lime_wrong_preds', dpi=100)

This part is more interesting because it can help you see why the model might have predicted wrong. In this specific case that is much harder, because I'm not an MRI technician and don't notice the differences between the brains. However, if you were classifying dogs versus cats and wanted to see why your model was predicting wrong, you might find that it is highlighting parts of a dog that look similar enough to some cat images, and that could help you understand why it made the mistake.
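One way to dig into a single wrong prediction further (a sketch, not something from the original notebook) is to ask LIME for the regions that count for and against the class the model actually chose, by toggling `positive_only`:

# Inspect one wrongly predicted image; wrong_ind comes from the block above.
idx = wrong_ind[0]
explanation = explainer.explain_instance(a[idx], model.predict,
                                         top_labels=4, hide_color=0,
                                         num_samples=200, random_seed=42)

predicted_label = explanation.top_labels[0]  # the class the model actually chose
temp, mask = explanation.get_image_and_mask(predicted_label,
                                            positive_only=False,  # show evidence for AND against
                                            num_features=5, hide_rest=False)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))
plt.title('Evidence for/against p({})'.format(class_name[predicted_label]))
plt.show()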
