## Mask R-CNN Visualization

This notebook visualizes the outputs of a trained Mask R-CNN model. First, we set up the system path for loading Python packages.

In [None]:
import os 
import sys
path = os.path.dirname(os.getcwd())
sys.path.append(path)
sys.path.append(os.path.join(path, 'MaskRCNN'))

In the next cell, we configure the data directory and finalize the model configuration. We expect the pretrained ResNet-50 backbone weights, [`ImageNet-R50-AlignPadding.npz`](http://models.tensorpack.com/FasterRCNN/ImageNet-R50-AlignPadding.npz), to be available under the `pretrained-models` folder of your data directory. Set `cfg.DATA.BASEDIR` to the root of your data directory, and `model_output_dir` to the root of your model output directory, which should contain the sub-folders `train_log/maskrcnn`.

In [None]:
# create a mask r-cnn model
from model.generalized_rcnn import ResNetFPNModel

from config import finalize_configs, config as cfg
from dataset import DetectionDataset
from data import get_viz_dataflow

MODEL = ResNetFPNModel()
cfg.MODE_FPN = True
cfg.MODE_MASK = True
model_output_dir = "/logs/maskrcnn-optimized" # set training model output directory root
cfg.DATA.BASEDIR = "/data" # set data directory root

# file path to previously trained mask r-cnn model
trained_model = f'{model_output_dir}/train_log/maskrcnn/model-45000.index'

# fixed resnet50 backbone weights
cfg.BACKBONE.WEIGHTS = f'{cfg.DATA.BASEDIR}/pretrained-models/ImageNet-R50-AlignPadding.npz'

# dataset location
# calling detection dataset gets the number of coco categories 
# and saves in the configuration
DetectionDataset()
finalize_configs(is_training=False)
df = get_viz_dataflow('val2017')
df.reset_state()
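
Before building the predictor, it can be worth confirming that the files referenced above exist. The following is a minimal sketch: `missing_paths` is a hypothetical helper (not part of the repository), and the example paths simply mirror the placeholder values used in the cell above.

```python
import os

def missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk (hypothetical helper)."""
    return [p for p in paths if not os.path.exists(p)]

# Placeholder locations mirroring the configuration cell; adjust to your setup.
expected = [
    "/logs/maskrcnn-optimized/train_log/maskrcnn/model-45000.index",
    "/data/pretrained-models/ImageNet-R50-AlignPadding.npz",
]

for p in missing_paths(expected):
    print(f"missing: {p}")
```

If anything is printed, the checkpoint or backbone weights are probably not where the configuration expects them.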

In the next cell, we create the predictor for inference with the trained model.

In [None]:
from tensorpack.predict.base import OfflinePredictor
from tensorpack.tfutils.sessinit import get_model_loader
from tensorpack.predict.config import PredictConfig

# Create an inference predictor 
predictor = OfflinePredictor(PredictConfig(
 model=MODEL,
 session_init=get_model_loader(trained_model),
 input_names=['images', 'orig_image_dims'],
 output_names=[
 'generate_fpn_proposals/boxes',
 'fastrcnn_all_scores',
 'output/boxes',
 'output/scores',
 'output/labels',
 'output/masks'
 ]))

`df` is a dataflow that yields images and their annotations. Images are loaded in BGR format, so the channels need to be flipped to RGB before display.

In [None]:
an_image = next(df.get_data())
# get image
img = an_image['images']
# flip image channels to convert BGR to RGB
img = img[:,:,[2,1,0]]
# get ground truth bounding boxes
gt_boxes = an_image['gt_boxes']
# get ground truth labels
gt_labels = an_image['gt_labels']
# get ground truth image mask
gt_masks = an_image['gt_masks']
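
To make the channel flip concrete, here is a small sketch on a toy two-pixel array (the array is made up for illustration). It shows that indexing the last axis with `[2, 1, 0]` swaps the blue and red channels, and that reversing the last axis gives the same values.

```python
import numpy as np

# A 1x2 toy "image" in BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reordering the last axis with [2, 1, 0] swaps the B and R channels (returns a copy).
rgb = bgr[:, :, [2, 1, 0]]

# Reversing the last axis gives the same values, as a view instead of a copy.
rgb_view = bgr[:, :, ::-1]

print(rgb[0, 0], rgb[0, 1])
```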

Display the image by itself.

In [None]:
import matplotlib.pyplot as plt

fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(img.astype(int))

Add the ground truth bounding boxes and labels using the `draw_annotation` function.

In [None]:
from viz import draw_annotation


gt_viz = draw_annotation(img, gt_boxes, gt_labels)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(gt_viz.astype(int))

Pass this image to the model to get the region proposal network (RPN) outputs and the final outputs. The `predictor` function takes as input the original image (with an additional batch dimension, because the model expects batches) and the shape of the original image. It returns:

* `rpn_boxes`: a 1000 x 4 matrix specifying the proposed regions of the image.

* `all_scores`: a 1000 x `cfg.DATA.NUM_CLASS` matrix giving the probability of each category for each region proposal (the class count includes 1 for background).

* `final_boxes`: an N x 4 matrix holding the final set of region boxes after applying non-maximum suppression.

* `final_scores`: a length-N vector of the objectness of the final boxes.

* `final_labels`: an N x `cfg.DATA.NUM_CLASS` matrix giving the probability of each category for the final boxes (the class count includes 1 for background).

* `masks`: an N x 28 x 28 tensor containing a mask for each final box. Note that these need to be scaled to each box's size before they can be applied to the original image.

In [None]:
import numpy as np
rpn_boxes, all_scores, final_boxes, final_scores, final_labels, masks = predictor(np.expand_dims(img, axis=0),
 np.expand_dims(np.array(img.shape), axis=0))
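
The shapes of the two predictor inputs can be checked on a dummy array without loading the model. This sketch uses a made-up 480 x 640 image; only the NumPy calls from the cell above are exercised.

```python
import numpy as np

dummy = np.zeros((480, 640, 3), dtype=np.float32)        # H x W x C image
batch = np.expand_dims(dummy, axis=0)                    # 1 x H x W x C
dims = np.expand_dims(np.array(dummy.shape), axis=0)     # 1 x 3, i.e. [[H, W, C]]

print(batch.shape, dims.shape)
```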

We reshape the outputs of `rpn_boxes` and `all_scores` to remove the batch dimension.

In [None]:
rpn_boxes = rpn_boxes.reshape(-1, 4)
all_scores = all_scores.reshape(-1, cfg.DATA.NUM_CLASS)

First, plot all the RPN outputs. This is going to be a huge mess of boxes, mostly tagged as background, but it is worth looking at to understand how the model is working.

In [None]:

from viz import draw_predictions

rpn_viz = draw_predictions(img, rpn_boxes, all_scores)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(rpn_viz.astype(int))

Let's remove all the background boxes.

In [None]:
no_bg = np.where(all_scores.argmax(axis=1)!=0)
rpn_no_bg_viz = draw_predictions(img, rpn_boxes[no_bg], all_scores[no_bg])
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(rpn_no_bg_viz.astype(int))
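
The background filter relies on class index 0 being the background column of the score matrix. A toy sketch with made-up scores and boxes shows which rows survive:

```python
import numpy as np

# Three toy proposals scored over [background, class 1, class 2].
scores = np.array([
    [0.90, 0.05, 0.05],  # most likely background
    [0.10, 0.70, 0.20],  # most likely class 1
    [0.30, 0.30, 0.40],  # most likely class 2
])
boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [8, 8, 30, 30]])

# Keep the rows whose highest-scoring class is not index 0 (background).
keep = np.where(scores.argmax(axis=1) != 0)
kept_boxes = boxes[keep]

print(keep[0], kept_boxes.shape)
```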

After the region proposal network, the model applies non-maximum suppression, which removes many of the redundant boxes, and then produces its final output. Let's plot all of those final output boxes.

In [None]:

from viz import draw_outputs
final_all_viz = draw_outputs(img, final_boxes, final_scores, final_labels, threshold=0.0)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(final_all_viz.astype(int))
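
For intuition about the non-maximum suppression step, here is a generic greedy NMS in NumPy. It is a sketch for illustration only, not the code the model runs (that happens inside the TensorFlow graph and may differ in details); the `nms` function, the toy boxes, and the 0.5 IoU threshold are all assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS over boxes given as [x1, y1, x2, y2]; returns the kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the top-scoring box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[iou <= iou_threshold]
    return keep

toy_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
toy_scores = np.array([0.9, 0.8, 0.7])
kept = nms(toy_boxes, toy_scores)
print(kept)
```

The second toy box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box is kept.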

Notice there is still some overlap, and there are many more boxes than in the ground truth. At this point, we want to pick a score threshold for which boxes to show. Often this is set at 0.5 or 0.95. Let's try 0.5.

In [None]:
final_viz = draw_outputs(img, final_boxes, final_scores, final_labels, threshold=0.5)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(final_viz.astype(int))
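
The effect of the score threshold can be seen on a handful of made-up scores. This sketch assumes boxes are kept when their score is at least the threshold; whether `draw_outputs` uses a strict or non-strict comparison is not confirmed here.

```python
import numpy as np

# Hypothetical scores for four final boxes.
toy_scores = np.array([0.98, 0.62, 0.41, 0.07])

# Count how many boxes would survive each candidate threshold.
counts = {t: int((toy_scores >= t).sum()) for t in (0.0, 0.5, 0.95)}
print(counts)
```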

This is starting to look more informative. Next, let's plot all the ground truth masks in the image.

The `gt_mask` function takes an image and a set of ground truth masks to overlay on the image.

In [None]:
from viz import gt_mask

mask_gt_viz = gt_mask(img, gt_masks)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(mask_gt_viz.astype(int))

Now plot our model's predictions for the masks, using the same 0.5 score threshold for the boxes. We also now have a mask threshold. Within each box, every pixel is given a probability of being part of the object. If the mask threshold is set to 0, the entire box will be the mask.

Note that the `apply_masks` function is a little more complicated than the `gt_mask` function. This is because `apply_masks` needs to take the N x 28 x 28 tensor of masks as well as the boxes that correspond to each mask. `final_scores` holds the scores for each box that we used earlier. `score_threshold` plays the same role as the threshold used when plotting the boxes, to avoid getting a lot of overlap. `mask_threshold` determines which pixels of each mask are overlaid on the image.

In [None]:
from viz import apply_masks
masked_box_viz = apply_masks(img, final_boxes, masks, final_scores, score_threshold=.5, mask_threshold=0.0)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(masked_box_viz.astype(int))

If we increase the mask threshold, the masks will pull in tighter around the objects themselves.

In [None]:
masked_viz = apply_masks(img, final_boxes, masks, final_scores, score_threshold=.5, mask_threshold=0.5)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(masked_viz.astype(int))
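
To illustrate how a 28 x 28 mask relates to its box and to the mask threshold, here is a rough NumPy sketch. `paste_mask` is a hypothetical helper, not the repository's `apply_masks`; it uses nearest-neighbour resizing and clips the box to the image, both of which are simplifying assumptions.

```python
import numpy as np

def paste_mask(mask, box, image_shape, mask_threshold=0.5):
    """Resize a small probability mask to `box` and return a boolean full-image mask."""
    h, w = image_shape
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, w), min(y2, h)
    full = np.zeros((h, w), dtype=bool)
    bh, bw = y2 - y1, x2 - x1
    if bh <= 0 or bw <= 0:
        return full
    # Nearest-neighbour lookup: map each box pixel back to a mask cell.
    rows = np.arange(bh) * mask.shape[0] // bh
    cols = np.arange(bw) * mask.shape[1] // bw
    resized = mask[rows[:, None], cols[None, :]]
    # Only pixels whose probability exceeds the threshold become part of the mask.
    full[y1:y2, x1:x2] = resized > mask_threshold
    return full

# Toy mask: left half confident (0.9), right half not (0.1).
toy_mask = np.full((28, 28), 0.1)
toy_mask[:, :14] = 0.9
toy_box = [10, 20, 66, 48]  # x1, y1, x2, y2: a 56 x 28 pixel box

tight = paste_mask(toy_mask, toy_box, (100, 100), mask_threshold=0.5)
whole = paste_mask(toy_mask, toy_box, (100, 100), mask_threshold=0.0)
print(tight.sum(), whole.sum())
```

With a threshold of 0.5 only the confident half of the box is selected, while a threshold of 0.0 selects every pixel in the box because all probabilities in the toy mask are positive.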

Overlay masks and boxes from our model's predictions.

In [None]:
final_viz = draw_outputs(masked_viz, final_boxes, final_scores, final_labels, threshold=0.5)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(final_viz.astype(int))

Now let's try a totally new image. Set the URL to your test image in `images_path` below.

In [None]:
from skimage import io
images_path = "<URL of your test image>"  # set the URL to your image
img = io.imread(images_path)
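
Images loaded from arbitrary URLs are not always three-channel: `io.imread` can return a 2-D array for grayscale images or a four-channel array for RGBA images. The following sketch normalizes such arrays to H x W x 3. `to_rgb` is a hypothetical helper, and simply dropping the alpha channel ignores transparency, which is an assumption that may not suit every image.

```python
import numpy as np

def to_rgb(image):
    """Return an H x W x 3 array from a grayscale, RGB, or RGBA array (hypothetical helper)."""
    image = np.asarray(image)
    if image.ndim == 2:
        # Grayscale: repeat the single channel three times.
        return np.stack([image] * 3, axis=-1)
    if image.ndim == 3 and image.shape[2] == 4:
        # RGBA: drop the alpha channel.
        return image[:, :, :3]
    return image

gray = np.zeros((4, 5), dtype=np.uint8)
rgba = np.zeros((4, 5, 4), dtype=np.uint8)
print(to_rgb(gray).shape, to_rgb(rgba).shape)
```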

In [None]:
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(img.astype(int))

In [None]:
rpn_boxes, all_scores, final_boxes, final_scores, final_labels, masks = predictor(np.expand_dims(img, axis=0),
 np.expand_dims(np.array(img.shape), axis=0))

In [None]:
rpn_boxes = rpn_boxes.reshape(-1, 4)
all_scores = all_scores.reshape(-1, cfg.DATA.NUM_CLASS)

In [None]:
rpn_viz = draw_predictions(img, rpn_boxes, all_scores)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(rpn_viz.astype(int))

In [None]:
no_bg = np.where(all_scores.argmax(axis=1)!=0)
rpn_no_bg_viz = draw_predictions(img, rpn_boxes[no_bg], all_scores[no_bg])
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(rpn_no_bg_viz.astype(int))

In [None]:
final_all_viz = draw_outputs(img, final_boxes, final_scores, final_labels, threshold=0.0)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(final_all_viz.astype(int))

In [None]:
final_viz = draw_outputs(img, final_boxes, final_scores, final_labels, threshold=0.75)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(final_viz.astype(int))

In [None]:
masked_image = apply_masks(img, final_boxes, masks, final_scores, score_threshold=.8, mask_threshold=0.5)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(masked_image.astype(int))

In [None]:
final_viz = draw_outputs(masked_image, final_boxes, final_scores, final_labels, threshold=0.5)
fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))
ax.imshow(final_viz.astype(int))