{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Mask R-CNN Visualization\n", "\n", "This notebook is used to visualize a trained Mask R-CNN model. First, we set up the system path for loading python packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os \n", "import sys\n", "path = os.path.dirname(os.getcwd())\n", "sys.path.append(path)\n", "sys.path.append(os.path.join(path, 'MaskRCNN'))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In the next cell, we configure the data directory, and finalize the model. We expect the pretrained ResNet-50 backbone weights, [`ImageNet-R50-AlignPadding.npz`](http://models.tensorpack.com/FasterRCNN/ImageNet-R50-AlignPadding.npz) to be avilable under the `pretrained-models` folder of your data directory. Set `cfg.DATA.BASEDIR` to the root of your data directory, and `model_output_dir` to the root of your model output directory containing sub-folders `train_log/maskrcnn`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# create a mask r-cnn model\n", "from model.generalized_rcnn import ResNetFPNModel\n", "\n", "from config import finalize_configs, config as cfg\n", "from dataset import DetectionDataset\n", "from data import get_viz_dataflow\n", "\n", "MODEL = ResNetFPNModel()\n", "cfg.MODE_FPN = True\n", "cfg.MODE_MASK = True\n", "model_output_dir = \"/logs/maskrcnn-optimized\" # set training model output directory root\n", "cfg.DATA.BASEDIR = \"/data\" # set data directory root\n", "\n", "# file path to previoulsy trained mask r-cnn model\n", "trained_model = f'{model_output_dir}/train_log/maskrcnn/model-45000.index'\n", "\n", "# fixed resnet50 backbone weights\n", "cfg.BACKBONE.WEIGHTS = f'{cfg.DATA.BASEDIR}/pretrained-models/ImageNet-R50-AlignPadding.npz'\n", "\n", "# dataset location\n", "# calling detection dataset gets the number of coco categories \n", "# and saves in the configuration\n", "DetectionDataset()\n", "finalize_configs(is_training=False)\n", "df = get_viz_dataflow('val2017')\n", "df.reset_state()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In the next cell, we create the predictor for inference with the trained model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from tensorpack.predict.base import OfflinePredictor\n", "from tensorpack.tfutils.sessinit import get_model_loader\n", "from tensorpack.predict.config import PredictConfig\n", "\n", "# Create an inference predictor \n", "predictor = OfflinePredictor(PredictConfig(\n", " model=MODEL,\n", " session_init=get_model_loader(trained_model),\n", " input_names=['images', 'orig_image_dims'],\n", " output_names=[\n", " 'generate_fpn_proposals/boxes',\n", " 'fastrcnn_all_scores',\n", " 'output/boxes',\n", " 'output/scores',\n", " 'output/labels',\n", " 'output/masks'\n", " ]))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "`df` is a generator that will produce images and annotations. Images are loaded in BGR format, so need to be flipped to RGB." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "an_image = next(df.get_data())\n", "# get image\n", "img = an_image['images']\n", "# flip image channels to convert BGR to RGB\n", "img = img[:,:,[2,1,0]]\n", "# get ground truth bounding boxes\n", "gt_boxes = an_image['gt_boxes']\n", "# get ground truth labels\n", "gt_labels = an_image['gt_labels']\n", "# get ground truth image mask\n", "gt_masks = an_image['gt_masks']" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Display the image by itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(img.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Add in ground truth bounding boxes and labels using the draw_annotation function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from viz import draw_annotation\n", "\n", "\n", "gt_viz = draw_annotation(img, gt_boxes, gt_labels)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(gt_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Pass this image to the model. Get region proposal network outputs and final outputs. The pred function takes as input the original image, (with an additional batch dimension added, because our model expects batches), and the shape of the original image. It returns:\n", "\n", "* `rpn_boxes`: a 1000 x 4 size matrix specifying segmented regions of the image.\n", "\n", "* `all_scores`: 1000 x cfg.DATA.NUM_CLASS matrix the probability of each category for each region proposal (includes 1 for background).\n", "\n", "* `final_boxes`: N x 4 matrix the final set of region boxes after applying non-max supression.\n", "\n", "* `final_scores`: length N vector of the objectness of the final boxes\n", "\n", "* `final_labels`: N x cfg.DATA.NUM_CLASS matrix the probability of each category for final boxes (includes 1 for background).\n", "\n", "* `masks`: N x 28 x 28 tensor containing masks for each final box. Note that these need to be scaled to each box size to \n", " apply to the original image" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "rpn_boxes, all_scores, final_boxes, final_scores, final_labels, masks = predictor(np.expand_dims(img, axis=0),\n", " np.expand_dims(np.array(img.shape), axis=0))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We reshape the outputs of `rpn_boxes` and `all_scores` to remove the batch dimension." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rpn_boxes = rpn_boxes.reshape(-1, 4)\n", "all_scores = all_scores.reshape(-1, cfg.DATA.NUM_CLASS)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "First plot all rpn outputs. This is going to be a huge mess of boxes, mostly tagged as background, but worth looking at to determine how the model is working." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "from viz import draw_predictions\n", "\n", "rpn_viz = draw_predictions(img, rpn_boxes, all_scores)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(rpn_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's remove all the background boxes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "no_bg = np.where(all_scores.argmax(axis=1)!=0)\n", "rpn_no_bg_viz = draw_predictions(img, rpn_boxes[no_bg], all_scores[no_bg])\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(rpn_no_bg_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "After the region proposal network, the model applies a nonmax supression that removes many of the redudant boxes, then produces the model's final output. Let's plot all those final output boxes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "from viz import draw_outputs\n", "final_all_viz = draw_outputs(img, final_boxes, final_scores, final_labels, threshold=0.0)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(final_all_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Notice there is still some overlap, and a lot of extra boxes versus what we have on the ground truth. At this point, we want to pick a threshold for what boxes to show. Often this is set at .5 or .95. Let's try .5." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "final_viz = draw_outputs(img, final_boxes, final_scores, final_labels, threshold=0.5)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(final_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is starting to look more informative. Next, lets plot all the ground truth masks in the image.\n", "\n", "the gt_mask function takes an image and a set of ground truth masks to overlay on the image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from viz import gt_mask\n", "\n", "mask_gt_viz = gt_mask(img, gt_masks)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(mask_gt_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "And now plot our model's predictions for the masks, using the same .5 threshold for the boxes. We also now have a mask threshold. Within each box all pixels are given a probability of being part of the object. if the threshold is set at 0 the entire box will be the mask.\n", "\n", "Note that the apply_masks function is a little more complicated than the gt_masks function. This is because apply_masks needs to take the Nx28x28 tensor of masks, as well as what boxes correspond to each mask. the final_scores are the scores for each box we used earlier. score_threshold is the same as when plotting the boxes, to avoid getting a lot of overlap. mask_threshold determines which pixels of the mask to overlay with the image." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from viz import apply_masks\n", "masked_box_viz = apply_masks(img, final_boxes, masks, final_scores, score_threshold=.5, mask_threshold=0.0)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(masked_box_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "If we increase the threshold, the masks will pull in tighter around the objects themselves." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "masked_viz = apply_masks(img, final_boxes, masks, final_scores, score_threshold=.5, mask_threshold=0.5)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(masked_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Overlay masks and boxes from our model's predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "final_viz = draw_outputs(masked_viz, final_boxes, final_scores, final_labels, threshold=0.5)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(final_viz.astype(int))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now let's try a totally new image. Set the URL to your test image in `images_path` below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from skimage import io\n", "images_path = # set the URL to your image\n", "img = io.imread(images_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(img.astype(int))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rpn_boxes, all_scores, final_boxes, final_scores, final_labels, masks = predictor(np.expand_dims(img, axis=0),\n", " np.expand_dims(np.array(img.shape), axis=0))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rpn_boxes = rpn_boxes.reshape(-1, 4)\n", "all_scores = all_scores.reshape(-1, cfg.DATA.NUM_CLASS)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rpn_viz = draw_predictions(img, rpn_boxes, all_scores)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(rpn_viz.astype(int))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "no_bg = np.where(all_scores.argmax(axis=1)!=0)\n", "rpn_no_bg_viz = draw_predictions(img, rpn_boxes[no_bg], all_scores[no_bg])\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(rpn_no_bg_viz.astype(int))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "final_all_viz = draw_outputs(img, final_boxes, final_scores, final_labels, threshold=0.0)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(final_all_viz.astype(int))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "final_viz = draw_outputs(img, final_boxes, final_scores, final_labels, threshold=0.75)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", 
"ax.imshow(final_viz.astype(int))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "masked_image = apply_masks(img, final_boxes, masks, final_scores, score_threshold=.8, mask_threshold=0.5)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(masked_image.astype(int))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "final_viz = draw_outputs(masked_image, final_boxes, final_scores, final_labels, threshold=0.5)\n", "fig,ax = plt.subplots(figsize=(img.shape[1]//50, img.shape[0]//50))\n", "ax.imshow(final_viz.astype(int))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 2 }