## Amazon SageMakerCV Cuda Utilities

Amazon SageMakerCV Cuda Utilities (SMCV-Utils) provides a set of highly optimized PyTorch functions for training computer vision models on Nvidia GPUs. Within computer vision, there is a set of common operations which are difficult to optimize on GPUs. Namely, these include:

- Anchor Generation
- Non-Max Suppression
- ROI Align
- Bounding Box IOU
- Proposal Matching
- Channel Last Convolution Operations

SMCV-Utils provides implementations of these operations directly in CUDA, meaning they run significantly faster than equivalent Python implementations. SMCV-Utils is fully integrated with the Amazon PyTorch Deep Learning Container (DLC) and works with Amazon SageMakerCV.

SMCV-Utils is based on similar specialized CUDA functions written by Facebook and Nvidia for the Detectron2 model and MLPerf, respectively, but disaggregated from those larger projects, such that they are more easily extended to new and custom deep learning computer vision models.

## Documentation

### Anchor Generation

The anchor generator takes image dimensions, feature map sizes, anchor sizes, and strides, and returns a set of anchors for each level, along with indicators of whether each anchor is valid for the image.

```
from smcv_utils import _C

anchors, inds_inside = _C.anchor_generator(image_height,
                                           image_width,
                                           feature_size,    # feature map dimensions, list of 2 elements
                                           base_anchors,    # tensor of size Nx4 specifying sizes of anchors at each level
                                           stride,          # anchor strides along image
                                           straddle_thresh  # threshold for counting anchors near the image boundary as valid
                                           )
```

### Non-Max Suppression

Non-max suppression takes a series of anchors, and reduces overlap by selecting the anchors with the highest probability of containing an object.

```
from smcv_utils import _C

proposals, objectness, keep = _C.GeneratePreNMSUprightBoxes(N,                   # batch size
                                                            A,                   # anchors per location
                                                            H,                   # feature map height
                                                            W,                   # feature map width
                                                            topk_idx,            # indices of top anchors
                                                            objectness,          # objectness output from RPN, one value per anchor
                                                            box_regression_topk, # regression outputs of RPN (4 x anchors)
                                                            anchors,             # image base anchors
                                                            image_shapes,        # list, batch size x 2
                                                            nms_top_n,           # number of regions to keep after nms
                                                            min_size,            # minimum region size
                                                            bbox_xform_clip,     # clip value for box regression deltas
                                                            return_indices       # return keep indices
                                                            )
```

### ROI Align

ROI align extracts sections from feature maps based on regions of interest.

```
extracted_features = _C.roi_align_forward(features,       # single feature map (bs x width x height x channels) or (bs x channels x width x height)
                                          rois,           # regions of interest (5 x num_rois), first column specifies element in batch
                                          spatial_scale,  # size of feature map relative to image
                                          *output_size,   # output feature map size
                                          sampling_ratio, # number of samples to take for bilinear interpolation
                                          NHWC            # whether data is in channels-last format
                                          )
```

### IOU

IOU computes the intersection over union of anchors versus targets.

```
iou = _C.box_iou(rois,    # (bs, 4, rois)
                 targets  # (bs, 4, max number of boxes in batch)
                 )
```

### Proposal Matching

Proposal matching matches ROIs to targets based on greatest IOU.

```
matches = _C.match_proposals(iou,
                             match_low_quality, # ensure at least one match per target (bool)
                             low_overlap,       # anchors below this are negative matches (float)
                             high_overlap       # anchors above this are positive matches
                             )
```
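The IOU and matching utilities are typically used together when assigning proposals to ground-truth boxes. The sketch below is illustrative only: the tensor shapes follow the comments in the sections above, the input values are random placeholders rather than real box coordinates, and the 0.3/0.7 thresholds are common RPN choices, not defaults prescribed by SMCV-Utils.

```
import torch
from smcv_utils import _C

# Placeholder inputs; shapes follow the comments above, values are arbitrary.
rois = torch.rand(1, 4, 1000, device="cuda")   # candidate boxes, (bs, 4, num_rois)
targets = torch.rand(1, 4, 20, device="cuda")  # ground truth boxes, (bs, 4, max_boxes)

# Pairwise intersection over union between proposals and targets.
iou = _C.box_iou(rois, targets)

# Assign each proposal to a target based on IOU.
matches = _C.match_proposals(iou,
                             True,  # match_low_quality: keep at least one match per target
                             0.3,   # low_overlap: proposals below this IOU are negatives
                             0.7    # high_overlap: proposals above this IOU are positives
                             )
```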
### Channel Last Convolution

Channel-last convolutions, along with max pooling and batch normalization, can be faster on Nvidia GPUs than their channel-first counterparts, because of the order of memory accesses. These follow the same format as regular PyTorch convolutions, but expect data in a (bs, h, w, c) format.

```
import torch
from smcv_utils import NHWC

output = NHWC.cudnn_convolution_nhwc(x,         # input tensor in (bs, h, w, c) format
                                     padded_w,  # convolution weight tensor
                                     padding,
                                     stride,
                                     dilation,
                                     groups,
                                     torch.backends.cudnn.benchmark,
                                     torch.backends.cudnn.deterministic)
```

## License

This project is licensed under the Apache-2.0 License.