# Codebase Details

For training, the codebase is broken into a few major parts:

* `train.py` handles parsing the command line arguments, loading the model, and configuring and launching the training.
* `config.py` holds various configurations needed by the model.
* `dataset.py` handles COCO-specific logic, such as parsing annotations and computing evaluation metrics from a set of predictions.
* `common.py`, `model_box.py`, `performance.py`, `viz.py` and `utils/` hold various utilities, mostly for either debugging or manipulating bounding boxes, anchors and segmentation masks.

The core model has two components: the TensorFlow graph and the non-TensorFlow data pipelines, which are implemented using Tensorpack DataFlows and NumPy.

* Code for the TensorFlow graph is in `model/`.
    - `model/generalized_rcnn.py` has a `DetectionModel` class whose `build_graph()` method outlines a generic two-stage detector model.
    - `ResNetFPNModel` subclasses `DetectionModel` and provides implementations of the backbone, rpn and roi_heads. `ResNetFPNModel.inputs()` uses `tfv1.placeholder` to define the inputs it expects from the dataflow.
* `data.py` holds the data pipeline logic. The key function for training is `get_batch_train_dataflow()`, which contains logic for reading data from disk and batching datapoints together.

## General Optimizations

### Adding a batch dimension

We added the ability to use a per-GPU batch size greater than 1. Below are some implementation details of what that means:

* In the data pipeline, we took the existing outputs (image, gt_label, gt_boxes, gt_masks) and added a batch dimension, padding when different images have different shapes. To account for this padding in the model, we added two new inputs, `orig_image_dims` and `orig_gt_counts`, which are used to slice away padding when necessary.
* The model only supports the [Feature Pyramid Network (FPN)](https://arxiv.org/pdf/1612.03144.pdf) option for the ResNet backbone. We do not support Cascade RCNN.
* Region Proposal Network (RPN) proposals for each FPN level are generated with a batch dimension.
* ROI Align features are computed with a batch dimension.
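
The padding scheme from the first bullet can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: the helper names `pad_batch` and `unpad_example`, the `(height, width)` layout of `orig_image_dims`, the zero and `-1` padding values, and the omission of `gt_masks` are all assumptions made for illustration.

```python
import numpy as np


def pad_batch(images, gt_boxes, gt_labels):
    """Pad per-image arrays to a common shape and stack them along a new batch axis.

    Hypothetical helper: the real pipeline may use different padding values and layouts.
    """
    batch_size = len(images)
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    max_gt = max(len(boxes) for boxes in gt_boxes)

    batch_images = np.zeros((batch_size, max_h, max_w, 3), dtype=np.float32)
    batch_boxes = np.zeros((batch_size, max_gt, 4), dtype=np.float32)
    batch_labels = np.full((batch_size, max_gt), -1, dtype=np.int64)  # -1 marks padding (assumed)
    orig_image_dims = np.zeros((batch_size, 2), dtype=np.int32)  # assumed (height, width)
    orig_gt_counts = np.zeros((batch_size,), dtype=np.int32)

    for i, (img, boxes, labels) in enumerate(zip(images, gt_boxes, gt_labels)):
        h, w = img.shape[:2]
        n = len(boxes)
        batch_images[i, :h, :w] = img  # image sits in the top-left corner
        batch_boxes[i, :n] = boxes
        batch_labels[i, :n] = labels
        orig_image_dims[i] = (h, w)
        orig_gt_counts[i] = n

    return batch_images, batch_boxes, batch_labels, orig_image_dims, orig_gt_counts


def unpad_example(batch_images, batch_boxes, orig_image_dims, orig_gt_counts, i):
    """Slice the padding away for the i-th example in the batch."""
    h, w = orig_image_dims[i]
    n = orig_gt_counts[i]
    return batch_images[i, :h, :w], batch_boxes[i, :n]
```

Recording the original sizes alongside the padded tensors lets downstream code recover each example's unpadded image and ground-truth boxes with plain slicing, without having to guess which entries are padding.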

## AWS-specific optimizations

### Using EFA for faster network communication

Elastic Fabric Adapter (EFA) is a network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communication at scale on AWS. EFA works together with MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library) to make all cross-node communication faster.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html