# Video Input Processing

## Introduction

Amazon Chime SDK for JavaScript contains easy-to-use APIs for adding frame-by-frame processing to an outgoing video stream.

Amazon Chime SDK for JavaScript defines a video processing stage as an implementation of the `VideoFrameProcessor` interface, which takes an array of `VideoFrameBuffer`s, applies builder-defined processing, and outputs an array of `VideoFrameBuffer`s. The outputs of each processor can be linked to the inputs of the next processor, with the last processor in the chain required to implement `asCanvasImageSource` to return a `CanvasImageSource` so that the resulting frames can be rendered onto a [HTMLCanvasElement](https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement) and transformed into a [MediaStream](https://developer.mozilla.org/en-US/docs/Web/API/MediaStream).

To integrate video processing into a meeting session, use a `VideoTransformDevice`, which internally uses a `VideoFrameProcessorPipeline` to complete the aforementioned linking of stages and final canvas rendering.

A typical workflow would be:

1. Create an array of custom `VideoFrameProcessor`s.
2. Create a `VideoTransformDevice` from a `Device` and the array of `VideoFrameProcessor`s.
3. Call `meetingSession.audioVideo.startVideoInput` with the `VideoTransformDevice`.

### Browser compatibility

The APIs for video processing in Amazon Chime SDK for JavaScript work in Firefox, Chrome, and Chromium-based browsers (including Electron) on desktop, Android, and iOS operating systems. A full compatibility table is below.

|Browser                                                               |Minimum supported version|
|----------------------------------------------------------------------|-------------------------|
|Firefox                                                               |76                       |
|Chromium-based browsers and environments, including Edge and Electron |78                       |
|Android Chrome                                                        |78                       |
|Safari on MacOS                                                       |13.0                     |
|iOS Safari                                                            |16                       |
|iOS Chrome                                                            |16                       |
|iOS Firefox (Except on iPad)                                          |16                       |

Note that there is a known issue with `VideoFrameProcessor` in Safari 15: see [github issue 1059](https://github.com/aws/amazon-chime-sdk-js/issues/1059). This has been fixed in Safari 16.

## Video Processing APIs

### VideoTransformDevice

`VideoTransformDevice` allows `VideoFrameProcessor`s to be applied to a `Device`, providing a new object which can be passed into `meetingSession.audioVideo.startVideoInput`.

`DefaultVideoTransformDevice` is the provided implementation of `VideoTransformDevice`. It takes the aforementioned `Device` and array of `VideoFrameProcessor`s, then uses `VideoFrameProcessorPipeline` under the hood and hides its complexity.

#### Construction and Starting Video Processing

Constructing a `DefaultVideoTransformDevice` does not start the camera or start processing. The method `meetingSession.audioVideo.startVideoInput` should be called, just as for normal devices. The device controller will use the inner `Device` to acquire the source `MediaStream` and start the processing pipeline at the same frame rate. "Inner device" in this context refers to the original video stream coming from the selected camera. The parameters to `chooseVideoInputQuality` are used as constraints on the source `MediaStream`. After the video input is chosen, `meetingSession.audioVideo.startLocalVideoTile` can be called to start streaming video.
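For example, a quality constraint can be applied to the inner device before it is started. This is a minimal sketch; the 960x540 at 15 fps values are illustrative only, not a recommendation:

```javascript
// Constrain the source MediaStream that the inner device will produce.
// The specific values (960x540 at 15 fps) are assumed for illustration.
meetingSession.audioVideo.chooseVideoInputQuality(960, 540, 15);
```

Construction and startup of the transform device then look like the following: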
```javascript
import { DefaultVideoTransformDevice } from 'amazon-chime-sdk-js';

const stages = [new VideoResizeProcessor(4/3)]; // constructs processor
const transformDevice = new DefaultVideoTransformDevice(
  logger,
  'foo', // device id string
  stages
);

await meetingSession.audioVideo.startVideoInput(transformDevice);
meetingSession.audioVideo.startLocalVideoTile();
```

#### Switching the Inner Device on VideoTransformDevice

To switch the inner `Device` on a `DefaultVideoTransformDevice`, call `DefaultVideoTransformDevice.chooseNewInnerDevice` with a new `Device`. `DefaultVideoTransformDevice.chooseNewInnerDevice` returns a new `DefaultVideoTransformDevice` but preserves the state of the `VideoFrameProcessor`s. Then call `meetingSession.audioVideo.startVideoInput` with the new transform device.

```javascript
const newInnerDevice = 'bar';
if (transformDevice.getInnerDevice() !== newInnerDevice) {
  transformDevice = transformDevice.chooseNewInnerDevice(newInnerDevice);
}
await meetingSession.audioVideo.startVideoInput(transformDevice);
```

#### Stopping VideoTransformDevice

To stop video processing for the chosen `DefaultVideoTransformDevice`, call `meetingSession.audioVideo.startVideoInput` with a different `Device` (possibly another `DefaultVideoTransformDevice`) or call `meetingSession.audioVideo.stopVideoInput` to stop using the previous `DefaultVideoTransformDevice`.

After stopping the video processing, the inner `Device` will be released by the device controller unless the inner `Device` is a `MediaStream` provided by the user, in which case it is the user's responsibility to handle its lifecycle.

After the `DefaultVideoTransformDevice` is no longer used by the device controller, call `DefaultVideoTransformDevice.stop` to release the `VideoFrameProcessor`s and the underlying pipeline. After `stop` is called, users must discard the `DefaultVideoTransformDevice` as it will not be reusable. `DefaultVideoTransformDevice.stop` is necessary to release the internal resources.

```javascript
await meetingSession.audioVideo.stopVideoInput();
transformDevice.stop();
```

Applications will need to stop and replace the `DefaultVideoTransformDevice` when they want to change video processors or change the video input quality.

#### Receiving lifecycle notifications with an observer

To receive notifications of lifecycle events, a `DefaultVideoTransformDeviceObserver` can be added to the `DefaultVideoTransformDevice`, with handlers for the following:

| Observer                   | Description |
|----------------------------|-------------|
| `processingDidStart`       | Called when video processing starts. |
| `processingDidFailToStart` | Called when video processing could not start due to runtime errors. In this case, developers are expected to call `startVideoInput` again with a valid `VideoInputDevice` to continue video sending. |
| `processingDidStop`        | Called when video processing is stopped **expectedly**. |
| `processingLatencyTooHigh` | Called when the execution of processors slows the frame rate down by at least half. |
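As a sketch, an observer can be registered with `DefaultVideoTransformDevice.addObserver`; all handlers are optional, and the logging below is illustrative only:

```javascript
const observer = {
  processingDidStart: () => {
    console.log('Video processing started');
  },
  processingDidFailToStart: () => {
    console.log('Video processing failed to start');
    // For example, fall back to the plain camera device here.
  },
  processingDidStop: () => {
    console.log('Video processing stopped');
  },
  processingLatencyTooHigh: (latencyMs) => {
    console.log(`Video processing latency is too high: ${latencyMs} ms`);
  },
};
transformDevice.addObserver(observer);
```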
### VideoFrameBuffer

`VideoFrameBuffer` is an abstract interface that can be implemented to represent images or video sources. It is required to implement `asCanvasImageSource` to return a `CanvasImageSource`; optionally, developers can implement `asCanvasElement` or `asTransferable` to allow processing algorithms to work with [HTMLCanvasElement](https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement)s or [Worker](https://developer.mozilla.org/en-US/docs/Web/API/Worker/Worker)s respectively.

### VideoFrameProcessor

`VideoFrameProcessor` represents a processing stage. Internally, processors are executed in a completely serial manner: each pass finishes before the next pass begins. The input `VideoFrameBuffer`s are the video sources. Changing properties of the buffers, such as resizing them, will likely modify properties of the video sources and should be performed with care.

### Building a simple processor

The following example shows how to build a basic processor to resize the video frames. We first define an implementation of `VideoFrameProcessor`:

```typescript
class VideoResizeProcessor implements VideoFrameProcessor {
  constructor(private displayAspectRatio: number) {}

  async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]>;
  async destroy(): Promise<void>;
}
```

To keep the properties of the original video, the processor has to copy the frame onto its own staging buffer in `process`:

```typescript
class VideoResizeProcessor implements VideoFrameProcessor {
  private targetCanvas: HTMLCanvasElement = document.createElement('canvas') as HTMLCanvasElement;
  private targetCanvasCtx: CanvasRenderingContext2D = this.targetCanvas.getContext('2d') as CanvasRenderingContext2D;
  private canvasVideoFrameBuffer = new CanvasVideoFrameBuffer(this.targetCanvas);

  private renderWidth: number = 0;
  private renderHeight: number = 0;
  private sourceWidth: number = 0;
  private sourceHeight: number = 0;

  async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]>;
}
```

During processing, the incoming video frame is painted onto the internal canvas, as in the following abbreviated example:

```typescript
async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]> {
  const canvas = buffers[0].asCanvasElement();
  const frameWidth = canvas.width;
  const frameHeight = canvas.height;

  // error handling to skip resizing
  if (frameWidth === 0 || frameHeight === 0) {
    return buffers;
  }

  // re-calculate the cropped width and height
  .....

  // copy the frame to the intermediate canvas
  this.targetCanvasCtx.drawImage(canvas,
    this.dx, 0, this.renderWidth, this.renderHeight,
    0, 0, this.renderWidth, this.renderHeight);

  // replace the video frame with the resized one for the subsequent processor
  buffers[0] = this.canvasVideoFrameBuffer;
  return buffers;
}
```
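The `destroy` method is called when the pipeline is released and should clean up any resources the processor holds. A minimal sketch for this processor, assuming the staging buffer above is its only resource, could be:

```typescript
async destroy(): Promise<void> {
  // Release the staging buffer backing the intermediate canvas.
  this.canvasVideoFrameBuffer.destroy();
}
```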
### Building an overlay processor

An overlay processor can be a customized processor for loading an external image. Note that this example accounts for the usage of [Cross-Origin Resource Sharing (CORS)](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS):

```typescript
class VideoLoadImageProcessor implements VideoFrameProcessor {
  // Create a HTMLCanvasElement to use as the staging buffer
  private targetCanvas: HTMLCanvasElement = document.createElement('canvas') as HTMLCanvasElement;
  private targetCanvasCtx: CanvasRenderingContext2D = this.targetCanvas.getContext('2d') as CanvasRenderingContext2D;
  private canvasVideoFrameBuffer = new CanvasVideoFrameBuffer(this.targetCanvas);

  // Create a HTMLImageElement and start loading the image from source
  private image = document.createElement('img') as HTMLImageElement;
  private imageLoaded = this.loadImage('https://someurl.any/page/bg.jpg', this.image);

  // Load an image from an external source (absolute URL), configuring CORS to
  // make sure the image is successfully loaded
  private loadImage(url: string, elem: HTMLImageElement): Promise<HTMLImageElement> {
    return new Promise((resolve, reject) => {
      elem.onload = (): void => resolve(elem);
      elem.onerror = reject;
      // configure CORS access for the fetch of the new image if it is not hosted on the same server
      elem.crossOrigin = 'anonymous';
      elem.src = url;
    });
  }

  async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]> {
    const canvas = buffers[0].asCanvasElement();
    this.targetCanvas.width = canvas.width;
    this.targetCanvas.height = canvas.height;

    // copy the frame to the intermediate canvas
    this.targetCanvasCtx.drawImage(canvas, 0, 0);

    // render the loaded image on top of the frame
    await this.imageLoaded;
    this.targetCanvasCtx.drawImage(this.image, 0, 0, this.image.width, this.image.height);

    // replace the video frame with the composited one for the subsequent processor
    buffers[0] = this.canvasVideoFrameBuffer;
    return buffers;
  }

  async destroy(): Promise<void> {
    this.canvasVideoFrameBuffer.destroy();
  }
}
```

## Additional Video Processing Use-Cases

### Custom processor usage during meeting preview

Local video post-processing can be previewed before transmitting to remote clients, just as for a normal device:

```javascript
import { DefaultVideoTransformDevice } from 'amazon-chime-sdk-js';

const stages = [new VideoResizeProcessor(4/3)]; // constructs processor
const videoElement = document.getElementById('video-preview');
const transformDevice = new DefaultVideoTransformDevice(
  logger,
  'foobar', // device id string
  stages
);

await meetingSession.audioVideo.startVideoInput(transformDevice);
meetingSession.audioVideo.startVideoPreviewForVideoInput(videoElement);
```

### Custom video processor usage for content share

The API `ContentShareControllerFacade.startContentShare` does not currently support passing in a `VideoTransformDevice` or similar, but `DefaultVideoTransformDevice` makes it straightforward to apply transforms to a given `MediaStream` and output a new `MediaStream`. Note that for screen share usage we use [MediaDevices.getDisplayMedia](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getDisplayMedia) directly rather than the helper function `ContentShareControllerFacade.startContentShareFromScreenCapture`.

```javascript
import { DefaultVideoTransformDevice } from 'amazon-chime-sdk-js';

const mediaStream = await navigator.mediaDevices.getDisplayMedia({ audio: true, video: true });

const stages = [new CircularCut()]; // constructs some custom processor
const transformDevice = new DefaultVideoTransformDevice(
  logger,
  undefined, // Not needed when using the transform directly
  stages
);

await meetingSession.audioVideo.startContentShare(await transformDevice.transformStream(mediaStream));

// On completion
transformDevice.stop();
```

The `MediaStream` can also be from a file input or other source.
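For example, here is a sketch using a local video file as the source, assuming `file-input` and `file-video` are element IDs in your page (not SDK APIs), and noting that [HTMLMediaElement.captureStream](https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/captureStream) is not available in every browser (Firefox exposes it as `mozCaptureStream`):

```javascript
// Assumed page elements: <input type="file" id="file-input"> and <video id="file-video">.
const file = document.getElementById('file-input').files[0];
const video = document.getElementById('file-video');
video.src = URL.createObjectURL(file);
await video.play();

// Turn the playing file into a MediaStream and feed it through the same transform.
const fileStream = video.captureStream();
await meetingSession.audioVideo.startContentShare(await transformDevice.transformStream(fileStream));
```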