=======================
Media Query Application
=======================

.. youtube:: UCZXJpI1dKw
   :width: 75%

This sample application shows how to combine multiple event handlers in
Chalice to create an image processing pipeline. It takes any image or
video as input and identifies objects, people, text, scenes, and
activities. The results of this analysis can then be queried with a REST
API.

.. image:: docs/assets/appexample.jpg
   :width: 100%
   :alt: Application Example

There are several components to this application. The first is an image
processing pipeline. The application is registered to automatically
process any media that's uploaded to an Amazon S3 bucket. It then uses
Amazon Rekognition to automatically detect labels in either the image or
the video. The returned labels are stored in an Amazon DynamoDB table.

For videos, an asynchronous job is started. This is because analyzing a
video takes longer than analyzing an image, and we don't want our Lambda
function to block until the job is complete. To handle this asynchronous
job, we subscribe to an Amazon SNS topic. When the job has finished
analyzing our uploaded video, an event handler is called that retrieves
the results and stores the labels in Amazon DynamoDB.

The final component is the REST API. This allows users to query for the
labels associated with the media that has been uploaded.

You can find the full source code for this application in our
`samples directory on GitHub `__.

::

    $ git clone https://github.com/aws/chalice
    $ cd chalice/docs/source/samples/media-query/code

We'll now walk through the architecture of the application, how to deploy
and use it, and then go over the application code.

.. note::

   This sample application is also available as a `workshop `__. The main
   difference between the sample apps here and the Chalice workshops is
   that the workshop is a detailed step-by-step process for how to create
   this application from scratch. You build the app by gradually adding
   each feature piece by piece, and it takes several hours to work
   through all the workshop material. In this document we review the
   architecture and the deployment process, then walk through the main
   sections of the code.

Architecture
============

Below is the architecture for the application.

.. image:: docs/assets/architecture.png
   :width: 100%
   :alt: Architecture diagram

The main components of the application are as follows:

* ``handle_object_created``: A Lambda function that is triggered when an
  object is uploaded to an S3 bucket. If the object is an image, it calls
  Amazon Rekognition's ``DetectLabels`` API to detect objects in the
  image and then adds the object, along with the detected labels, to an
  Amazon DynamoDB table. If the object is a video, it calls Rekognition's
  ``StartLabelDetection`` API to initiate an asynchronous job that
  detects labels in the video. When the job is complete, a completion
  notification is pushed to an SNS topic.

* ``handle_object_removed``: A Lambda function that removes the object
  from the DynamoDB table if the object is deleted from the S3 bucket.

* ``add_video_file``: A Lambda function that is triggered by the video
  label detection messages published to the SNS topic. On invocation, it
  calls Rekognition's ``GetLabelDetection`` API to retrieve all of the
  labels detected in the video and then adds the video, along with its
  labels, to the DynamoDB table.

* ``api_handler``: A Lambda function that is invoked by HTTP requests to
  Amazon API Gateway. On invocation, it queries the database based on the
  received HTTP request and returns the results to the user through API
  Gateway.
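To make this event wiring concrete before moving on, here is a condensed,
hypothetical sketch of the image half of the pipeline. It is not the
actual ``app.py`` (we walk through that file later in this document); it
assumes the ``MEDIA_BUCKET_NAME`` and ``MEDIA_TABLE_NAME`` environment
variables that the deployment steps below record, and it omits the video
and deletion handling entirely:

.. code-block:: python

    import os

    import boto3
    from chalice import Chalice

    app = Chalice(app_name='media-query')
    _REKOGNITION = boto3.client('rekognition')
    _DYNAMODB = boto3.resource('dynamodb')


    @app.on_s3_event(bucket=os.environ['MEDIA_BUCKET_NAME'],
                     events=['s3:ObjectCreated:*'])
    def handle_object_created(event):
        # Ask Rekognition to label the uploaded image.
        response = _REKOGNITION.detect_labels(
            Image={'S3Object': {'Bucket': event.bucket, 'Name': event.key}})
        labels = [label['Name'] for label in response['Labels']]
        # Record the object and its labels so the REST API can query them.
        _DYNAMODB.Table(os.environ['MEDIA_TABLE_NAME']).put_item(
            Item={'name': event.key, 'type': 'image', 'labels': labels})

The real ``handle_object_created`` also branches on the file type to
start the asynchronous video job described above, and
``handle_object_removed`` removes the corresponding item from the table.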
Deployment
==========

First, we'll set up our development environment by cloning the Chalice
GitHub repository and copying the sample code into a new directory::

    $ git clone https://github.com/aws/chalice
    $ mkdir /tmp/demo
    $ cp -r chalice/docs/source/samples/media-query/code/ /tmp/demo/media-query
    $ cd /tmp/demo/media-query/

Next, configure a virtual environment that uses Python 3. In this example
we're using Python 3.7::

    $ python3 -m venv /tmp/venv37
    $ . /tmp/venv37/bin/activate

To deploy the application, first install the necessary requirements and
install Chalice::

    $ pip install -r requirements.txt
    $ pip install chalice

We'll also be using the AWS CLI to help deploy our application. If you
don't have it installed, you can follow the
`installation instructions `__.

Next, we'll use the AWS CLI to deploy a CloudFormation stack containing
the S3 bucket, DynamoDB table, and SNS topic needed to run this
application::

    $ aws cloudformation deploy --template-file resources.json \
        --stack-name media-query --capabilities CAPABILITY_IAM

Record the deployed resources as environment variables in the Chalice
application by running the ``recordresources.py`` script::

    $ python recordresources.py --stack-name media-query

You can see these values by looking at the ``.chalice/config.json`` file.
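If you're curious what ``recordresources.py`` is doing, the sketch below
captures the general idea: it looks up the stack's outputs with boto3 and
merges them into the ``dev`` stage's ``environment_variables`` in
``.chalice/config.json``. This is an illustrative approximation rather
than the script's actual source, and the CloudFormation output key names
in the mapping are assumptions:

.. code-block:: python

    import argparse
    import json

    import boto3

    # Assumed mapping from CloudFormation output keys to the environment
    # variables the app reads; the real script also records the SNS topic
    # and IAM role values shown later in the code walkthrough.
    _OUTPUT_TO_ENV_VAR = {
        'MediaBucketName': 'MEDIA_BUCKET_NAME',
        'MediaTableName': 'MEDIA_TABLE_NAME',
    }


    def record_as_env_vars(stack_name, config_file='.chalice/config.json'):
        stack = boto3.client('cloudformation').describe_stacks(
            StackName=stack_name)['Stacks'][0]
        outputs = {output['OutputKey']: output['OutputValue']
                   for output in stack['Outputs']}
        with open(config_file) as f:
            config = json.load(f)
        env_vars = config['stages']['dev'].setdefault(
            'environment_variables', {})
        for output_key, env_var in _OUTPUT_TO_ENV_VAR.items():
            env_vars[env_var] = outputs[output_key]
        with open(config_file, 'w') as f:
            json.dump(config, f, indent=2)


    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--stack-name', required=True)
        args = parser.parse_args()
        record_as_env_vars(args.stack_name)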
Once those resources are created and recorded, deploy the Chalice
application::

    $ chalice deploy

Using the Application
=====================

Once the application is deployed, use the AWS CLI to fetch the name of
the bucket that is storing the media files::

    $ aws cloudformation describe-stacks --stack-name media-query \
        --query "Stacks[0].Outputs[?OutputKey=='MediaBucketName'].OutputValue" \
        --output text
    media-query-mediabucket-xtrhd3c4b59

Upload some sample media files to your Amazon S3 bucket so the system
populates information about them in your DynamoDB table. If you need
sample media files, you can use the included samples from the
corresponding `Chalice workshop `__ assets `here `__::

    $ aws s3 cp assets/sample.jpg s3://media-query-mediabucket-xtrhd3c4b59/sample.jpg
    $ aws s3 cp assets/sample.mp4 s3://media-query-mediabucket-xtrhd3c4b59/sample.mp4

Wait about a minute for the media files to be populated in the database,
and then install HTTPie::

    $ pip install httpie

Then, list all of the media files using the application's API with
HTTPie. The ``chalice url`` command prints the URL of the deployed API::

    $ chalice url
    https://qi5hf4djdg.execute-api.us-west-2.amazonaws.com/api/
    $ http https://qi5hf4djdg.execute-api.us-west-2.amazonaws.com/api/
    HTTP/1.1 200 OK
    Connection: keep-alive
    Content-Length: 279
    Content-Type: application/json
    Date: Tue, 10 Jul 2018 17:58:40 GMT
    Via: 1.1 fa751ee53e2bf18781ae98b293ff9375.cloudfront.net (CloudFront)
    X-Amz-Cf-Id: sNnrzvbdvgj1ZraySJvfSUbHthC_fok8l5GJ7glV4QcED_M1c8tlvg==
    X-Amzn-Trace-Id: Root=1-5b44f3d0-4546157e8f5e35a008d06d88;Sampled=0
    X-Cache: Miss from cloudfront
    x-amz-apigw-id: J0sIlHs3vHcFj9g=
    x-amzn-RequestId: e0aaf4e1-846a-11e8-b756-99d52d342d60

    [
        {
            "labels": [
                "Animal",
                "Canine",
                "Dog",
                "German Shepherd",
                "Mammal",
                "Pet",
                "Collie"
            ],
            "name": "sample.jpg",
            "type": "image"
        },
        {
            "labels": [
                "Human",
                "Clothing",
                "Dog",
                "Nest",
                "Person",
                "Footwear",
                "Bird Nest",
                "People",
                "Animal",
                "Husky"
            ],
            "name": "sample.mp4",
            "type": "video"
        }
    ]

You can include query string parameters as well to query all objects
based on what the file name starts with, the type of the media file, and
the detected objects in the media file::

    $ http https://qi5hf4djdg.execute-api.us-west-2.amazonaws.com/api/ startswith==sample.m
    HTTP/1.1 200 OK
    Connection: keep-alive
    Content-Length: 153
    Content-Type: application/json
    Date: Tue, 10 Jul 2018 19:20:02 GMT
    Via: 1.1 aa42484f82c16d99015c599631def20c.cloudfront.net (CloudFront)
    X-Amz-Cf-Id: euqlOlWN5k5V_zKCJy4SL988Vcje6W5jDR88GrWr5uYGH-_ZvN4arg==
    X-Amzn-Trace-Id: Root=1-5b4506e0-db041a3492ee56e8f3d9457c;Sampled=0
    X-Cache: Miss from cloudfront
    x-amz-apigw-id: J04DHE92PHcF--Q=
    x-amzn-RequestId: 3d82319d-8476-11e8-86d9-a1e4585e5c26

    [
        {
            "labels": [
                "Human",
                "Clothing",
                "Dog",
                "Nest",
                "Person",
                "Footwear",
                "Bird Nest",
                "People",
                "Animal",
                "Husky"
            ],
            "name": "sample.mp4",
            "type": "video"
        }
    ]

    $ http https://qi5hf4djdg.execute-api.us-west-2.amazonaws.com/api/ media-type==image
    HTTP/1.1 200 OK
    Connection: keep-alive
    Content-Length: 126
    Content-Type: application/json
    Date: Tue, 10 Jul 2018 19:20:53 GMT
    Via: 1.1 88eb066576c1b47cd896ab0019b9f25f.cloudfront.net (CloudFront)
    X-Amz-Cf-Id: rwuOwzLKDM4KgcSBXFihWeNNsYSpZDYVpc8IXdT0xOu8qz8aA2Pj3w==
    X-Amzn-Trace-Id: Root=1-5b450715-de71cf04ca2900b839ff1194;Sampled=0
    X-Cache: Miss from cloudfront
    x-amz-apigw-id: J04LaE6YPHcF3VA=
    x-amzn-RequestId: 5d29d59a-8476-11e8-a347-ebb5d5f47789

    [
        {
            "labels": [
                "Animal",
                "Canine",
                "Dog",
                "German Shepherd",
                "Mammal",
                "Pet",
                "Collie"
            ],
            "name": "sample.jpg",
            "type": "image"
        }
    ]

    $ http https://qi5hf4djdg.execute-api.us-west-2.amazonaws.com/api/ label==Person
    HTTP/1.1 200 OK
    Connection: keep-alive
    Content-Length: 153
    Content-Type: application/json
    Date: Tue, 10 Jul 2018 19:20:02 GMT
    Via: 1.1 aa42484f82c16d99015c599631def20c.cloudfront.net (CloudFront)
    X-Amz-Cf-Id: euqlOlWN5k5V_zKCJy4SL988Vcje6W5jDR88GrWr5uYGH-_ZvN4arg==
    X-Amzn-Trace-Id: Root=1-5b4506e0-db041a3492ee56e8f3d9457c;Sampled=0
    X-Cache: Miss from cloudfront
    x-amz-apigw-id: J04DHE92PHcF--Q=
    x-amzn-RequestId: 3d82319d-8476-11e8-86d9-a1e4585e5c26

    [
        {
            "labels": [
                "Human",
                "Clothing",
                "Dog",
                "Nest",
                "Person",
                "Footwear",
                "Bird Nest",
                "People",
                "Animal",
                "Husky"
            ],
            "name": "sample.mp4",
            "type": "video"
        }
    ]

You can also query for a specific object::

    $ http https://qi5hf4djdg.execute-api.us-west-2.amazonaws.com/api/sample.jpg
    HTTP/1.1 200 OK
    Connection: keep-alive
    Content-Length: 126
    Content-Type: application/json
    Date: Tue, 10 Jul 2018 19:20:53 GMT
    Via: 1.1 88eb066576c1b47cd896ab0019b9f25f.cloudfront.net (CloudFront)
    X-Amz-Cf-Id: rwuOwzLKDM4KgcSBXFihWeNNsYSpZDYVpc8IXdT0xOu8qz8aA2Pj3w==
    X-Amzn-Trace-Id: Root=1-5b450715-de71cf04ca2900b839ff1194;Sampled=0
    X-Cache: Miss from cloudfront
    x-amz-apigw-id: J04LaE6YPHcF3VA=
    x-amzn-RequestId: 5d29d59a-8476-11e8-a347-ebb5d5f47789

    [
        {
            "labels": [
                "Animal",
                "Canine",
                "Dog",
                "German Shepherd",
                "Mammal",
                "Pet",
                "Collie"
            ],
            "name": "sample.jpg",
            "type": "image"
        }
    ]

Code Walkthrough
================

We'll take a top-down approach with this application and start with the
main entry point, the ``app.py`` file. The source code for this
application is split between ``app.py`` and supporting code in
``chalicelib/``.

Event Handlers
--------------

In ``app.py`` we see four different decorator types, each corresponding
to Lambda functions that are triggered by different events. Note that the
line numbers correspond to the line numbers in the ``app.py`` file.

.. literalinclude:: code/app.py
   :lineno-match:
   :start-after: # Start of Event Handlers
   :end-before: # End of Event Handlers

The first two decorators use ``@app.on_s3_event`` and specify that these
two Lambda functions should be invoked when an object is created in, or
deleted from, S3, respectively. The name of the S3 bucket is not
hardcoded in the ``app.py`` file but instead pulled from the
``MEDIA_BUCKET_NAME`` environment variable. The CloudFormation stack
deployed earlier created these resources, and the ``recordresources.py``
script that was run as part of the deployment process recorded their
names in the Chalice config file (``.chalice/config.json``). If you look
at the contents of your ``.chalice/config.json`` file, it should look
something like this:

.. code-block:: json

    {
      "version": "2.0",
      "app_name": "media-query",
      "stages": {
        "dev": {
          "api_gateway_stage": "api",
          "autogen_policy": false,
          "environment_variables": {
            "MEDIA_TABLE_NAME": "media-query-MediaTable-10QEPR0O8DOT4",
            "MEDIA_BUCKET_NAME": "media-query-mediabucket-fb8oddjbslv1",
            "VIDEO_TOPIC_NAME": "media-query-VideoTopic-KU38EEHIIUV1",
            "VIDEO_ROLE_ARN": "arn:aws:iam::123456789123:role/media-query-VideoRole-1GKK0CA30VCAD",
            "VIDEO_TOPIC_ARN": "arn:aws:sns:us-west-2:123456789123:media-query-VideoTopic-KU38EEHIIUV1"
          }
        }
      }
    }

Next, the ``@app.on_sns_message`` decorator is used to connect an SNS
topic to our Lambda function. This is only used for video processing with
Rekognition. Because a video takes longer to process than an image, video
analysis is performed by first starting a "video label job". When
starting this asynchronous job, we can specify an SNS topic that
Rekognition will publish to when the job is complete, as shown in the
``_handle_created_video`` function below.

.. literalinclude:: code/app.py
   :linenos:
   :lineno-match:
   :pyobject: _handle_created_video

The ``add_video_file()`` function will then query for the results of the
job (the ``JobId`` is provided as part of the SNS message that's
published) and store the results in the DynamoDB table.
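Here is a simplified, hypothetical sketch of what an
``@app.on_sns_message`` handler along these lines can look like. It is
not the actual ``add_video_file`` from ``app.py``: the ``JobId`` and
``Video`` fields follow Rekognition's documented completion notification
format, the label handling is condensed, and a long video may require
paging through the ``get_label_detection`` results with ``NextToken``,
which this sketch skips:

.. code-block:: python

    import json
    import os

    import boto3
    from chalice import Chalice

    app = Chalice(app_name='media-query')
    _REKOGNITION = boto3.client('rekognition')
    _DYNAMODB = boto3.resource('dynamodb')


    @app.on_sns_message(topic=os.environ['VIDEO_TOPIC_NAME'])
    def add_video_file(event):
        # Rekognition publishes a JSON document describing the finished job.
        message = json.loads(event.message)
        # Fetch the labels detected over the course of the video (first
        # page only; see the NextToken caveat above).
        response = _REKOGNITION.get_label_detection(JobId=message['JobId'])
        labels = sorted({entry['Label']['Name']
                         for entry in response['Labels']})
        _DYNAMODB.Table(os.environ['MEDIA_TABLE_NAME']).put_item(
            Item={'name': message['Video']['S3ObjectName'],
                  'type': 'video',
                  'labels': labels})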
The final two decorators of this app create a REST API with Amazon API
Gateway and define two routes: ``/`` and ``/{name}``. Requesting the root
URL of ``/`` is equivalent to a "List" API call that returns all the
media files that have been analyzed so far. Requesting ``/{name}``, where
``{name}`` is the name of a media file that was uploaded to S3, returns
the detected labels for that single resource. This is equivalent to a
"Get" API call.

.. note::

   This sample application returns all analyzed media files in its List
   API call. In practice, you should paginate your List API calls to
   ensure you don't return unbounded results.
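As a rough illustration of that advice, the sketch below shows one way a
paginated List call could read from the DynamoDB table. This is not part
of the sample app; the ``limit`` and ``start_name`` parameters are
hypothetical, and it assumes the table's partition key is the media
file's ``name``:

.. code-block:: python

    import boto3


    def list_media_files_page(table_name, limit=50, start_name=None):
        # Return one page of media records plus an opaque marker that the
        # caller passes back to resume the scan where this page left off.
        table = boto3.resource('dynamodb').Table(table_name)
        scan_kwargs = {'Limit': limit}
        if start_name is not None:
            scan_kwargs['ExclusiveStartKey'] = {'name': start_name}
        response = table.scan(**scan_kwargs)
        last_key = response.get('LastEvaluatedKey')
        return {
            'items': response['Items'],
            # None means there are no more pages to fetch.
            'next_name': last_key['name'] if last_key else None,
        }

In a REST API, the marker would typically travel back and forth as a
``nextToken``-style query string parameter.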
Supporting Files
----------------

The event handlers described in the previous section interact with
Rekognition and DynamoDB through clients that are accessed through
``get_rekognition_client()`` and ``get_media_db()`` respectively. These
clients are high-level wrappers around the corresponding boto3
clients/resources for these services. The code for these high-level
clients lives in ``chalicelib/rekognition.py`` and ``chalicelib/db.py``.

If we look at the ``DynamoMediaDB.add_media_file()`` method in the
``chalicelib/db.py`` file, we see that it's a small wrapper around the
``put_item()`` operation of the underlying DynamoDB API:

.. literalinclude:: code/chalicelib/db.py
   :linenos:
   :lineno-match:
   :pyobject: DynamoMediaDB.add_media_file

We see a similar pattern in ``chalicelib/rekognition.py``. Here's the
``start_video_label_job`` method that starts the asynchronous processing
discussed in the previous section.

.. literalinclude:: code/chalicelib/rekognition.py
   :linenos:
   :lineno-match:
   :pyobject: RekognitonClient.start_video_label_job

As you can see, it's a small wrapper around the ``start_label_detection``
operation of the underlying Rekognition API. We encourage you to look
through the rest of the ``chalicelib/`` directory to see how these
high-level clients are implemented.

Cleaning Up
===========

If you're done experimenting with this sample app, you can run these
commands to delete it.

1. Delete the Chalice application::

       $ chalice delete
       Deleting Rest API: kyfn3gqcf0
       Deleting function: arn:aws:lambda:us-west-2:123456789123:function:media-query-dev
       Deleting IAM role: media-query-dev-api_handler
       Deleting function: arn:aws:lambda:us-west-2:123456789123:function:media-query-dev-add_video_file
       Deleting IAM role: media-query-dev-add_video_file
       Deleting function: arn:aws:lambda:us-west-2:123456789123:function:media-query-dev-handle_object_removed
       Deleting IAM role: media-query-dev-handle_object_removed
       Deleting function: arn:aws:lambda:us-west-2:123456789123:function:media-query-dev-handle_object_created
       Deleting IAM role: media-query-dev-handle_object_created

2. Delete all objects in your S3 bucket::

       $ aws s3 rm s3://$MEDIA_BUCKET_NAME --recursive
       delete: s3://media-query-mediabucket-4b1h8anboxpa/sample.jpg
       delete: s3://media-query-mediabucket-4b1h8anboxpa/sample.mp4

3. Delete the CloudFormation stack containing the additional AWS
   resources::

       $ aws cloudformation delete-stack --stack-name media-query
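Note that ``delete-stack`` only starts the deletion. If you want to block
until the stack is actually gone, the AWS CLI provides a built-in
waiter::

    $ aws cloudformation wait stack-delete-complete --stack-name media-query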