# Serverless Audio Indexing
This repository contains the code supporting the blog post [Indexing Audios with Amazon Transcribe, Amazon Comprehend and ElasticSearch](https://aws.amazon.com/pt/blogs/aws-brasil/indexando-audios-com-amazon-transcribe-amazon-comprehend-e-elasticsearch/). It shows a simple AWS architecture that automatically indexes audio files uploaded to an S3 bucket. The solution leverages AWS Step Functions, AWS Lambda, Amazon Transcribe, Amazon Comprehend and Amazon Elasticsearch Service.
## Architecture
The following architecture was built to index audio files ingested into an S3 bucket.
## Workflow Overview
The Step Functions workflow of the solution has the following macro activities:
1. Start Transcribe Job: starts an asynchronous Transcribe job using the audio uploaded to the S3 bucket;
2. Get Transcribe Job Status: gets the status of the job started in the previous step and checks whether it has finished or failed;
3. Start Comprehend Job: starts an asynchronous Comprehend job using the transcription file generated in step 1. The job analyzes the sentiment and entities of the file;
4. Get Comprehend Job Status: gets the status of the job started in the previous step and checks whether it has finished or failed;
5. Load to ES: loads the documents generated by Comprehend into the Elasticsearch cluster.
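Steps 2 and 4 follow the same "check status until finished or failed" pattern. The sketch below illustrates that pattern locally in Python; it is not the repository's Lambda code. The names `wait_for_job` and `transcribe_status` are hypothetical, and the client is assumed to be a boto3 Transcribe client (`get_transcription_job` is the boto3 call used to read the job status). In the deployed solution the waiting is handled by the state machine rather than by a loop.

```python
import time

# Terminal statuses reported by Amazon Transcribe and Amazon Comprehend jobs.
SUCCESS_STATUSES = {"COMPLETED"}
FAILURE_STATUSES = {"FAILED", "STOPPED"}


def wait_for_job(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() until the job succeeds, fails, or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in SUCCESS_STATUSES:
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"job ended with status {status}")
        sleep(poll_seconds)  # still queued or in progress, check again later
    raise TimeoutError("job did not reach a terminal status in time")


def transcribe_status(transcribe_client, job_name):
    """Read the current status of a Transcribe job (assumes a boto3 client)."""
    response = transcribe_client.get_transcription_job(
        TranscriptionJobName=job_name
    )
    return response["TranscriptionJob"]["TranscriptionJobStatus"]
```

With boto3 installed, a call could look like `wait_for_job(lambda: transcribe_status(boto3.client("transcribe"), "my-job"))`, where `my-job` is a placeholder job name.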
## Setup
### Prerequisites
1. Configure the AWS Credentials in your environment. Refer to [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html);
2. Download and install AWS CLI. Refer to [Installing the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html);
3. Download and install AWS SAM CLI. Refer to [Installing the AWS SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html);
4. Download and install Docker. Refer to [Docker](https://www.docker.com/products/docker-desktop).
### Deploy
Run the command below to deploy the architecture. It creates a CloudFormation stack with all components and a working workflow.
```
sam build --use-container && sam deploy --guided
```
To launch the stack, the following parameters are required:
| Parameter name | Description | AllowedPattern/Values |
|:----------:|:---------------------------------------------------------------:|:----------------------------------------------------------------------:|
| ElasticsearchDomainName | Name used on the domain of the Elasticsearch cluster | `[a-z][a-z0-9]*` |
| SourceAudioLanguage | Source language of the audio files | Supported values:<br>`de-DE` German<br>`de-CH` Swiss German<br>`en-AU` Australian English<br>`en-GB` British English<br>`en-IN` Indian English<br>`en-IE` Irish English<br>`en-AB` Scottish English<br>`en-US` US English<br>`en-WL` Welsh English<br>`es-ES` Spanish<br>`es-US` US Spanish<br>`it-IT` Italian<br>`pt-PT` Portuguese<br>`pt-BR` Brazilian Portuguese<br>`fr-FR` French<br>`fr-CA` Canadian French<br>`ja-JP` Japanese<br>`ko-KR` Korean<br>`hi-IN` Hindi<br>`ar-SA` Arabic<br>`zh-CN` Chinese (simplified) |
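Before running the guided deploy, it can help to check the two parameter values locally. The sketch below is a hypothetical helper (`validate_parameters` is not part of this repository) that applies the pattern and the language codes listed above, assuming the allowed pattern must match the whole value.

```python
import re

# Pattern and language codes copied from the parameter table above.
DOMAIN_PATTERN = re.compile(r"[a-z][a-z0-9]*")
SUPPORTED_LANGUAGES = {
    "de-DE", "de-CH", "en-AU", "en-GB", "en-IN", "en-IE", "en-AB",
    "en-US", "en-WL", "es-ES", "es-US", "it-IT", "pt-PT", "pt-BR",
    "fr-FR", "fr-CA", "ja-JP", "ko-KR", "hi-IN", "ar-SA", "zh-CN",
}


def validate_parameters(domain_name, language_code):
    """Return a list of problems; an empty list means both values look valid."""
    errors = []
    # fullmatch: the whole name must fit the pattern, not just a prefix.
    if not DOMAIN_PATTERN.fullmatch(domain_name):
        errors.append(f"ElasticsearchDomainName {domain_name!r} must match [a-z][a-z0-9]*")
    if language_code not in SUPPORTED_LANGUAGES:
        errors.append(f"SourceAudioLanguage {language_code!r} is not a supported value")
    return errors
```

For example, `validate_parameters("audioindex", "pt-BR")` returns an empty list, while a domain name containing uppercase letters or hyphens is reported as a problem.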
### Running the workflow
After the stack launches, it generates the following outputs: `RawBucket`, `UserPoolId`, `UserPoolArn`, `IdentityPoolId`, `MainBucket`, `TranscribeS3Bucket` and `StepFunction`. To start the workflow, upload an `.mp3` or `.mp4` file to the `RawBucket`; the upload triggers the workflow.
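The upload can be done from the S3 console, the AWS CLI, or a script. As a minimal sketch, assuming a boto3 S3 client is passed in and the bucket name is the value of the `RawBucket` output, a hypothetical helper `upload_audio` could check the file extension and upload the file under its own name:

```python
from pathlib import Path

# File types the workflow expects, as stated above.
ALLOWED_SUFFIXES = {".mp3", ".mp4"}


def upload_audio(s3_client, bucket, path):
    """Upload an audio file to the raw bucket and return the object key used."""
    file_path = Path(path)
    if file_path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"expected an .mp3 or .mp4 file, got {file_path.name!r}")
    key = file_path.name  # assumption: the file name is used as the object key
    s3_client.upload_file(str(file_path), bucket, key)
    return key


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   upload_audio(boto3.client("s3"), "<value of the RawBucket output>", "interview.mp3")
```

Passing the client in as an argument keeps the helper easy to exercise without AWS access.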
You can track the execution of the workflow on the `StepFunction` resource.
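Execution status can also be read programmatically. The following sketch assumes a boto3 Step Functions client and assumes that the `StepFunction` output holds the state machine ARN; `latest_executions` is a hypothetical helper name. It relies on the `list_executions` call, which returns each execution's name and status.

```python
def latest_executions(sfn_client, state_machine_arn, limit=5):
    """Return (name, status) pairs for up to `limit` executions."""
    response = sfn_client.list_executions(
        stateMachineArn=state_machine_arn,
        maxResults=limit,
    )
    # Status is one of RUNNING, SUCCEEDED, FAILED, TIMED_OUT or ABORTED.
    return [(item["name"], item["status"]) for item in response["executions"]]


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   for name, status in latest_executions(boto3.client("stepfunctions"), "<state machine ARN>"):
#       print(name, status)
```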
## Clean up (Optional)
If you don't want to continue using the application, take the following steps to clean up its resources and avoid further charges:
1. Sign in to the AWS Management Console and open the Amazon S3 console at [https://console.aws.amazon.com/s3](https://console.aws.amazon.com/s3);
2. In the **Bucket name** list, select the bucket that has *RawS3Bucket* in its name and then choose **Empty**;
3. On the **Empty bucket** page, confirm that you want to empty the bucket by entering the bucket name into the text field, and then choose **Empty**;
4. Repeat the process for the buckets that have *MainS3Bucket* and *TranscribeS3Bucket* in their names;
5. Open the AWS CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation);
6. On the **Stacks** page in the CloudFormation console, select the stack you deployed during the Deploy step;
7. In the stack details pane, choose **Delete**.
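The console steps above can also be scripted. The sketch below is one possible approach, assuming boto3 S3 and CloudFormation clients are passed in and that the buckets are not versioned (versioned buckets would also need their object versions removed). The helper names `empty_bucket` and `delete_stack` are hypothetical.

```python
def empty_bucket(s3_client, bucket):
    """Delete every object in a non-versioned bucket; return how many were removed."""
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket):
        # Each page holds at most 1000 keys, which is also the delete_objects limit.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


def delete_stack(cfn_client, stack_name):
    """Request deletion of the CloudFormation stack created by the deploy."""
    cfn_client.delete_stack(StackName=stack_name)
```

The buckets should be emptied first, as in the console steps, and the stack deleted afterwards.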
## Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## License
This library is licensed under the MIT-0 License. See the LICENSE file.