# AI Powered Text Insights

This package includes a sample prototype to help you gain insights into how your customers interact with your brand on social media. By combining zero-shot text classification, sentiment analysis, and keyword extraction, we can obtain real-time insights from posts on Twitter and present them in a dashboard.

The solution consists of a tweet-processing pipeline (using AWS Lambda) that classifies tweets into one of the categories defined at inference time (zero-shot classification) by calling a serverless SageMaker endpoint running a HuggingFace model. Classified tweets are then processed with Amazon Comprehend to extract sentiment and keywords. Anomaly detection is performed on the volume of tweets per category per period of time using Amazon Lookout for Metrics, and notifications are sent when anomalies are detected. All insights are presented on a QuickSight dashboard.

The sample application includes some backend resources (`backend` directory) and a container that gets tweets from a Twitter stream (`stream-getter` directory).

## Deploy instructions

Deploying the sample application builds the following environment in the AWS Cloud:

![architecture](architecture.png)

## Prerequisites

* AWS CLI. Refer to [Installing the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html)
* AWS credentials configured in your environment. Refer to [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
* AWS SAM CLI. Refer to [Installing the AWS SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html)
* AWS Copilot CLI. Refer to [Install Copilot](https://aws.github.io/copilot-cli/docs/getting-started/install/)
* Twitter application Bearer token. Refer to [OAuth 2.0 Bearer Token - Prerequisites](https://developer.twitter.com/en/docs/authentication/oauth-2-0/bearer-tokens)
* Twitter Filtered stream rules configured. Refer to the examples at the end of this document and to [Building rules for filtered stream](https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/integrate/build-a-rule)
* Docker. Refer to [Docker](https://www.docker.com/products/docker-desktop)

## Backend resources

Run the command below, from within the `backend/` directory, to deploy the backend:

```
sam build --use-container && sam deploy --guided
```

Follow the prompts.

**NOTE:** Due to a constraint in how Lookout for Metrics names databases, name your stack so that it matches the regular expression pattern `[a-zA-Z0-9_]+`.

The command above deploys an AWS CloudFormation stack in your AWS account. You will need the stack's output values to deploy the Twitter stream getter container.

### 1. Data format

This solution stores the tweet insights as JSON files in two S3 locations, **/tweets** and **/phrases**, in the results bucket whose name appears in the CloudFormation stack's outputs under "TweetsBucketName". Under the **/tweets** and **/phrases** folders, data is organized by day following the **YYYY-MM-dd 00:00:00** datetime format. Sample output files can be found in this repository under the **/sample_files** folder.
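For illustration, the sketch below shows how a single tweet could be classified and enriched along the lines described above, assuming a HuggingFace zero-shot model behind a serverless SageMaker endpoint. The endpoint name, the candidate categories, and the request/response schema are assumptions for this sketch, not the solution's actual contract; the real processing happens in the Lambda pipeline deployed with the backend stack.

```
import json

import boto3

comprehend = boto3.client("comprehend")
runtime = boto3.client("sagemaker-runtime")

# Hypothetical values: substitute your endpoint name and your own categories.
ENDPOINT_NAME = "zero-shot-endpoint"
CATEGORIES = ["product feedback", "support request", "pricing", "other"]


def process_tweet(text: str) -> dict:
    """Classify a tweet (zero-shot) and enrich it with Comprehend insights."""
    # Zero-shot classification: this payload follows the common HuggingFace
    # zero-shot pipeline format; adjust it to the schema the deployed model
    # actually expects.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(
            {"inputs": text, "parameters": {"candidate_labels": CATEGORIES}}
        ),
    )
    classification = json.loads(response["Body"].read())

    # Sentiment and key phrases via Amazon Comprehend.
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")

    return {
        "text": text,
        "category": classification["labels"][0],  # highest-scoring label
        "sentiment": sentiment["Sentiment"],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }
```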
### 2. Activate the Lookout for Metrics detector

To allow you to provide historical data to the anomaly detector in order to [reduce the detector's learning time](https://docs.aws.amazon.com/lookoutmetrics/latest/dev/services-athena.html), the prototype is deployed with the anomaly detector disabled. If you have historical data in the same format as the data generated by this solution, you may move it to the data S3 bucket created when deploying the backend (TweetsBucketName). Make sure to follow the format of the files in the **/sample_files** folder.

[Follow the instructions](https://docs.aws.amazon.com/lookoutmetrics/latest/dev/gettingstarted-detector.html) to activate your detector. The detector's name can be found in the CloudFormation stack's outputs.

Optionally, you can configure alerts for your anomaly detector. [Follow the instructions](https://docs.aws.amazon.com/lookoutmetrics/latest/dev/gettingstarted-detector.html) to create an alert that sends a notification to SNS. The SNS topic name is part of the CloudFormation stack's outputs.

## Twitter stream getter container

Run the commands below, from within the `stream-getter/` directory, to deploy the container application:

### 1. Create application

```
copilot app init twitter-app
```

### 2. Create environment

```
copilot env init --name test --region <region>
```

Replace `<region>` with the same region to which you deployed the backend resources previously. Follow the prompts, accepting the default values.

The above command provisions the required network infrastructure (VPC, subnets, security groups, and more). In its default configuration, Copilot follows [AWS best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/) and creates a VPC with two public and two private subnets in different Availability Zones (AZs). For security reasons, we'll soon configure the placement of the service as _private_. Because of that, the service will run on the private subnets and Copilot will automatically add NAT Gateways, which increase the overall cost. If you decide to run the application in a single AZ to have only one NAT Gateway **(not recommended)**, you can run the following command instead:

```
copilot env init --name test --region <region> \
  --override-vpc-cidr 10.0.0.0/16 --override-public-cidrs 10.0.0.0/24 --override-private-cidrs 10.0.1.0/24
```

**Note:** The current implementation is prepared to run only one container at a time. Not only would your Twitter account have to allow more than one stream connection at a time, but the application would also have to be modified to handle other complexities such as duplicates (learn more in [Recovery and redundancy features](https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/integrate/recovery-and-redundancy-features)). Even though there will be only one container running at a time, having two AZs is still recommended: if one AZ is down, ECS can run the application in the other AZ.

### 3. Deploy the environment

```
copilot env deploy --name test
```

### 4. Create service

```
copilot svc init --name stream-getter --svc-type "Backend Service" --dockerfile ./Dockerfile
```

### 5. Create secret to store the Twitter Bearer token

```
copilot secret init --name TwitterBearerToken
```

When prompted to provide the secret, paste the Twitter Bearer token.
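Copilot stores the secret in SSM Parameter Store and, through the `secrets` mapping shown in the next step, injects it into the container as the `BEARER_TOKEN` environment variable. As a rough, hypothetical sketch of how the service consumes it (the real logic lives in `main.py`, with reconnection handling in `backoff.py`, and may use a different HTTP client):

```
import os

import requests  # assumed HTTP client; the container's actual dependencies may differ

STREAM_URL = "https://api.twitter.com/2/tweets/search/stream"


def stream_tweets():
    """Yield raw tweet payloads from the Twitter filtered stream."""
    # BEARER_TOKEN is injected by Copilot from the secret created above.
    headers = {"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"}
    with requests.get(STREAM_URL, headers=headers, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:  # the stream sends empty keep-alive lines
                yield line.decode("utf-8")
```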
### 6. Edit service manifest

Open the file `copilot/stream-getter/manifest.yml` and change its content to the following:

```
name: stream-getter
type: Backend Service

image:
  build: Dockerfile

cpu: 256
memory: 512
count: 1
exec: true

network:
  vpc:
    placement: private

variables:
  SQS_QUEUE_URL: <queue-url>
  LOG_LEVEL: info

secrets:
  BEARER_TOKEN: /copilot/${COPILOT_APPLICATION_NAME}/${COPILOT_ENVIRONMENT_NAME}/secrets/TwitterBearerToken
```

Replace `<queue-url>` with the URL of the SQS queue deployed in your AWS account. You can use the following command to get the value from the backend AWS CloudFormation stack outputs (replace `<stack-name>` with the name of your backend stack):

```
aws cloudformation describe-stacks --stack-name <stack-name> \
  --query "Stacks[].Outputs[?OutputKey=='TweetsQueueUrl'][] | [0].OutputValue"
```

### 7. Add permission to write to the queue

Create a new file in `copilot/stream-getter/addons/` called `sqs-policy.yaml` with the following content:

```
Parameters:
  App:
    Type: String
    Description: Your application's name.
  Env:
    Type: String
    Description: The environment name your service, job, or workflow is being deployed to.
  Name:
    Type: String
    Description: The name of the service, job, or workflow being deployed.

Resources:
  QueuePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: SqsActions
            Effect: Allow
            Action:
              - sqs:SendMessage
            Resource: <queue-arn>

Outputs:
  QueuePolicyArn:
    Description: The ARN of the ManagedPolicy to attach to the task role.
    Value: !Ref QueuePolicy
```

Replace `<queue-arn>` with the ARN of the SQS queue deployed in your AWS account. You can use the following command to get the value from the backend AWS CloudFormation stack outputs (replace `<stack-name>` with the name of your backend stack):

```
aws cloudformation describe-stacks --stack-name <stack-name> \
  --query "Stacks[].Outputs[?OutputKey=='TweetsQueueArn'][] | [0].OutputValue"
```

After that, your directory should look like the following:

```
.
├── Dockerfile
├── backoff.py
├── copilot
│   ├── stream-getter
│   │   ├── addons
│   │   │   └── sqs-policy.yaml
│   │   └── manifest.yml
│   └── environments
│       └── test
│           └── manifest.yml
├── main.py
├── requirements.txt
├── sqs_helper.py
└── stream_match.py
```

### 8. Deploy service

> **IMPORTANT:** The container will connect to the Twitter stream as soon as it starts, right after the service is deployed. You need your Twitter stream rules configured before connecting to the stream. Therefore, if you haven't configured the rules yet, configure them before proceeding.

```
copilot svc deploy --name stream-getter --env test
```

When the deployment finishes, you should have the container running inside ECS. To check the logs, run the following:

```
copilot svc logs --follow
```
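At this point, each tweet matched by your rules should be flowing from the container into the SQS queue, using the `SQS_QUEUE_URL` variable from step 6 and the `sqs:SendMessage` permission from step 7. A minimal sketch of that hand-off (hypothetical; the repository's `sqs_helper.py` holds the actual implementation):

```
import os

import boto3

sqs = boto3.client("sqs")


def send_tweet(tweet_json: str) -> None:
    """Forward a matched tweet to the processing pipeline's queue."""
    # SQS_QUEUE_URL comes from the manifest's `variables` section (step 6);
    # the addon policy from step 7 grants sqs:SendMessage on this queue.
    sqs.send_message(
        QueueUrl=os.environ["SQS_QUEUE_URL"],
        MessageBody=tweet_json,
    )
```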
## Visualize your insights with Amazon QuickSight

To create some example visualizations from the processed text data, follow the instructions in the [Creating visualizations with QuickSight.pdf](Creating_visualizations_with_QuickSight.pdf) file.

## Rules examples for filtered stream

Twitter provides endpoints that enable you to create and manage rules, and to apply those rules to filter a stream of real-time tweets so that it returns matching public tweets. For instance, the following rule returns tweets from the accounts `@awscloud`, `@AWSSecurityInfo`, and `@AmazonScience`:

```
from:awscloud OR from:AWSSecurityInfo OR from:AmazonScience
```

To add that rule, issue a request like the following, replacing `<token>` with the Twitter Bearer token:

```
curl -X POST 'https://api.twitter.com/2/tweets/search/stream/rules' \
  -H "Content-type: application/json" \
  -H "Authorization: Bearer <token>" -d \
  '{
    "add": [
      {
        "value": "from:awscloud OR from:AWSSecurityInfo OR from:AmazonScience",
        "tag": "news"
      }
    ]
  }'
```

## Clean up

If you don't want to continue using the sample, clean up its resources to avoid further charges. Start by deleting the backend AWS CloudFormation stack, which, in turn, removes the underlying resources it created. Then delete all the resources AWS Copilot set up for the container application. Run the following commands (replace `<stack-name>` with the name of your backend stack):

```
sam delete --stack-name <stack-name>
copilot svc delete --name stream-getter
copilot env delete --name test
copilot app delete
```

## Security

See [CONTRIBUTING](CONTRIBUTING.md) for more information.

## License

This library is licensed under the MIT-0 License. See the [LICENSE](LICENSE) file.