# Building a Near-Real-Time Discovery Platform with AWS Lambda, Amazon Kinesis Firehose, and Elasticsearch

This is the code repository for the code sample used in the AWS Big Data blog post [Building a Near-Real-Time Discovery Platform with AWS].

## Prerequisites

- Amazon Web Services account
- Elasticsearch cluster
- [AWS Command Line Interface (CLI)]
- [Node.js] installed
- [Twitter application] with consumer key (API Key), consumer secret (API Secret), access token, and access token secret

## Overview of Example

### AWS Lambda function

The AWS Lambda function (lambda-s3-twitter-to-es-python/lambda_function.py) is triggered whenever a new file is created on S3. The function does the following (a condensed sketch follows the list):
1. Reads the file content
2. Parses the content to JSON format (Elasticsearch stores documents in JSON format)
3. Analyzes Twitter data (lambda-s3-twitter-to-es-python/tweet_utils.py):
  a. Extracts Twitter mentions (@username) from the tweet text.
  b. Extracts sentiment based on emoticons. If there's no emoticon in the text, the function falls back to [textblob sentiment analysis].
4. Loads the data into Elasticsearch (twitter_to_es.py) using the [elasticsearch-py library].
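For orientation, the whole flow condenses to something like the sketch below. It is illustrative rather than the repository's exact code: the `twitter` index name, the emoticon lists, and the one-JSON-document-per-line assumption about the Firehose output are assumptions, not values fixed by the repo.

```
import json
import re

import boto3
from elasticsearch import Elasticsearch
from textblob import TextBlob

from config import es_host, es_port  # set in s3-twitter-to-es-python/config.py

es = Elasticsearch([{'host': es_host, 'port': es_port}])
s3 = boto3.client('s3')

MENTION_RE = re.compile(r'@(\w+)')
POSITIVE = (':)', ':-)', ':D', '=)')   # illustrative emoticon lists
NEGATIVE = (':(', ':-(', ":'(")

def sentiment(text):
    # Emoticons win; fall back to textblob polarity when none are present.
    if any(e in text for e in POSITIVE):
        return 'positive'
    if any(e in text for e in NEGATIVE):
        return 'negative'
    polarity = TextBlob(text).sentiment.polarity
    return 'positive' if polarity > 0 else 'negative' if polarity < 0 else 'neutral'

def handler(event, context):
    for record in event['Records']:
        # Fetch the file that Firehose just delivered to S3.
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        for line in body.splitlines():  # assumes one JSON tweet per line
            tweet = json.loads(line)
            doc = {
                'text': tweet['text'],
                'mentions': MENTION_RE.findall(tweet['text']),
                'sentiment': sentiment(tweet['text']),
            }
            es.index(index='twitter', doc_type='tweet',
                     id=tweet['id_str'], body=doc)
```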
You can download and unzip the deployment package (packages/s3_twitter_to_es.zip), which already includes the elasticsearch and textblob modules, or [create a deployment package yourself].

To deploy the function, replace each ```<>``` placeholder in the steps below with your own values.

Modify s3-twitter-to-es-python/config.py by assigning es_host and es_port.
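Here is what a minimal config.py could look like; the endpoint shown is a placeholder, not a real host:

```
# s3-twitter-to-es-python/config.py -- placeholder values, substitute your own
es_host = 'search-mydomain-xxxxxxxxxxxxxx.us-east-1.es.amazonaws.com'
es_port = 9200  # default Elasticsearch HTTP port; Amazon ES endpoints use 443/80
```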
Zip your deployment folder:

```
cd path/to/s3-twitter-to-es-python
zip -r -9 ../s3-twitter-to-es-python.zip .
```

[create IAM Role] and name it ```<>```.

Create the AWS Lambda function, pointing --zip-file at the package you downloaded or the zip you just created:

```
aws lambda create-function \
--region <> \
--function-name s3-twitter-to-es-python \
--zip-file fileb://path/to/s3_twitter_to_es.zip \
--role arn:aws:iam::<>:role/<> \
--handler lambda_function.handler \
--runtime python2.7 \
--timeout 120
```

[Add S3 as the event source] to the Lambda function with your ```<>``` bucket and ```<>``` prefix.

### Running the example

Create a Firehose IAM role named "firehose_delivery_role" based on the following policy:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<>"
      ]
    }
  ]
}
```

Set up the Node.js application:

```
cd firehose-twitter-streaming-nodejs
npm install
```

Modify the configurations in config.js:

- firehose
  - DeliveryStreamName – name your stream. The app will create the delivery stream if it does not exist.
  - BucketARN – use the ```<>``` bucket that you entered as the event source for the Lambda function.
  - RoleARN – use the Firehose role you created earlier ("firehose_delivery_role").
  - Prefix – use the ```<>``` prefix that you entered as the event source for the Lambda function.
- twitter – enter your Twitter application keys.
- region – your Firehose region (e.g., us-east-1, us-west-2, eu-west-1).
Run the application:

```
node twitter_stream_producer_app
```

Please see [Building a Near-Real-Time Discovery Platform with AWS] for more details on using Elasticsearch and Kibana as the discovery platform.

[AWS Command Line Interface (CLI)]: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html
[textblob sentiment analysis]: http://textblob.readthedocs.org/en/dev/quickstart.html#sentiment-analysis
[elasticsearch-py library]: http://elasticsearch-py.readthedocs.org/en/master/
[create a deployment package yourself]: http://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
[create IAM Role]: http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser-create-test-function-create-execution-role.html
[Add S3 as the event source]: http://docs.aws.amazon.com/lambda/latest/dg/getting-started-2-integrate-s3events-console.html
[Node.js]: https://nodejs.org
[Twitter application]: https://apps.twitter.com/
[Building a Near-Real-Time Discovery Platform with AWS]: http://blogs.aws.amazon.com/bigdata/post/Tx1Z6IF7NA8ELQ9/Building-a-Near-Real-Time-Discovery-Platform-with-AWS
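Once the producer is running and Firehose is delivering batches to S3, a quick check from Python can confirm that documents are reaching Elasticsearch. This is a sketch: the `twitter` index name and the sentiment field match the assumptions in the earlier Lambda sketch, not values fixed by the repository.

```
from elasticsearch import Elasticsearch

# Placeholder endpoint -- use the same es_host/es_port as in config.py.
es = Elasticsearch([{'host': 'your-es-host', 'port': 9200}])

# How many tweets have been indexed so far? (index name is assumed)
print(es.count(index='twitter')['count'])

# Peek at a few documents tagged negative by the sentiment step.
print(es.search(index='twitter', q='sentiment:negative', size=3))
```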