# Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS

[This post](http://blogs.aws.amazon.com/bigdata/post/Tx2CDD4Y46WIWOV/Real-time-Stream-Processing-Using-Apache-Spark-Streaming-and-Apache-Kafka-on-AWS) demonstrates how to set up Apache Kafka on [Amazon EC2](https://aws.amazon.com/ec2), use Spark Streaming on [Amazon EMR](https://aws.amazon.com/emr) to process data coming in to Apache Kafka topics, and query streaming data using Spark SQL on Amazon EMR.  

This repo provides:
- An [AWS CloudFormation](https://aws.amazon.com/cloudformation) stack to set up Apache Kafka on Amazon EC2
- Scripts/code to create the Apache Kafka topic and producer 
- Spark Streaming and Spark SQL code to run on Amazon EMR
 
For more information about how to set everything up, see the post. 

## Clone the repo
Use the following commands:

1) sudo yum install git

2) git clone https://github.com/awslabs/aws-big-data-blog.git

3) cd aws-big-data-blog/aws-blog-sparkstreaming-from-kafka


## Install the code
Use the following command:
```mvn clean install```