Running open-source Accumulo on EMR
====================================

[Accumulo](https://accumulo.apache.org/) requires [ZooKeeper](http://zookeeper.apache.org/), which this bootstrap action installs on the master node. For more information, see the AWS Big Data Blog post [Running Apache Accumulo on Amazon EMR](http://blogs.aws.amazon.com/bigdata/post/Tx15973X6QHUM43/Running-Apache-Accumulo-on-Amazon-EMR).

Creating a cluster
------------------

Supply the name of your EC2 key pair after `KeyName=`:

```
aws emr create-cluster --name Accumulo --no-auto-terminate \
--bootstrap-actions Path=s3://elasticmapreduce.bootstrapactions/accumulo/1.6.1/install-accumulo_mj,Name=Install_Accumulo --ami-version 3.3.1 \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
--ec2-attributes KeyName=
```

Running a sample
----------------

* SSH to the master node.
* Log into the Accumulo shell: `$ ~/accumulo/bin/accumulo shell -u username -p password`
* Create a table called 'hellotable': `username@instance> createtable hellotable`
* Launch a Java program that inserts data with a BatchWriter:

```
~/accumulo/bin/accumulo org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter -i instance -z 127.0.0.1 -u root -p secret -t hellotable
```

  The `-z` argument is the ZooKeeper address (`ZK_IPADDR`), which is the IP address of the EMR master node. `127.0.0.1` is used here because the command runs on the master node itself. Once the program finishes, the Accumulo status page (`http://[master-node]:50095`, where `[master-node]` is the name or IP address of your Accumulo master) should show 50K entries.
* To view the entries, use the shell to scan the table:

```sh
username@instance> table hellotable
username@instance hellotable> scan
```

* Alternatively, open the Accumulo UI (`http://[master-node]:50095`).
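
The Accumulo UI address above can be built from the master node's host name. This is a minimal sketch, not part of the bootstrap action: `monitor_url` is a hypothetical helper and `MASTER_DNS` is a placeholder variable you would set to your own master node's name or IP address. The port defaults to 50095, as used in this README.

```shell
# Hypothetical helper: print the Accumulo UI URL for a given master host.
# $1 = host name or IP of the master node, $2 = port (defaults to 50095).
monitor_url() {
  host="$1"
  port="${2:-50095}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# MASTER_DNS is a placeholder; set it to your master node's name or IP.
MASTER_DNS="${MASTER_DNS:-master-node}"
echo "Accumulo UI: $(monitor_url "$MASTER_DNS")"
```

If you have network access to the master node, you could pass the printed URL to a browser or to `curl` to check that the UI responds.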