Using the ML part of this Project NOTES: - POM was updated to use spark 1.6.2 - use the pom attached - the context needs to be a Spark Context (not Hive or SparkSQl context) - the ml_commands file has sample execution commands - the Scala Src File is located at com.amazonaws.proserv.ml.MovieRecML.scala To create the cluster - proceed to create a JobServer cluster as you normally would (see blogpost) - in /mnt/lib/spark-jobserver the context file (emr.conf) needs to be set to create a new Spark Context -- the file at <project>/jobserver_configs/emr_contexts_ml.conf file reflect the proper change to use the SparkContext -- you can manually edit the emr.conf on the server, replace it with the emr_contexts.conf (naming it emr.conf) or reference this file in your BA Once the Cluster is created, there are three access point (you MUST load the data first - only once necessary) - com.amazonaws.proserv.ml.LoadModelAndData -- this entry point Loads the Mode Data from S3 and the Movies Data -- BOTH paths MUST end in a trailing back slash / -- see the sample commands for execution -- it takes JSON as input Example { s3DataLoc:\"s3://dgraeberaws-blogs/ml/data/movielens/small/\", s3ModelLoc:\"s3://dgraeberaws-blogs/ml/models/movielens/recommendations/\" } - com.amazonaws.proserv.ml.MoviesRec - this entry point will give the top 10 movies for the user id passed in -- it takes a parameter 'userId' - in the test cases it was 100 -- it takes JSON as input Example { userId: 100 } - com.amazonaws.proserv.ml.MoviesRecByGenre - this entry point will give the top 5 movies for a userId and a given Genre -- it takes a parameter 'userId' - in the test cases it was 100 -- it takes JSON as input Example { userId: 100, "genre" :"Comedy" }