ó ¯ÆZc@sÿddlmZddlmZddlZddlZddlZddlmZddl Z ddl m Z ddl m Z ddlZddlZddlZddlZddlZddlZdZd„Zd „Zd „Zd „Zd „ZdS( iÿÿÿÿ(tprint_function(tDecimalN(t BeautifulSoup(tSentimentIntensityAnalyzer(t stopwordsspywren-workshopc CsøyÝtƒtjddƒjdƒ}x°|D]¨}t|ƒ}tƒ}|j|ƒ}i}||dAscss1|]'}|jdƒD]}|jƒVqqdS(s N(tsplitR9(R:R;tphrase((sb/Users/olivierk/Documents/Code/Reinvent2017-pywren-workshop/Lab-3-Scrape-Sentiment/GDELT_scrape.pys Css css|]}|r|VqdS(N((R:tchunk((sb/Users/olivierk/Documents/Code/Reinvent2017-pywren-workshop/Lab-3-Scrape-Sentiment/GDELT_scrape.pys Esitenglishi2( t splitlinestjointnltkt word_tokenizetlent isnumerictlowertsettcorpusRR tFreqDistt most_common(RtlinestchunksR Rtdefault_stopwordstfdist((sb/Users/olivierk/Documents/Code/Reinvent2017-pywren-workshop/Lab-3-Scrape-Sentiment/GDELT_scrape.pyR?s+%%c :Cs¿tjddƒ}yŒ|jdddd|ƒ}tj|djƒjdd ƒjd d ƒƒ}d d d ddddddddddddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5d6d7d8d9d:d;d<d=d>d?d@dAdBdCdDg:}tj||dEdFƒ}g}x+t |ƒD]\}}|j |dDƒqAWg} x*|D]"} | | kro| j | ƒqoqoW| dG SWnt j j k rº} | SXdS(HNR"s us-east-1R$sgdelt-open-datatKeysevents/tBodysutf-8treplacetasciit GLOBALEVENTIDtSQLDATEt MonthYeartYeart FractionDatet Actor1Codet Actor1NametActor1CountryCodetActor1KnownGroupCodetActor1EthnicCodetActor1Religion1CodetActor1Religion2CodetActor1Type1CodetActor1Type2CodetActor1Type3Codet Actor2Codet Actor2NametActor2CountryCodetActor2KnownGroupCodetActor2EthnicCodetActor2Religion1CodetActor2Religion2CodetActor2Type1CodetActor2Type2CodetActor2Type3Codet IsRootEventt EventCodet EventBaseCodet EventRootCodet QuadClasstGoldsteinScalet NumMentionst NumSourcest NumArticlestAvgTonetActor1Geo_TypetActor1Geo_FullNametActor1Geo_CountryCodetActor1Geo_ADM1Codet Actor1Geo_LattActor1Geo_LongtActor1Geo_FeatureIDtActor2Geo_TypetActor2Geo_FullNametActor2Geo_CountryCodetActor2Geo_ADM1Codet Actor2Geo_LattActor2Geo_LongtActor2Geo_FeatureIDtActionGeo_TypetActionGeo_FullNametActionGeo_CountryCodetActionGeo_ADM1Codet ActionGeo_LattActionGeo_LongtActionGeo_FeatureIDt DATEADDEDt SOURCEURLt delimiters iè(Rtclientt get_objecttStringIOR3tdecodetencodetcsvt DictReadert enumerateRR*R+R,( tfileR"t s3_objecttft fieldnamestitemsRtititemtlinks_without_duplicatesRR ((sb/Users/olivierk/Documents/Code/Reinvent2017-pywren-workshop/Lab-3-Scrape-Sentiment/GDELT_scrape.pytget_urls_from_gdelt_dataTs"1ZZ   (t __future__RtdecimalRR“R1thashlibtbs4RRBtnltk.sentiment.vaderRt nltk.corpusRR&RR*RtpywrentostS3BUCKETR!R RRRž(((sb/Users/olivierk/Documents/Code/Reinvent2017-pywren-workshop/Lab-3-Scrape-Sentiment/GDELT_scrape.pyts$