ó <¿CVc@sbdZddlmZddlZddlmZddlmZedefd„ƒYƒZdS(u‚ A word stemmer based on the Lancaster stemming algorithm. Paice, Chris D. "Another Stemmer." ACM SIGIR Forum 24.3 (1990): 56-61. iÿÿÿÿ(tunicode_literalsN(tStemmerI(tpython_2_unicode_compatibletLancasterStemmercsBs\eZdZd|Zdt„Zdu„Zdv„Zdw„Zdx„Zdy„Z dz„Z d{„Z RS(}u& Lancaster Stemmer >>> from nltk.stem.lancaster import LancasterStemmer >>> st = LancasterStemmer() >>> st.stem('maximum') # Remove "-um" when word is intact 'maxim' >>> st.stem('presumably') # Don't remove "-um" when word is not intact 'presum' >>> st.stem('multiply') # No action taken if word ends with "-ply" 'multiply' >>> st.stem('provision') # Replace "-sion" with "-j" to trigger "j" set of rules 'provid' >>> st.stem('owed') # Word starting with vowel must contain at least 2 letters 'ow' >>> st.stem('ear') # ditto 'ear' >>> st.stem('saying') # Words starting with consonant must contain at least 3 'say' >>> st.stem('crying') # letters and one of those letters must be a vowel 'cry' >>> st.stem('string') # ditto 'string' >>> st.stem('meant') # ditto 'meant' >>> st.stem('cement') # ditto 'cem' uai*2.ua*1.ubb1.ucity3s.uci2>ucn1t>udd1.udei3y>udeec2ss.udee1.ude2>udooh4>ue1>ufeil1v.ufi2>ugni3>ugai3y.uga2>ugg1.uht*2.u hsiug5ct.uhsi3>ui*1.ui1y>uji1d.ujuf1s.uju1d.ujo1d.ujeh1r.ujrev1t.ujsim2t.ujn1d.uj1s.ulbaifi6.ulbai4y.ulba3>ulbi3.ulib2l>ulc1.ulufi4y.uluf3>ulu2.ulai3>ulau3>ula2>ull1.umui3.umu*2.umsi3>umm1.unois4j>unoix4ct.unoi3>unai3>una2>unee0.une2>unn1.upihs4>upp1.ure2>urae0.ura2.uro2>uru2>urr1.urt1>urei3y>usei3y>usis2.usi2>ussen4>uss0.usuo3>usu*2.us*1>us0.u tacilp4y.uta2>utnem4>utne3>utna3>utpir2b.utpro2b.utcud1.utpmus2.utpec2iv.utulo2v.utsis0.utsi3>utt1.uuqi3.uugo1.uvis3j>uvie0.uvi2>uylb1>uyli3y>uylp0.uyl2>uygo1.uyhp1.uymo1.uypo1.uyti3>uyte3>uytl2.uyrtsi5.uyra3>uyro3>uyfi3.uycn2t>uyca3>uzi2>uzy1s.cCs i|_dS(u5Create an instance of the Lancaster stemmer. N(trule_dictionary(tself((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyt__init__©scCs’tjdƒ}i|_xs|D]k}|j|ƒsGtd|ƒ‚n|dd!}||jkrz|j|j|ƒq|g|j|\.]?$uThe rule %s is invalidiiN(tretcompileRtmatcht ValueErrortappend(Rt rule_tuplet valid_ruletrulet first_letter((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyt parseRules¯s   cCsJ|jƒ}|}t|jƒdkr:|jtjƒn|j||ƒS(u1Stem a word using the Lancaster stemmer. i(tlowertlenRRRR t_LancasterStemmer__doStemming(Rtwordt intact_word((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pytstem¿s  cCs“tjdƒ}t}xw|rŽ|j|ƒ}|dksL|||jkrUt}qt}x|j||D]}|j|ƒ}|rm|jƒ\} } } } } t| ƒ} |j | ddd…ƒrr| r&||krl|j || ƒrl|j || | ƒ}t}| dkrt}nPqlqo|j || ƒro|j || | ƒ}t}| dkrht}nPqoqrqmqmW|tkrt}qqW|S(u)Perform the actual word stemming u#^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$iNiÿÿÿÿu.( RRtTruet _LancasterStemmer__getLastLetterRtFalseR tgroupstinttendswitht_LancasterStemmer__isAcceptablet_LancasterStemmer__applyRule(RRRR tproceedtlast_letter_positiontrule_was_appliedRt rule_matcht ending_stringt intact_flagt remove_totalt append_stringt cont_flag((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyt __doStemmingÎsD              cCsAd}x4tt|ƒƒD] }||jƒr8|}qPqW|S(uQGet the zero-based index of the last alphabetic character in this string iÿÿÿÿ(trangeRtisalpha(RRt last_lettertposition((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyt__getLastLetter s  cCs‡t}|ddkr8t|ƒ|dkrƒt}qƒnKt|ƒ|dkrƒ|ddkrgt}qƒ|ddkrƒt}qƒn|S(u:Determine if the word is acceptable for stemming. iuaeiouyiii(RRR(RRR%tword_is_acceptable((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyt__isAcceptables   cCs4t|ƒ|}|d|!}|r0||7}n|S(u,Apply the stemming rule to the word i(R(RRR%R&tnew_word_length((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyt __applyRule's   cCsdS(Nu((R((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyt__repr__3s(suai*2.ua*1.ubb1.ucity3s.uci2>ucn1t>udd1.udei3y>udeec2ss.udee1.ude2>udooh4>ue1>ufeil1v.ufi2>ugni3>ugai3y.uga2>ugg1.uht*2.u hsiug5ct.uhsi3>ui*1.ui1y>uji1d.ujuf1s.uju1d.ujo1d.ujeh1r.ujrev1t.ujsim2t.ujn1d.uj1s.ulbaifi6.ulbai4y.ulba3>ulbi3.ulib2l>ulc1.ulufi4y.uluf3>ulu2.ulai3>ulau3>ula2>ull1.umui3.umu*2.umsi3>umm1.unois4j>unoix4ct.unoi3>unai3>una2>unee0.une2>unn1.upihs4>upp1.ure2>urae0.ura2.uro2>uru2>urr1.urt1>urei3y>usei3y>usis2.usi2>ussen4>uss0.usuo3>usu*2.us*1>us0.u tacilp4y.uta2>utnem4>utne3>utna3>utpir2b.utpro2b.utcud1.utpmus2.utpec2iv.utulo2v.utsis0.utsi3>utt1.uuqi3.uugo1.uvis3j>uvie0.uvi2>uylb1>uyli3y>uylp0.uyl2>uygo1.uyhp1.uymo1.uypo1.uyti3>uyte3>uytl2.uyrtsi5.uyra3>uyro3>uyfi3.uycn2t>uyca3>uzi2>uzy1s.( t__name__t __module__t__doc__R RRRRRRRR2(((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyRsø    ;  ( R5t __future__RRt nltk.stem.apiRt nltk.compatRR(((se/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/stem/lancaster.pyt s