ó <żCVc@s˜ddlmZddlmZmZddlmZmZddlm Z ddl m Z m Z m Z de fd„ƒYZde fd „ƒYZd S( i˙˙˙˙(tcompat(tWhitespaceTokenizertRegexpTokenizer(t AlignedSentt Alignment(t CorpusReader(tStreamBackedCorpusViewtconcattread_alignedsent_blocktAlignedCorpusReadercBseeZdZdeƒeddeƒedd„Zd d„Z d d„Z d d„Z d d „Z RS( s’ Reader for corpora of word-aligned sentences. Tokens are assumed to be separated by whitespace. Sentences begin on separate lines. t/s tgapstlatin1cCs>tj||||ƒ||_||_||_||_dS(s˜ Construct a new Aligned Corpus reader for a set of documents located at the given root directory. Example usage: >>> root = '/...path to corpus.../' >>> reader = AlignedCorpusReader(root, '.*', '.txt') # doctest: +SKIP :param root: The root directory for this corpus. :param fileids: A list or regexp specifying the fileids in this corpus. N(Rt__init__t_sept_word_tokenizert_sent_tokenizert_alignedsent_block_reader(tselftroottfileidstseptword_tokenizertsent_tokenizertalignedsent_block_readertencoding((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/aligned.pyR s    cCsb|dkr|j}nt|tjƒr6|g}ntg|D]}|j|ƒjƒ^q@ƒS(sT :return: the given file(s) as a single string. :rtype: str N(tNonet_fileidst isinstanceRt string_typesRtopentread(RRtf((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/aligned.pytraw*s   c CsPtg|j|tƒD]3\}}t||tt|j|j|jƒ^qƒS(s~ :return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str) (RtabspathstTruetAlignedSentCorpusViewtFalseRRR(RRtfileidtenc((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/aligned.pytwords3sc CsPtg|j|tƒD]3\}}t||tt|j|j|jƒ^qƒS(s² :return: the given file(s) as a list of sentences or utterances, each encoded as a list of word strings. :rtype: list(list(str)) (RR"R#R$R%RRR(RRR&R'((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/aligned.pytsents?sc CsPtg|j|tƒD]3\}}t||tt|j|j|jƒ^qƒS(sp :return: the given file(s) as a list of AlignedSent objects. :rtype: list(AlignedSent) (RR"R#R$RRR(RRR&R'((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/aligned.pyt aligned_sentsLsN( t__name__t __module__t__doc__RRR#RR RR!R(R)R*(((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/aligned.pyR s   R$cBs eZdZd„Zd„ZRS(s³ A specialized corpus view for aligned sentences. ``AlignedSentCorpusView`` objects are typically created by ``AlignedCorpusReader`` (not directly by nltk users). cCsG||_||_||_||_||_tj||d|ƒdS(NR(t_alignedt_group_by_sentRRRRR (Rt corpus_fileRtalignedt group_by_sentRRR((sl/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/corpus/reader/aligned.pyR ]s      cCsİg|j|ƒD]1}|jj|ƒD]}|jj|ƒ^q&q}|jr‚tjdj|dƒƒ|ds G