ó <¿CVc@s×dZddlmZddlmZmZmZddlmZddl m Z m Z ddl m Z ddlmZmZddddd „Zd „Zd „Zd efd „ƒYZddd„Zee_dS(s Utility functions for parsers. iÿÿÿÿ(tprint_function(tCFGtFeatureGrammartPCFG(tload(tChartt ChartParser(tInsideChartParser(t FeatureCharttFeatureChartParsericKsút||}t|tƒs-tdƒ‚nt|tƒrg|dkrQt}n||d|d|ƒSt|tƒr¶|dkr‹t}n|dkr t }n||d|d|ƒS|dkrËt }n|dkràt }n||d|d|ƒSdS(s¦ Load a grammar from a file, and build a parser based on that grammar. The parser depends on the grammar format, and might also depend on properties of the grammar itself. The following grammar formats are currently supported: - ``'cfg'`` (CFGs: ``CFG``) - ``'pcfg'`` (probabilistic CFGs: ``PCFG``) - ``'fcfg'`` (feature-based CFGs: ``FeatureGrammar``) :type grammar_url: str :param grammar_url: A URL specifying where the grammar is located. The default protocol is ``"nltk:"``, which searches for the file in the the NLTK data package. :type trace: int :param trace: The level of tracing that should be used when parsing a text. ``0`` will generate no tracing output; and higher numbers will produce more verbose tracing output. :param parser: The class used for parsing; should be ``ChartParser`` or a subclass. If None, the class depends on the grammar format. :param chart_class: The class used for storing the chart; should be ``Chart`` or a subclass. Only used for CFGs and feature CFGs. If None, the chart class depends on the grammar format. :type beam_size: int :param beam_size: The maximum length for the parser's edge queue. Only used for probabilistic CFGs. :param load_args: Keyword parameters used when loading the grammar. See ``data.load`` for more information. s1The grammar must be a CFG, or a subclass thereof.ttracet beam_sizet chart_classN( Rt isinstanceRt ValueErrorRtNoneRRR RRR(t grammar_urlR tparserR R t load_argstgrammar((sa/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/util.pyt load_parsers$"          c csoxht|ddƒD]T\}\}}t|ƒ|d||dddddg }dj|ƒd}|VqWdS( st A module to convert a single POS tagged sentence into CONLL format. >>> from nltk import word_tokenize, pos_tag >>> text = "This is a foobar sentence." >>> for line in taggedsent_to_conll(pos_tag(word_tokenize(text))): ... print(line, end="") 1 This _ DT DT _ 0 a _ _ 2 is _ VBZ VBZ _ 0 a _ _ 3 a _ DT DT _ 0 a _ _ 4 foobar _ JJ JJ _ 0 a _ _ 5 sentence _ NN NN _ 0 a _ _ 6 . _ . . _ 0 a _ _ :param sentence: A single input sentence to parse :type sentence: list(tuple(str, str)) :rtype: iter(str) :return: a generator yielding a single sentence in CONLL format. tstartit_t0tas s N(t enumeratetstrtjoin(tsentencetitwordttagt input_str((sa/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/util.pyttaggedsent_to_conllOs%*ccs6x/|D]'}xt|ƒD] }|VqWdVqWdS(sK A module to convert the a POS tagged document stream (i.e. list of list of tuples, a list of sentences) and yield lines in CONLL format. This module yields one line per word and two newlines for end of sentence. >>> from nltk import word_tokenize, sent_tokenize, pos_tag >>> text = "This is a foobar sentence. Is that right?" >>> sentences = [pos_tag(word_tokenize(sent)) for sent in sent_tokenize(text)] >>> for line in taggedsents_to_conll(sentences): ... if line: ... print(line, end="") 1 This _ DT DT _ 0 a _ _ 2 is _ VBZ VBZ _ 0 a _ _ 3 a _ DT DT _ 0 a _ _ 4 foobar _ JJ JJ _ 0 a _ _ 5 sentence _ NN NN _ 0 a _ _ 6 . _ . . _ 0 a _ _ 1 Is _ VBZ VBZ _ 0 a _ _ 2 that _ IN IN _ 0 a _ _ 3 right _ NN NN _ 0 a _ _ 4 ? _ . . _ 0 a _ _ :param sentences: Input sentences to parse :type sentence: list(list(tuple(str, str))) :rtype: iter(str) :return: a generator yielding sentences in CONLL format. s N(R!(t sentencesRR ((sa/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/util.pyttaggedsents_to_conllis!  t TestGrammarcBs)eZdZddd„Zed„ZRS(s Unit tests for CFG. cCs=||_t|ddƒ|_||_||_||_dS(NR i(t test_grammarRtcptsuitet_acceptt_reject(tselfRR'taccepttreject((sa/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/util.pyt__init__—s    c Cs'x |jD]}t|ddddƒxÛddgD]Í}xÄ||D]¸}|jƒ}t|jj|ƒƒ}|r«|r«tƒt|ƒx|D]}t|ƒq”Wn|dkrß|gkrÖtd|ƒ‚qþt}qF|røtd|ƒ‚qFt} qFWq5W|r | r td ƒq q Wd S( s| Sentences in the test suite are divided into two classes: - grammatical (``accept``) and - ungrammatical (``reject``). If a sentence should parse accordng to the grammar, the value of ``trees`` will be a non-empty list. If a sentence should be rejected according to the grammar, then the value of ``trees`` will be None. tdoct:tendt R+R,sSentence '%s' failed to parse'sSentence '%s' received a parse'sAll tests passed!N(R'tprinttsplittlistR&tparseRtTrue( R*t show_treesttesttkeytsentttokensttreesttreetacceptedtrejected((sa/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/util.pytrun s(         N(t__name__t __module__t__doc__RR-tFalseR@(((sa/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/util.pyR$“s s#%;cCs |d k r|j|ƒ}ng}xÞ|jdƒD]Í}|dks4|d|kr\q4n|jddƒ}d }t|ƒdkrÐ|dd kr³|dd k}|d}qÐt|dƒ}|d}n|jƒ}|gkrîq4n|||fg7}q4W|S(sŽ Parses a string with one test sentence per line. Lines can optionally begin with: - a bool, saying if the sentence is grammatical or not, or - an int, giving the number of parse trees is should have, The result information is followed by a colon, and then the sentence. Empty lines and lines beginning with a comment char are ignored. :return: a list of tuple of sentences and expected results, where a sentence is a list of str, and a result is None, or bool, or int :param comment_chars: ``str`` of possible comment characters. :param encoding: the encoding of the string, if it is binary s tiR/iiR6ttrueRDtfalseN(sTruestruesFalsesfalse(sTruestrue(RtdecodeR3tlentint(tstringt comment_charstencodingR"Rt split_infotresultR;((sa/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/util.pytextract_test_sentencesÁs&     N(RCt __future__Rt nltk.grammarRRRt nltk.dataRtnltk.parse.chartRRtnltk.parse.pchartRtnltk.parse.featurechartRR RRR!R#tobjectR$RPRDt__test__(((sa/private/var/folders/cc/xm4nqn811x9b50x1q_zpkmvdjlphkp/T/pip-build-FUwmDn/nltk/nltk/parse/util.pyt s 7  *.&