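"""
Module containing the UniversalDetector detector class, which is the primary
class a user of ``chardet`` should use.

:author: Mark Pilgrim (initial port to Python)
:author: Shy Shalom (original C code)
:author: Dan Blanchard (major refactoring for 3.0)
:author: Ian Cordasco
"""


import codecs
import logging
import re

from .enums import InputState, LanguageFilter, ProbingState
from .escprober import EscCharSetProber
from .latin1prober import Latin1Prober
from .mbcsgroupprober import MBCSGroupProber
from .sbcsgroupprober import SBCSGroupProber


class UniversalDetector(object):
    """
    The ``UniversalDetector`` class underlies the ``chardet.detect`` function
    and coordinates all of the different charset probers.

    To get a ``dict`` containing an encoding and its confidence, you can simply
    run:

    .. code::

            u = UniversalDetector()
            u.feed(some_bytes)
            u.close()
            detected = u.result

    """

    # A prober's confidence must exceed this for close() to report it.
    MINIMUM_THRESHOLD = 0.20
    # Any byte outside 7-bit ASCII.
    HIGH_BYTE_DETECTOR = re.compile(b'[\x80-\xFF]')
    # ESC (used by ISO-2022 encodings) or "~{" (the HZ shift-in sequence).
    ESC_DETECTOR = re.compile(b'(\033|~{)')
    # C1 control bytes in ISO-8859, mostly printable in Windows code pages.
    WIN_BYTE_DETECTOR = re.compile(b'[\x80-\x9F]')
    # ISO-8859 name -> Windows code page to report if such bytes were seen.
    ISO_WIN_MAP = {'iso-8859-1': 'Windows-1252',
                   'iso-8859-2': 'Windows-1250',
                   'iso-8859-5': 'Windows-1251',
                   'iso-8859-6': 'Windows-1256',
                   'iso-8859-7': 'Windows-1253',
                   'iso-8859-8': 'Windows-1255',
                   'iso-8859-9': 'Windows-1254',
                   'iso-8859-13': 'Windows-1257'}

    def __init__(self, lang_filter=LanguageFilter.ALL):
        self._esc_charset_prober = None
        self._charset_probers = []
        self.result = None
        self.done = None
        self._got_data = None
        self._input_state = None
        self._last_char = None
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)
        self._has_win_bytes = None
        self.reset()

    def reset(self):
        """
        Reset the UniversalDetector and all of its probers back to their
        initial states.  This is called by ``__init__``, so you only need to
        call this directly in between analyses of different documents.
        """
        self.result = {'encoding': None, 'confidence': 0.0, 'language': None}
        self.done = False
        self._got_data = False
        self._has_win_bytes = False
        self._input_state = InputState.PURE_ASCII
        self._last_char = b''
        if self._esc_charset_prober:
            self._esc_charset_prober.reset()
        for prober in self._charset_probers:
            prober.reset()

    def feed(self, byte_str):
        """
        Takes a chunk of a document and feeds it through all of the relevant
        charset probers.

        After calling ``feed``, you can check the value of the ``done``
        attribute to see if you need to continue feeding the
        ``UniversalDetector`` more data, or if it has made a prediction
        (in the ``result`` attribute).

        .. note::
           You should always call ``close`` when you're done feeding in your
           document if ``done`` is not already ``True``.
        """
        if self.done:
            return

        if not len(byte_str):
            return

        if not isinstance(byte_str, bytearray):
            byte_str = bytearray(byte_str)

        # On the first chunk, check for a byte order mark; a BOM identifies
        # the encoding outright.
        if not self._got_data:
            if byte_str.startswith(codecs.BOM_UTF8):
                self.result = {'encoding': "UTF-8-SIG",
                               'confidence': 1.0,
                               'language': ''}
            elif byte_str.startswith((codecs.BOM_UTF32_LE,
                                      codecs.BOM_UTF32_BE)):
                self.result = {'encoding': "UTF-32",
                               'confidence': 1.0,
                               'language': ''}
            elif byte_str.startswith(b'\xFE\xFF\x00\x00'):
                # UCS-4, unusual octet order (3412)
                self.result = {'encoding': "X-ISO-10646-UCS-4-3412",
                               'confidence': 1.0,
                               'language': ''}
            elif byte_str.startswith(b'\x00\x00\xFF\xFE'):
                # UCS-4, unusual octet order (2143)
                self.result = {'encoding': "X-ISO-10646-UCS-4-2143",
                               'confidence': 1.0,
                               'language': ''}
            elif byte_str.startswith((codecs.BOM_LE, codecs.BOM_BE)):
                self.result = {'encoding': "UTF-16",
                               'confidence': 1.0,
                               'language': ''}

            self._got_data = True
            if self.result['encoding'] is not None:
                self.done = True
                return

        # While the input still looks like pure ASCII, look for high bytes
        # and for escape sequences (the escape check also spans the previous
        # chunk's last byte).
        if self._input_state == InputState.PURE_ASCII:
            if self.HIGH_BYTE_DETECTOR.search(byte_str):
                self._input_state = InputState.HIGH_BYTE
            elif self._input_state == InputState.PURE_ASCII and \
                    self.ESC_DETECTOR.search(self._last_char + byte_str):
                self._input_state = InputState.ESC_ASCII

        self._last_char = byte_str[-1:]

        # Escape sequences: EscCharSetProber checks for the sequences used
        # by HZ and ISO-2022 encodings.
        if self._input_state == InputState.ESC_ASCII:
            if not self._esc_charset_prober:
                self._esc_charset_prober = EscCharSetProber(self.lang_filter)
            if self._esc_charset_prober.feed(byte_str) == ProbingState.FOUND_IT:
                self.result = {'encoding':
                               self._esc_charset_prober.charset_name,
                               'confidence':
                               self._esc_charset_prober.get_confidence(),
                               'language':
                               self._esc_charset_prober.language}
                self.done = True
        # High bytes: run the multi-byte group prober, the single-byte group
        # prober (only when non-CJK languages are allowed) and Latin1Prober.
        elif self._input_state == InputState.HIGH_BYTE:
            if not self._charset_probers:
                self._charset_probers = [MBCSGroupProber(self.lang_filter)]
                if self.lang_filter & LanguageFilter.NON_CJK:
                    self._charset_probers.append(SBCSGroupProber())
                self._charset_probers.append(Latin1Prober())
            for prober in self._charset_probers:
                if prober.feed(byte_str) == ProbingState.FOUND_IT:
                    self.result = {'encoding': prober.charset_name,
                                   'confidence': prober.get_confidence(),
                                   'language': prober.language}
                    self.done = True
                    break
            if self.WIN_BYTE_DETECTOR.search(byte_str):
                self._has_win_bytes = True

    def close(self):
        """
        Stop analyzing the current document and come up with a final
        prediction.

        :returns:  The ``result`` attribute, a ``dict`` with the keys
                   `encoding`, `confidence`, and `language`.
        """
        # Don't bother with checks if we're already done
        if self.done:
            return self.result
        self.done = True

        if not self._got_data:
            self.logger.debug('no data received!')

        # Default to ASCII if that is all we've seen
        elif self._input_state == InputState.PURE_ASCII:
            self.result = {'encoding': 'ascii',
                           'confidence': 1.0,
                           'language': ''}

        # Otherwise report the most confident prober above MINIMUM_THRESHOLD
        elif self._input_state == InputState.HIGH_BYTE:
            prober_confidence = None
            max_prober_confidence = 0.0
            max_prober = None
            for prober in self._charset_probers:
                if not prober:
                    continue
                prober_confidence = prober.get_confidence()
                if prober_confidence > max_prober_confidence:
                    max_prober_confidence = prober_confidence
                    max_prober = prober
            if max_prober and (max_prober_confidence > self.MINIMUM_THRESHOLD):
                charset_name = max_prober.charset_name
                lower_charset_name = max_prober.charset_name.lower()
                confidence = max_prober.get_confidence()
                # Report the Windows code page instead of ISO-8859 if any
                # Windows-specific bytes were seen
                if lower_charset_name.startswith('iso-8859'):
                    if self._has_win_bytes:
                        charset_name = self.ISO_WIN_MAP.get(lower_charset_name,
                                                            charset_name)
                self.result = {'encoding': charset_name,
                               'confidence': confidence,
                               'language': max_prober.language}

        # Log prober confidences if none met MINIMUM_THRESHOLD
        if self.logger.getEffectiveLevel() == logging.DEBUG:
            if self.result['encoding'] is None:
                self.logger.debug('no probers hit minimum threshold')
                # Only the first (multi-byte) group prober's members are
                # listed. The list is empty when no high bytes were seen,
                # so guard the index to avoid an IndexError.
                if self._charset_probers:
                    for prober in self._charset_probers[0].probers:
                        if not prober:
                            continue
                        self.logger.debug('%s %s confidence = %s',
                                          prober.charset_name,
                                          prober.language,
                                          prober.get_confidence())
        return self.result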