� ��^c@s�dZddlZddlZddlZddlmZddlmZmZm Z ddl m Z ddl m Z ddlmZdd lmZd efd ��YZdS( s Module containing the UniversalDetector detector class, which is the primary class a user of ``chardet`` should use. :author: Mark Pilgrim (initial port to Python) :author: Shy Shalom (original C code) :author: Dan Blanchard (major refactoring for 3.0) :author: Ian Cordasco i����Ni(tCharSetGroupProber(t InputStatetLanguageFiltert ProbingState(tEscCharSetProber(t Latin1Prober(tMBCSGroupProber(tSBCSGroupProbertUniversalDetectorcBs�eZdZdZejd�Zejd�Zejd�Zidd6dd6d d 6d d 6d d6dd6dd6dd6Z e j d�Z d�Z d�Zd�ZRS(sq The ``UniversalDetector`` class underlies the ``chardet.detect`` function and coordinates all of the different charset probers. To get a ``dict`` containing an encoding and its confidence, you can simply run: .. code:: u = UniversalDetector() u.feed(some_bytes) u.close() detected = u.result g�������?s[�-�]s(|~{)s[�-�]s Windows-1252s iso-8859-1s Windows-1250s iso-8859-2s Windows-1251s iso-8859-5s Windows-1256s iso-8859-6s Windows-1253s iso-8859-7s Windows-1255s iso-8859-8s Windows-1254s iso-8859-9s Windows-1257s iso-8859-13cCsqd|_g|_d|_d|_d|_d|_d|_||_t j t �|_ d|_ |j�dS(N(tNonet_esc_charset_probert_charset_proberstresulttdonet _got_datat _input_statet _last_chart lang_filtertloggingt getLoggert__name__tloggert_has_win_bytestreset(tselfR((s:/tmp/pip-build-1THPZW/chardet/chardet/universaldetector.pyt__init__Qs         cCs�idd6dd6dd6|_t|_t|_t|_tj|_d|_ |j rg|j j �nx|j D]}|j �qqWdS(s� Reset the UniversalDetector and all of its probers back to their initial states. This is called by ``__init__``, so you only need to call this directly in between analyses of different documents. tencodinggt confidencetlanguagetN( R R tFalseR RRRt PURE_ASCIIRRR RR (Rtprober((s:/tmp/pip-build-1THPZW/chardet/chardet/universaldetector.pyR^s      cCsy|jr dSt|�sdSt|t�s;t|�}n|js{|jtj�rwidd6dd6dd6|_n�|jtj tj f�r�idd6dd6dd6|_n�|jd �r�id d6dd6dd6|_nl|jd �rid d6dd6dd6|_n<|jtj tj f�rOid d6dd6dd6|_nt |_|jddk r{t |_dSn|jtjkr�|jj|�r�tj|_q�|jtjkr�|jj|j|�r�tj|_q�n|d|_|jtjkr�|js(t|j�|_n|jj|�tjkrui|jjd6|jj�d6|jj d6|_t |_qun�|jtjkru|j!s�t"|j�g|_!|jt#j$@r�|j!j%t&��n|j!j%t'��nx`|j!D]U}|j|�tjkr�i|jd6|j�d6|j d6|_t |_Pq�q�W|j(j|�rut |_)qundS(s� Takes a chunk of a document and feeds it through all of the relevant charset probers. After calling ``feed``, you can check the value of the ``done`` attribute to see if you need to continue feeding the ``UniversalDetector`` more data, or if it has made a prediction (in the ``result`` attribute). .. note:: You should always call ``close`` when you're done feeding in your document if ``done`` is not already ``True``. Ns UTF-8-SIGRg�?RRRsUTF-32s��sX-ISO-10646-UCS-4-3412s��sX-ISO-10646-UCS-4-2143sUTF-16i����(*R tlent isinstancet bytearrayRt startswithtcodecstBOM_UTF8R t BOM_UTF32_LEt BOM_UTF32_BEtBOM_LEtBOM_BEtTrueR RRRtHIGH_BYTE_DETECTORtsearcht HIGH_BYTEt ESC_DETECTORRt ESC_ASCIIR RRtfeedRtFOUND_ITt charset_nametget_confidenceRR RRtNON_CJKtappendRRtWIN_BYTE_DETECTORR(Rtbyte_strR ((s:/tmp/pip-build-1THPZW/chardet/chardet/universaldetector.pyR1os~                  c Cs>|jr|jSt|_|js5|jjd�n1|jtjkrhidd6dd6dd6|_n�|jtj krfd }d}d }xD|j D]9}|s�q�n|j �}||kr�|}|}q�q�W|rf||j krf|j}|jj�}|j �}|jd �r?|jr?|jj||�}q?ni|d6|d6|jd6|_qfn|jj�tjkr7|jdd kr7|jjd �x�|j D]�}|s�q�nt|t�rx^|jD]+}|jjd |j|j|j ��q�Wq�|jjd |j|j|j ��q�Wq7n|jS( s� Stop analyzing the current document and come up with a final prediction. :returns: The ``result`` attribute, a ``dict`` with the keys `encoding`, `confidence`, and `language`. sno data received!tasciiRg�?RRRgsiso-8859s no probers hit minimum thresholds%s %s confidence = %sN(R R R+RRtdebugRRRR.R R R4tMINIMUM_THRESHOLDR3tlowerR$Rt ISO_WIN_MAPtgetRtgetEffectiveLevelRtDEBUGR"Rtprobers( Rtprober_confidencetmax_prober_confidencet max_proberR R3tlower_charset_nameRt group_prober((s:/tmp/pip-build-1THPZW/chardet/chardet/universaldetector.pytclose�s`              (Rt __module__t__doc__R;tretcompileR,R/R7R=RtALLRRR1RG(((s:/tmp/pip-build-1THPZW/chardet/chardet/universaldetector.pyR3s"    m(RIR%RRJtcharsetgroupproberRtenumsRRRt escproberRt latin1proberRtmbcsgroupproberRtsbcsgroupproberRtobjectR(((s:/tmp/pip-build-1THPZW/chardet/chardet/universaldetector.pyt$s