U Ɛd_@s0ddlZddlZddlmZGdddeZdS)N) ProbingStatec@sneZdZdZdddZddZeddZd d Zed d Z d dZ e ddZ e ddZ e ddZdS) CharSetProbergffffff?NcCsd|_||_tt|_dSN)_state lang_filterlogging getLogger__name__logger)selfrr \C:\Users\aemmanux\AppData\Local\Temp\pip-target-bnng1y30\lib\python\chardet/charsetprober.py__init__'szCharSetProber.__init__cCs tj|_dSr)r DETECTINGrr r r rreset,szCharSetProber.resetcCsdSrr rr r r charset_name/szCharSetProber.charset_namecCsdSrr )r bufr r rfeed3szCharSetProber.feedcCs|jSr)rrr r rstate6szCharSetProber.statecCsdS)Ngr rr r rget_confidence:szCharSetProber.get_confidencecCstdd|}|S)Ns([-])+ )resub)rr r rfilter_high_byte_only=sz#CharSetProber.filter_high_byte_onlycCs\t}td|}|D]@}||dd|dd}|sL|dkrLd}||q|S)u9 We define three types of bytes: alphabet: english alphabets [a-zA-Z] international: international characters [€-ÿ] marker: everything else [^a-zA-Z€-ÿ] The input buffer can be thought to contain a series of words delimited by markers. This function works to filter all words that contain at least one international character. All contiguous sequences of markers are replaced by a single space ascii character. This filter applies to all scripts which do not use English characters. s%[a-zA-Z]*[-]+[a-zA-Z]*[^a-zA-Z-]?Nr) bytearrayrfindallextendisalpha)rfilteredwordsword last_charr r rfilter_international_wordsBs  z(CharSetProber.filter_international_wordscCst}d}d}tt|D]n}|||d}|dkr characters. Also retains English alphabet and high byte characters immediately before occurrences of >. This filter can be applied to all scripts which contain both English characters and extended ASCII characters, but is currently only used by ``Latin1Prober``. Frr>s