
Current File : //opt/cloudlinux/venv/lib/python3.11/site-packages/chardet/__pycache__/charsetprober.cpython-311.pyc

�

�܋f,��p�ddlZddlZddlmZmZddlmZmZejd��Z	Gd�d��Z
dS)�N)�Optional�Union�)�LanguageFilter�ProbingStates%[a-zA-Z]*[�-�]+[a-zA-Z]*[^a-zA-Z�-�]?c�`�eZdZdZejfdeddfd�Zdd�Zede	e
fd���Zede	e
fd���Zd	e
eefdefd
�Zedefd���Zdefd�Zed
e
eefdefd���Zed
e
eefdefd���Zed
e
eefdefd���ZdS)�
CharSetProbergffffff�?�lang_filter�returnNc��tj|_d|_||_tjt��|_dS)NT)	r�	DETECTING�_state�activer
�logging�	getLogger�__name__�logger)�selfr
s  �L/opt/cloudlinux/venv/lib64/python3.11/site-packages/chardet/charsetprober.py�__init__zCharSetProber.__init__,s1��"�,������&����'��1�1�����c�(�tj|_dS�N)rr
r�rs r�resetzCharSetProber.reset2s��"�,����rc��dSr�rs r�charset_namezCharSetProber.charset_name5s���trc��t�r��NotImplementedErrorrs r�languagezCharSetProber.language9s��!�!r�byte_strc��t�rr )rr#s  r�feedzCharSetProber.feed=s��!�!rc��|jSr)rrs r�statezCharSetProber.state@s
���{�rc��dS)Ngrrs r�get_confidencezCharSetProber.get_confidenceDs���sr�bufc�2�tjdd|��}|S)Ns([-])+� )�re�sub)r*s r�filter_high_byte_onlyz#CharSetProber.filter_high_byte_onlyGs���f�&��c�2�2���
rc��t��}t�|��}|D]Z}|�|dd���|dd�}|���s|dkrd}|�|���[|S)u7
        We define three types of bytes:
        alphabet: English letters [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]
        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.
        This filter applies to all scripts which do not use English characters.
        N�����r,)�	bytearray�INTERNATIONAL_WORDS_PATTERN�findall�extend�isalpha)r*�filtered�words�word�	last_chars     r�filter_international_wordsz(CharSetProber.filter_international_wordsLs����;�;��
,�3�3�C�8�8���
	'�
	'�D��O�O�D��"��I�&�&�&��R�S�S�	�I��$�$�&�&�
!�9�w�+>�+>� �	��O�O�I�&�&�&�&��rc�v�t��}d}d}t|���d��}t|��D]U\}}|dkr|dz}d}�|dkr<||kr4|s2|�|||���|�d��d}�V|s|�||d	���|S)
a[
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
        Fr�c�>r�<r,TN)r3�
memoryview�cast�	enumerater6)r*r8�in_tag�prev�curr�buf_chars      r�remove_xml_tagszCharSetProber.remove_xml_tagsns����;�;��������o�o�"�"�3�'�'��'��n�n�	�	�N�D�(��4����a�x������T�!�!��$�;�;�v�;��O�O�C��T�	�N�3�3�3��O�O�D�)�)�)�����	(�
�O�O�C����J�'�'�'��r)rN)r�
__module__�__qualname__�SHORTCUT_THRESHOLDr�NONErr�propertyr�strrr"r�bytesr3rr%r'�floatr)�staticmethodr/r<rHrrrr	r	(s���������5C�5H�2�2�N�2�T�2�2�2�2�-�-�-�-���h�s�m�����X���"�(�3�-�"�"�"��X�"�"�U�5�)�#3�4�"��"�"�"�"���|�����X����������5��	�)9�#:��u�����\�����e�Y�.>�(?��I�����\��B�$�U�5�)�#3�4�$��$�$�$��\�$�$�$rr	)rr-�typingrr�enumsrr�compiler4r	rrr�<module>rUs���:����	�	�	�	�"�"�"�"�"�"�"�"�/�/�/�/�/�/�/�/�(�b�j�8����
k�k�k�k�k�k�k�k�k�kr
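
The two docstrings that survive in the bytecode above describe byte-level filters. The following is a minimal Python sketch of that behavior, reconstructed from the docstrings and from the names visible in the dump, not copied from the original source file. The pattern assumes the unprintable range in the dumped regular expression is `\x80-\xff`, and the loop in `remove_xml_tags` is a simplified approximation that keeps every byte found outside `<...>` tags.

```python
import re

# Assumption: the range shown as unprintable bytes in the dump is \x80-\xff.
INTERNATIONAL_WORDS_PATTERN = re.compile(
    rb"[a-zA-Z]*[\x80-\xff]+[a-zA-Z]*[^a-zA-Z\x80-\xff]?"
)


def filter_international_words(buf):
    """Keep only words containing at least one high byte (0x80-0xFF)."""
    filtered = bytearray()
    for word in INTERNATIONAL_WORDS_PATTERN.findall(buf):
        # Everything except the final byte is part of the word itself.
        filtered.extend(word[:-1])
        last_char = word[-1:]
        # A trailing marker (neither ASCII letter nor high byte) becomes a space.
        if not last_char.isalpha() and last_char < b"\x80":
            last_char = b" "
        filtered.extend(last_char)
    return filtered


def remove_xml_tags(buf):
    """Return a copy of buf without the bytes found between '<' and '>'."""
    filtered = bytearray()
    in_tag = False
    prev = 0  # start of the current run of bytes outside any tag
    for curr in range(len(buf)):
        ch = buf[curr:curr + 1]
        if ch == b">":
            prev = curr + 1
            in_tag = False
        elif ch == b"<":
            if curr > prev and not in_tag:
                # Flush the text seen since the last tag, then one separator.
                filtered.extend(buf[prev:curr])
                filtered.extend(b" ")
            in_tag = True
    if not in_tag:
        # Trailing text after the last closed tag.
        filtered.extend(buf[prev:])
    return filtered
```

In this sketch, `filter_international_words(b"hello caf\xe9 world, na\xefve!")` drops the pure-ASCII words and keeps only the two words that contain a high byte, each followed by a single space.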
