Current File : //opt/cloudlinux/venv/lib/python3.11/site-packages/chardet/__pycache__/universaldetector.cpython-311.pyc

�

�܋f:����dZddlZddlZddlZddlmZmZmZddlm	Z	ddl
mZddlm
Z
mZmZddlmZdd	lmZdd
lmZddlmZddlmZdd
lmZddlmZGd�d��ZdS)a
Module containing the UniversalDetector detector class, which is the primary
class a user of ``chardet`` should use.

:author: Mark Pilgrim (initial port to Python)
:author: Shy Shalom (original C code)
:author: Dan Blanchard (major refactoring for 3.0)
:author: Ian Cordasco
�N)�List�Optional�Union�)�CharSetGroupProber)�
CharSetProber)�
InputState�LanguageFilter�ProbingState)�EscCharSetProber)�Latin1Prober)�MacRomanProber)�MBCSGroupProber)�
ResultDict)�SBCSGroupProber)�
UTF1632Proberc	�X�eZdZdZdZejd��Zejd��Zejd��Z	dddd	d
ddd
d�Z
dddddddd�Zej
dfdededdfd�Zedefd���Zedefd���Zedeefd���Zd!d�Zdeeefddfd�Zdefd �ZdS)"�UniversalDetectoraq
    The ``UniversalDetector`` class underlies the ``chardet.detect`` function
    and coordinates all of the different charset probers.

    To get a ``dict`` containing an encoding and its confidence, you can simply
    run:

    .. code::

            u = UniversalDetector()
            u.feed(some_bytes)
            u.close()
            detected = u.result

    g�������?s[�-�]s(|~{)s[�-�]zWindows-1252zWindows-1250zWindows-1251zWindows-1256zWindows-1253zWindows-1255zWindows-1254zWindows-1257)�
iso-8859-1z
iso-8859-2z
iso-8859-5z
iso-8859-6z
iso-8859-7z
iso-8859-8�
iso-8859-9ziso-8859-13zISO-8859-11�GB18030�CP949�UTF-16)�asciirztis-620r�gb2312zeuc-krzutf-16leF�lang_filter�should_rename_legacy�returnNc� �d|_d|_g|_dddd�|_d|_d|_tj|_d|_	||_
tjt��|_d|_||_|���dS)N���encoding�
confidence�languageF�)�_esc_charset_prober�_utf1632_prober�_charset_probers�result�done�	_got_datar	�
PURE_ASCII�_input_state�
_last_charr�logging�	getLogger�__name__�logger�_has_win_bytesr�reset)�selfrrs   �P/opt/cloudlinux/venv/lib64/python3.11/site-packages/chardet/universaldetector.py�__init__zUniversalDetector.__init__ds���
@D�� �8<���57������#
�#
���
��	����&�1������&����'��1�1���#���$8��!��
�
�����r%c��|jS�N)r-�r5s r6�input_statezUniversalDetector.input_state{s��� � r%c��|jSr9)r3r:s r6�
has_win_byteszUniversalDetector.has_win_bytess���"�"r%c��|jSr9)r(r:s r6�charset_probersz!UniversalDetector.charset_probers�s���$�$r%c�2�dddd�|_d|_d|_d|_tj|_d|_|jr|j�	��|j
r|j
�	��|jD]}|�	���dS)z�
        Reset the UniversalDetector and all of its probers back to their
        initial states.  This is called by ``__init__``, so you only need to
        call this directly in between analyses of different documents.
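
The reuse pattern described in this docstring can be sketched as below. This is a minimal illustration, not code from chardet: `detect_many` is a hypothetical helper, and `detector` is assumed to be any object exposing the `reset`, `feed`, and `close` methods documented in this module (in practice, an instance of `chardet.universaldetector.UniversalDetector`).

```python
from typing import Iterable, List


def detect_many(detector, documents: Iterable[bytes]) -> List[dict]:
    """Run one detector over several independent documents.

    `detector` is assumed to offer reset(), feed(), and close() as
    documented for UniversalDetector; `detect_many` is a hypothetical name.
    """
    results = []
    for document in documents:
        # Clear state left over from the previous document.
        detector.reset()
        detector.feed(document)
        # Copy the returned dict so each entry stays independent of the detector.
        results.append(dict(detector.close()))
    return results
```

Calling `reset` at the top of the loop rather than the bottom keeps the helper correct even when the detector has already been used before it is passed in.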
        Nr r!Fr%)r)r*r+r3r	r,r-r.r&r4r'r()r5�probers  r6r4zUniversalDetector.reset�s���$(�s��M�M�����	����#���&�1�������#�	-��$�*�*�,�,�,���	)�� �&�&�(�(�(��+�	�	�F��L�L�N�N�N�N�	�	r%�byte_strc�p�|jrdS|sdSt|t��st|��}|js�|�t
j��rdddd�|_n�|�t
jt
j	f��rdddd�|_nx|�d��rdddd�|_nW|�d	��rd
ddd�|_n6|�t
j
t
jf��rdddd�|_d|_|jd
�	d|_dS|jtjkrt|j�|��rtj|_nH|jtjkr3|j�|j|z��rtj|_|dd�|_|jst-��|_|jjt0jkr]|j�|��t0jkr5|jj|j���dd�|_d|_dS|jtjkr�|jst?|j ��|_|j�|��t0jkr?|jj|j���|jj!d�|_d|_dSdS|jtjk�r'|j"s�tG|j ��g|_"|j tHj%zr&|j"�&tO����|j"�&tQ����|j"�&tS����|j"D]U}|�|��t0jkr0|j|���|j!d�|_d|_n�V|j*�|��rd|_+dSdSdS)a�
        Takes a chunk of a document and feeds it through all of the relevant
        charset probers.

        After calling ``feed``, you can check the value of the ``done``
        attribute to see if you need to continue feeding the
        ``UniversalDetector`` more data, or if it has made a prediction
        (in the ``result`` attribute).

        .. note::
           You should always call ``close`` when you're done feeding in your
           document if ``done`` is not already ``True``.
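
The feeding protocol described in this docstring can be sketched as follows. It is an illustrative helper under stated assumptions, not part of chardet: `Detector` and `detect_incrementally` are hypothetical names, and the detector is assumed to expose `feed`, `done`, `close`, and `result` as documented here.

```python
from typing import Iterable, Protocol


class Detector(Protocol):
    """Hypothetical structural type for the interface documented above."""

    done: bool
    result: dict

    def feed(self, byte_str: bytes) -> None: ...

    def close(self) -> dict: ...


def detect_incrementally(detector: Detector, chunks: Iterable[bytes]) -> dict:
    """Feed chunks until the detector reports ``done``, then close it."""
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            # A prediction is already available, so stop reading input.
            break
    # Calling close() unconditionally is a simplification: the compiled
    # close() body in this dump appears to return the existing result
    # straight away when ``done`` is already True.
    return detector.close()
```

With the real library, the first argument would typically be a fresh `UniversalDetector()` and `chunks` an iterator over blocks read from a file opened in binary mode.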
        Nz	UTF-8-SIG��?�r!zUTF-32s��zX-ISO-10646-UCS-4-3412s��zX-ISO-10646-UCS-4-2143rTr"���),r*�
isinstance�	bytearrayr+�
startswith�codecs�BOM_UTF8r)�BOM_UTF32_LE�BOM_UTF32_BE�BOM_LE�BOM_BEr-r	r,�HIGH_BYTE_DETECTOR�search�	HIGH_BYTE�ESC_DETECTORr.�	ESC_ASCIIr'r�stater�	DETECTING�feed�FOUND_IT�charset_name�get_confidencer&rrr$r(rr
�NON_CJK�appendrr
r�WIN_BYTE_DETECTORr3)r5rBrAs   r6rWzUniversalDetector.feed�s]���9�	��F��	��F��(�I�.�.�	+� ��*�*�H��~�%	��"�"�6�?�3�3�
X�!,�"%� "������
�$�$�f�&9�6�;N�%O�P�P�
X�,4�3�TV�W�W�����$�$�%8�9�9�
X�!9�"%� "�	������$�$�%8�9�9�
X�!9�"%� "�	������$�$�f�m�V�]�%C�D�D�
X�,4�3�TV�W�W���!�D�N��{�:�&�2� ��	�����
� 5�5�5��&�-�-�h�7�7�
9�$.�$8��!�!��!�Z�%:�:�:��%�,�,�T�_�x�-G�H�H�;�%/�$8��!�"�2�3�3�-����#�	3�#0�?�?�D� ���%��)?�?�?��#�(�(��2�2�l�6K�K�K� $� 4� A�"&�"6�"E�"E�"G�"G� "�����
!��	�����
� 4�4�4��+�
N�+;�D�<L�+M�+M��(��'�,�,�X�6�6�,�:O�O�O� $� 8� E�"&�":�"I�"I�"K�"K� $� 8� A�����
!��	�	�	�
P�O��
�*�"6�
6�
6��(�
?�)8��9I�)J�)J�(K��%��#�n�&<�<�D��)�0�0��1B�1B�C�C�C��%�,�,�\�^�^�<�<�<��%�,�,�^�-=�-=�>�>�>��/�
�
���;�;�x�(�(�L�,A�A�A�$*�$7�&,�&;�&;�&=�&=�$*�O�#�#�D�K�
!%�D�I��E�B��%�,�,�X�6�6�
+�&*��#�#�#�%7�
6�"
+�
+r%c	��|jr|jSd|_|js|j�d���n%|jtjkr
dddd�|_�n|jtjkr�d}d}d}|j	D]#}|s�|�
��}||kr|}|}�$|r�||jkr�|j}|�J�|�
��}|�
��}|�d	��r"|jr|j�||��}|jr/|j�|pd�
��|��}|||jd�|_|j���t,jkr�|jd
��|j�d��|j	D]�}|s�t1|t2��rD|jD];}|j�d|j|j|�
�����<�^|j�d|j|j|�
������|jS)
z�
        Stop analyzing the current document and come up with a final
        prediction.

        :returns:  The ``result`` attribute, a ``dict`` with the keys
                   `encoding`, `confidence`, and `language`.
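
Before returning, the compiled `close` body in this dump appears to swap an ISO-8859 name for its Windows code page when Windows-specific bytes were seen. The sketch below illustrates that lookup only; the table is transcribed from the string constants visible in this dump, and `remap_iso_to_windows` is a hypothetical standalone function, not a chardet API.

```python
# Transcribed from the ISO_WIN_MAP constants visible in this dump.
ISO_WIN_MAP = {
    "iso-8859-1": "Windows-1252",
    "iso-8859-2": "Windows-1250",
    "iso-8859-5": "Windows-1251",
    "iso-8859-6": "Windows-1256",
    "iso-8859-7": "Windows-1253",
    "iso-8859-8": "Windows-1255",
    "iso-8859-9": "Windows-1254",
    "iso-8859-13": "Windows-1257",
}


def remap_iso_to_windows(charset_name: str, has_win_bytes: bool) -> str:
    """Return the Windows code page for an ISO-8859 name when applicable.

    `has_win_bytes` stands in for the detector's ``has_win_bytes`` property;
    names without a table entry are returned unchanged.
    """
    lower_name = charset_name.lower()
    if lower_name.startswith("iso-8859") and has_win_bytes:
        return ISO_WIN_MAP.get(lower_name, charset_name)
    return charset_name
```

For example, `remap_iso_to_windows("ISO-8859-1", True)` yields `"Windows-1252"`, while the same name with `has_win_bytes=False` is returned untouched.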
        Tzno data received!rrDrEr!Nr ziso-8859r"z no probers hit minimum thresholdz%s %s confidence = %s)r*r)r+r2�debugr-r	r,rRr(rZ�MINIMUM_THRESHOLDrY�lowerrIr3�ISO_WIN_MAP�getr�
LEGACY_MAPr$�getEffectiveLevelr/�DEBUGrGr�probers)	r5�prober_confidence�max_prober_confidence�
max_proberrArY�lower_charset_namer#�group_probers	         r6�closezUniversalDetector.closes����9�	��;����	��~�(	��K���1�2�2�2�2��
�*�"7�
7�
7�'.�c�r�R�R�D�K�K��
�*�"6�
6�
6� $��$'�!��J��/�
(�
(�����$*�$9�$9�$;�$;�!�$�'<�<�<�,=�)�!'�J���
�4�t�7M�M�M�)�6��#�/�/�/�%1�%7�%7�%9�%9�"�'�6�6�8�8�
�&�0�0��<�<���*��'+�'7�';�';�.��(�(���,��#'�?�#6�#6�%�+��2�2�4�4�l�$�$�L�!-�",� *� 3������;�(�(�*�*�g�m�;�;��{�:�&�.���!�!�"D�E�E�E�$(�$9���L�'�!� �!�,�0B�C�C��&2�&:���F� �K�-�-� 7� &� 3� &�� &� 5� 5� 7� 7�	�������)�)�3�(�5�(�1�(�7�7�9�9�	�����{�r%)rN)r1�
__module__�__qualname__�__doc__r`�re�compilerPrSr]rbrdr
�ALL�boolr7�property�intr;r=rrr?r4r�bytesrHrWrrm�r%r6rr8s��������� ��#���N�3�3���2�:�l�+�+�L�"��
�>�2�2��$�$�$�$�$�$�$�%�	�	�K� �$� �$������J�'5�&8�%*���#��#��
�	����.�!�S�!�!�!��X�!��#�t�#�#�#��X�#��%��m�!4�%�%�%��X�%�����&A+�U�5�)�#3�4�A+��A+�A+�A+�A+�FM�z�M�M�M�M�M�Mr%r)rprJr/rq�typingrrr�charsetgroupproberr�
charsetproberr�enumsr	r
r�	escproberr�latin1proberr
�macromanproberr�mbcsgroupproberr�
resultdictr�sbcsgroupproberr�
utf1632proberrrrxr%r6�<module>r�sF��8���
�
�
�����	�	�	�	�(�(�(�(�(�(�(�(�(�(�2�2�2�2�2�2�(�(�(�(�(�(�;�;�;�;�;�;�;�;�;�;�'�'�'�'�'�'�&�&�&�&�&�&�*�*�*�*�*�*�,�,�,�,�,�,�"�"�"�"�"�"�,�,�,�,�,�,�(�(�(�(�(�(�r�r�r�r�r�r�r�r�r�rr%
