""" parquet compat """

import io
import os
from distutils.version import LooseVersion
from typing import Any, AnyStr, Dict, List, Optional, Tuple
from warnings import catch_warnings

from pandas._typing import FilePathOrBuffer, StorageOptions
from pandas.compat._optional import import_optional_dependency
from pandas.errors import AbstractMethodError
from pandas.util._decorators import doc

from pandas import DataFrame, MultiIndex, get_option
from pandas.core import generic

from pandas.io.common import (
    IOHandles,
    get_handle,
    is_fsspec_url,
    stringify_path,
)


def get_engine(engine: str) -> "BaseImpl":
    """ return our implementation """
    if engine == "auto":
        engine = get_option("io.parquet.engine")

    if engine == "auto":
        # try engines in this order
        engine_classes = [PyArrowImpl, FastParquetImpl]

        error_msgs = ""
        for engine_class in engine_classes:
            try:
                return engine_class()
            except ImportError as err:
                error_msgs += "\n - " + str(err)

        raise ImportError(
            "Unable to find a usable engine; "
            "tried using: 'pyarrow', 'fastparquet'.\n"
            "A suitable version of "
            "pyarrow or fastparquet is required for parquet "
            "support.\n"
            "Trying to import the above resulted in these errors:"
            f"{error_msgs}"
        )

    if engine == "pyarrow":
        return PyArrowImpl()
    elif engine == "fastparquet":
        return FastParquetImpl()

    raise ValueError("engine must be one of 'pyarrow', 'fastparquet'")


def _get_path_or_handle(
    path: FilePathOrBuffer,
    fs: Any,
    storage_options: StorageOptions = None,
    mode: str = "rb",
    is_dir: bool = False,
) -> Tuple[FilePathOrBuffer, Optional[IOHandles], Any]:
    """File handling for PyArrow."""
    path_or_handle = stringify_path(path)
    if is_fsspec_url(path_or_handle) and fs is None:
        fsspec = import_optional_dependency("fsspec")

        fs, path_or_handle = fsspec.core.url_to_fs(
            path_or_handle, **(storage_options or {})
        )
    elif storage_options:
        raise ValueError("storage_options passed with buffer or non-fsspec filepath")

    handles = None
    if (
        not fs
        and not is_dir
        and isinstance(path_or_handle, str)
        and not os.path.isdir(path_or_handle)
    ):
        # use get_handle only when we are very certain that it is not a directory
        # fsspec resources can also point to directories
        # this branch is used for example when reading from non-fsspec URLs
        handles = get_handle(path_or_handle, mode, is_text=False)
        fs = None
        path_or_handle = handles.handle
    return path_or_handle, handles, fs


class BaseImpl:
    @staticmethod
    def validate_dataframe(df: DataFrame):
        if not isinstance(df, DataFrame):
            raise ValueError("to_parquet only supports IO with DataFrames")

        # must have value column names for all index levels (strings only)
        if isinstance(df.columns, MultiIndex):
            if not all(
                x.inferred_type in {"string", "empty"} for x in df.columns.levels
            ):
                raise ValueError(
                    "parquet must have string column names for all values in "
                    "each level of the MultiIndex"
                )
        else:
            if df.columns.inferred_type not in {"string", "empty"}:
                raise ValueError("parquet must have string column names")

        # index level names must be strings
        valid_names = all(
            isinstance(name, str) for name in df.index.names if name is not None
        )
        if not valid_names:
            raise ValueError("Index level names must be strings")

    def write(self, df: DataFrame, path, compression, **kwargs):
        raise AbstractMethodError(self)

    def read(self, path, columns=None, **kwargs):
        raise AbstractMethodError(self)


class PyArrowImpl(BaseImpl):
    def __init__(self):
        import_optional_dependency(
            "pyarrow", extra="pyarrow is required for parquet support."
        )
        import pyarrow.parquet

        # import utils to register the pyarrow extension types
        import pandas.core.arrays._arrow_utils  # noqa

        self.api = pyarrow

    def write(
        self,
        df: DataFrame,
        path: FilePathOrBuffer[AnyStr],
        compression: Optional[str] = "snappy",
        index: Optional[bool] = None,
        storage_options: StorageOptions = None,
        partition_cols: Optional[List[str]] = None,
        **kwargs,
    ):
        self.validate_dataframe(df)

        from_pandas_kwargs: Dict[str, Any] = {"schema": kwargs.pop("schema", None)}
        if index is not None:
            from_pandas_kwargs["preserve_index"] = index

        table = self.api.Table.from_pandas(df, **from_pandas_kwargs)

        path_or_handle, handles, kwargs["filesystem"] = _get_path_or_handle(
            path,
            kwargs.pop("filesystem", None),
            storage_options=storage_options,
            mode="wb",
            is_dir=partition_cols is not None,
        )
        try:
            if partition_cols is not None:
                # writes to multiple files under the given path
                self.api.parquet.write_to_dataset(
                    table,
                    path_or_handle,
                    compression=compression,
                    partition_cols=partition_cols,
                    **kwargs,
                )
            else:
                # write to single output file
                self.api.parquet.write_table(
                    table, path_or_handle, compression=compression, **kwargs
                )
        finally:
            if handles is not None:
                handles.close()

    def read(
        self,
        path,
        columns=None,
        use_nullable_dtypes=False,
        storage_options: StorageOptions = None,
        **kwargs,
    ):
        kwargs["use_pandas_metadata"] = True

        to_pandas_kwargs = {}
        if use_nullable_dtypes:
            if LooseVersion(self.api.__version__) >= "0.16":
                import pandas as pd

                mapping = {
                    self.api.int8(): pd.Int8Dtype(),
                    self.api.int16(): pd.Int16Dtype(),
                    self.api.int32(): pd.Int32Dtype(),
                    self.api.int64(): pd.Int64Dtype(),
                    self.api.uint8(): pd.UInt8Dtype(),
                    self.api.uint16(): pd.UInt16Dtype(),
                    self.api.uint32(): pd.UInt32Dtype(),
                    self.api.uint64(): pd.UInt64Dtype(),
                    self.api.bool_(): pd.BooleanDtype(),
                    self.api.string(): pd.StringDtype(),
                }
                to_pandas_kwargs["types_mapper"] = mapping.get
            else:
                raise ValueError(
                    "'use_nullable_dtypes=True' is only supported for "
                    "pyarrow >= 0.16 "
                    f"({self.api.__version__} is installed)"
                )

        path_or_handle, handles, kwargs["filesystem"] = _get_path_or_handle(
            path,
            kwargs.pop("filesystem", None),
            storage_options=storage_options,
            mode="rb",
        )
        try:
            return self.api.parquet.read_table(
                path_or_handle, columns=columns, **kwargs
            ).to_pandas(**to_pandas_kwargs)
        finally:
            if handles is not None:
                handles.close()


class FastParquetImpl(BaseImpl):
    def __init__(self):
        # since pandas is a dependency of fastparquet
        # we need to import on first use
        fastparquet = import_optional_dependency(
            "fastparquet", extra="fastparquet is required for parquet support."
        )
        self.api = fastparquet

    def write(
        self,
        df: DataFrame,
        path,
        compression="snappy",
        index=None,
        partition_cols=None,
        storage_options: StorageOptions = None,
        **kwargs,
    ):
        self.validate_dataframe(df)

        if "partition_on" in kwargs and partition_cols is not None:
            raise ValueError(
                "Cannot use both partition_on and "
                "partition_cols. Use partition_cols for partitioning data"
            )
        elif "partition_on" in kwargs:
            partition_cols = kwargs.pop("partition_on")

        if partition_cols is not None:
            kwargs["file_scheme"] = "hive"

        # cannot use get_handle as write() does not accept file buffers
        path = stringify_path(path)
        if is_fsspec_url(path):
            fsspec = import_optional_dependency("fsspec")

            # if filesystem is provided by fsspec, file must be opened in 'wb' mode.
            kwargs["open_with"] = lambda path, _: fsspec.open(
                path, "wb", **(storage_options or {})
            ).open()
        elif storage_options:
            raise ValueError(
                "storage_options passed with file object or non-fsspec file path"
            )

        with catch_warnings(record=True):
            self.api.write(
                path,
                df,
                compression=compression,
                write_index=index,
                partition_on=partition_cols,
                **kwargs,
            )

    def read(
        self, path, columns=None, storage_options: StorageOptions = None, **kwargs
    ):
        use_nullable_dtypes = kwargs.pop("use_nullable_dtypes", False)
        if use_nullable_dtypes:
            raise ValueError(
                "The 'use_nullable_dtypes' argument is not supported for the "
                "fastparquet engine"
            )
        path = stringify_path(path)
        parquet_kwargs = {}
        handles = None
        if is_fsspec_url(path):
            fsspec = import_optional_dependency("fsspec")

            parquet_kwargs["open_with"] = lambda path, _: fsspec.open(
                path, "rb", **(storage_options or {})
            ).open()
        elif isinstance(path, str) and not os.path.isdir(path):
            # use get_handle only when we are very certain that it is not a directory
            # fsspec resources can also point to directories
            # this branch is used for example when reading from non-fsspec URLs
            handles = get_handle(path, "rb", is_text=False)
            path = handles.handle
        parquet_file = self.api.ParquetFile(path, **parquet_kwargs)

        result = parquet_file.to_pandas(columns=columns, **kwargs)

        if handles is not None:
            handles.close()
        return result


@doc(storage_options=generic._shared_docs["storage_options"])
def to_parquet(
    df: DataFrame,
    path: Optional[FilePathOrBuffer] = None,
    engine: str = "auto",
    compression: Optional[str] = "snappy",
    index: Optional[bool] = None,
    storage_options: StorageOptions = None,
    partition_cols: Optional[List[str]] = None,
    **kwargs,
) -> Optional[bytes]:
    """
    Write a DataFrame to the parquet format.

    Parameters
    ----------
    df : DataFrame
    path : str or file-like object, default None
        If a string, it will be used as Root Directory path
        when writing a partitioned dataset. By file-like object,
        we refer to objects with a write() method, such as a file handle
        (e.g. via builtin open function) or io.BytesIO. The engine
        fastparquet does not accept file-like objects. If path is None,
        a bytes object is returned.

        .. versionchanged:: 1.2.0

    engine : {{'auto', 'pyarrow', 'fastparquet'}}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.
    compression : {{'snappy', 'gzip', 'brotli', None}}, default 'snappy'
        Name of the compression to use. Use ``None`` for no compression.
    index : bool, default None
        If ``True``, include the dataframe's index(es) in the file output. If
        ``False``, they will not be written to the file.
        If ``None``, similar to ``True`` the dataframe's index(es)
        will be saved. However, instead of being saved as values,
        the RangeIndex will be stored as a range in the metadata so it
        doesn't require much space and is faster. Other indexes will
        be included as columns in the file output.

        .. versionadded:: 0.24.0

    partition_cols : str or list, optional, default None
        Column names by which to partition the dataset.
        Columns are partitioned in the order they are given.
        Must be None if path is not a string.

        .. versionadded:: 0.24.0

    {storage_options}

        .. versionadded:: 1.2.0

    kwargs
        Additional keyword arguments passed to the engine

    Returns
    -------
    bytes if no path argument is provided else None
    """
    if isinstance(partition_cols, str):
        partition_cols = [partition_cols]
    impl = get_engine(engine)

    path_or_buf: FilePathOrBuffer = io.BytesIO() if path is None else path

    impl.write(
        df,
        path_or_buf,
        compression=compression,
        index=index,
        partition_cols=partition_cols,
        storage_options=storage_options,
        **kwargs,
    )

    if path is None:
        assert isinstance(path_or_buf, io.BytesIO)
        return path_or_buf.getvalue()
    else:
        return None


def read_parquet(
    path, engine: str = "auto", columns=None, use_nullable_dtypes: bool = False, **kwargs
):
    """
    Load a parquet object from the file path, returning a DataFrame.

    Parameters
    ----------
    path : str, path object or file-like object
        Any valid string path is acceptable. The string could be a URL.
        Valid URL schemes include http, ftp, s3, gs, and file. For file URLs,
        a host is expected. A local file could be:
        ``file://localhost/path/to/table.parquet``.
        A file URL can also be a path to a directory that contains multiple
        partitioned parquet files. Both pyarrow and fastparquet support
        paths to directories as well as file URLs. A directory path could be:
        ``file://localhost/path/to/tables`` or ``s3://bucket/partition_dir``

        If you want to pass in a path object, pandas accepts any
        ``os.PathLike``.

        By file-like object, we refer to objects with a ``read()`` method,
        such as a file handle (e.g. via builtin ``open`` function)
        or ``StringIO``.
    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.
    columns : list, default=None
        If not None, only these columns will be read from the file.
    use_nullable_dtypes : bool, default False
        If True, use dtypes that use ``pd.NA`` as missing value indicator
        for the resulting DataFrame (only applicable for ``engine="pyarrow"``).
        As new dtypes are added that support ``pd.NA`` in the future, the
        output with this option will change to use those dtypes.
        Note: this is an experimental option, and behaviour (e.g. additional
        support dtypes) may change without notice.

        .. versionadded:: 1.2.0

    **kwargs
        Any additional kwargs are passed to the engine.

    Returns
    -------
    DataFrame
    """
    impl = get_engine(engine)
    return impl.read(
        path, columns=columns, use_nullable_dtypes=use_nullable_dtypes, **kwargs
    )