""" parquet compat """

from distutils.version import LooseVersion
import io
import os
from typing import Any, AnyStr, Dict, List, Optional, Tuple
from warnings import catch_warnings

from pandas._typing import FilePathOrBuffer, StorageOptions
from pandas.compat._optional import import_optional_dependency
from pandas.errors import AbstractMethodError
from pandas.util._decorators import doc

from pandas import DataFrame, MultiIndex, get_option
from pandas.core import generic
from pandas.io.common import IOHandles, get_handle, is_fsspec_url, stringify_path


def get_engine(engine: str) -> "BaseImpl":
    """ return our implementation """
    if engine == "auto":
        engine = get_option("io.parquet.engine")

    if engine == "auto":
        # try engines in this order
        engine_classes = [PyArrowImpl, FastParquetImpl]

        error_msgs = ""
        for engine_class in engine_classes:
            try:
                return engine_class()
            except ImportError as err:
                error_msgs += "\n - " + str(err)

        raise ImportError(
            "Unable to find a usable engine; "
            "tried using: 'pyarrow', 'fastparquet'.\n"
            "A suitable version of "
            "pyarrow or fastparquet is required for parquet "
            "support.\n"
            "Trying to import the above resulted in these errors:"
            f"{error_msgs}"
        )

    if engine == "pyarrow":
        return PyArrowImpl()
    elif engine == "fastparquet":
        return FastParquetImpl()

    raise ValueError("engine must be one of 'pyarrow', 'fastparquet'")


def _get_path_or_handle(
    path: FilePathOrBuffer,
    fs: Any,
    storage_options: StorageOptions = None,
    mode: str = "rb",
    is_dir: bool = False,
) -> Tuple[FilePathOrBuffer, Optional[IOHandles], Any]:
    """File handling for PyArrow."""
    path_or_handle = stringify_path(path)
    if is_fsspec_url(path_or_handle) and fs is None:
        fsspec = import_optional_dependency("fsspec")

        fs, path_or_handle = fsspec.core.url_to_fs(
            path_or_handle, **(storage_options or {})
        )
    elif storage_options:
        raise ValueError("storage_options passed with buffer or non-fsspec filepath")

    handles = None
    if (
        not fs
        and not is_dir
        and isinstance(path_or_handle, str)
        and not os.path.isdir(path_or_handle)
    ):
        # use get_handle only when we are very certain that it is not a directory
        # fsspec resources can also point to directories
        # this branch is used for example when reading from non-fsspec URLs
        handles = get_handle(path_or_handle, mode, is_text=False)
        fs = None
        path_or_handle = handles.handle
    return path_or_handle, handles, fs


class BaseImpl:
    @staticmethod
    def validate_dataframe(df: DataFrame):
        if not isinstance(df, DataFrame):
            raise ValueError("to_parquet only supports IO with DataFrames")

        # must have string column names for all levels
        if isinstance(df.columns, MultiIndex):
            if not all(
                x.inferred_type in {"string", "empty"} for x in df.columns.levels
            ):
                raise ValueError(
                    "parquet must have string column names for all values in "
                    "each level of the MultiIndex"
                )
        else:
            if df.columns.inferred_type not in {"string", "empty"}:
                raise ValueError("parquet must have string column names")

        # index level names must be strings
        valid_names = all(
            isinstance(name, str) for name in df.index.names if name is not None
        )
        if not valid_names:
            raise ValueError("Index level names must be strings")

    def write(self, df: DataFrame, path, compression, **kwargs):
        raise AbstractMethodError(self)

    def read(self, path, columns=None, **kwargs):
        raise AbstractMethodError(self)


class PyArrowImpl(BaseImpl):
    def __init__(self):
        import_optional_dependency(
            "pyarrow", extra="pyarrow is required for parquet support."
        )
        import pyarrow.parquet

        # register the pyarrow extension types as a side effect
        import pandas.core.arrays._arrow_utils  # noqa

        self.api = pyarrow

    def write(
        self,
        df: DataFrame,
        path: FilePathOrBuffer[AnyStr],
        compression: Optional[str] = "snappy",
        index: Optional[bool] = None,
        storage_options: StorageOptions = None,
        partition_cols: Optional[List[str]] = None,
        **kwargs,
    ):
        self.validate_dataframe(df)

        from_pandas_kwargs: Dict[str, Any] = {"schema": kwargs.pop("schema", None)}
        if index is not None:
            from_pandas_kwargs["preserve_index"] = index

        table = self.api.Table.from_pandas(df, **from_pandas_kwargs)

        path_or_handle, handles, kwargs["filesystem"] = _get_path_or_handle(
            path,
            kwargs.pop("filesystem", None),
            storage_options=storage_options,
            mode="wb",
            is_dir=partition_cols is not None,
        )
        try:
            if partition_cols is not None:
                # writes to multiple files under the given path
                self.api.parquet.write_to_dataset(
                    table,
                    path_or_handle,
                    compression=compression,
                    partition_cols=partition_cols,
                    **kwargs,
                )
            else:
                # write to single output file
                self.api.parquet.write_table(
                    table, path_or_handle, compression=compression, **kwargs
                )
        finally:
            if handles is not None:
                handles.close()

    def read(
        self,
        path,
        columns=None,
        use_nullable_dtypes=False,
        storage_options: StorageOptions = None,
        **kwargs,
    ):
        kwargs["use_pandas_metadata"] = True

        to_pandas_kwargs = {}
        if use_nullable_dtypes:
            if LooseVersion(self.api.__version__) >= "0.16":
                import pandas as pd

                mapping = {
                    self.api.int8(): pd.Int8Dtype(),
                    self.api.int16(): pd.Int16Dtype(),
                    self.api.int32(): pd.Int32Dtype(),
                    self.api.int64(): pd.Int64Dtype(),
                    self.api.uint8(): pd.UInt8Dtype(),
                    self.api.uint16(): pd.UInt16Dtype(),
                    self.api.uint32(): pd.UInt32Dtype(),
                    self.api.uint64(): pd.UInt64Dtype(),
                    self.api.bool_(): pd.BooleanDtype(),
                    self.api.string(): pd.StringDtype(),
                }
                to_pandas_kwargs["types_mapper"] = mapping.get
            else:
                raise ValueError(
                    "'use_nullable_dtypes=True' is only supported for "
                    f"pyarrow >= 0.16 ({self.api.__version__} is installed)"
                )

        path_or_handle, handles, kwargs["filesystem"] = _get_path_or_handle(
            path,
            kwargs.pop("filesystem", None),
            storage_options=storage_options,
            mode="rb",
        )
        try:
            return self.api.parquet.read_table(
                path_or_handle, columns=columns, **kwargs
            ).to_pandas(**to_pandas_kwargs)
        finally:
            if handles is not None:
                handles.close()


class FastParquetImpl(BaseImpl):
    def __init__(self):
        # since pandas is a dependency of fastparquet
        # we need to import on first use
        fastparquet = import_optional_dependency(
            "fastparquet", extra="fastparquet is required for parquet support."
        )
        self.api = fastparquet

    def write(
        self,
        df: DataFrame,
        path,
        compression="snappy",
        index=None,
        partition_cols=None,
        storage_options: StorageOptions = None,
        **kwargs,
    ):
        self.validate_dataframe(df)

        if "partition_on" in kwargs and partition_cols is not None:
            raise ValueError(
                "Cannot use both partition_on and "
                "partition_cols. Use partition_cols for partitioning data"
            )
        elif "partition_on" in kwargs:
            partition_cols = kwargs.pop("partition_on")

        if partition_cols is not None:
            kwargs["file_scheme"] = "hive"

        # cannot use get_handle as write() does not accept file buffers
        path = stringify_path(path)
        if is_fsspec_url(path):
            fsspec = import_optional_dependency("fsspec")

            # if filesystem is provided by fsspec, file must be opened in 'wb' mode.
            kwargs["open_with"] = lambda path, _: fsspec.open(
                path, "wb", **(storage_options or {})
            ).open()
        elif storage_options:
            raise ValueError(
                "storage_options passed with file object or non-fsspec file path"
            )

        with catch_warnings(record=True):
            self.api.write(
                path,
                df,
                compression=compression,
                write_index=index,
                partition_on=partition_cols,
                **kwargs,
            )

    def read(
        self, path, columns=None, storage_options: StorageOptions = None, **kwargs
    ):
        use_nullable_dtypes = kwargs.pop("use_nullable_dtypes", False)
        if use_nullable_dtypes:
            raise ValueError(
                "The 'use_nullable_dtypes' argument is not supported for the "
                "fastparquet engine"
            )
        path = stringify_path(path)
        parquet_kwargs = {}
        handles = None
        if is_fsspec_url(path):
            fsspec = import_optional_dependency("fsspec")

            parquet_kwargs["open_with"] = lambda path, _: fsspec.open(
                path, "rb", **(storage_options or {})
            ).open()
        elif isinstance(path, str) and not os.path.isdir(path):
            # use get_handle only when we are very certain that it is not a directory
            # fsspec resources can also point to directories
            # this branch is used for example when reading from non-fsspec URLs
            handles = get_handle(path, "rb", is_text=False)
            path = handles.handle
        parquet_file = self.api.ParquetFile(path, **parquet_kwargs)

        result = parquet_file.to_pandas(columns=columns, **kwargs)

        if handles is not None:
            handles.close()
        return result


@doc(storage_options=generic._shared_docs["storage_options"])
def to_parquet(
    df: DataFrame,
    path: Optional[FilePathOrBuffer] = None,
    engine: str = "auto",
    compression: Optional[str] = "snappy",
    index: Optional[bool] = None,
    storage_options: StorageOptions = None,
    partition_cols: Optional[List[str]] = None,
    **kwargs,
) -> Optional[bytes]:
    """
    Write a DataFrame to the parquet format.

    Parameters
    ----------
    df : DataFrame
    path : str or file-like object, default None
        If a string, it will be used as Root Directory path
        when writing a partitioned dataset. By file-like object,
        we refer to objects with a write() method, such as a file handle
        (e.g. via builtin open function) or io.BytesIO. The engine
        fastparquet does not accept file-like objects. If path is None,
        a bytes object is returned.

        .. versionchanged:: 1.2.0

    engine : {{'auto', 'pyarrow', 'fastparquet'}}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.
    compression : {{'snappy', 'gzip', 'brotli', None}}, default 'snappy'
        Name of the compression to use. Use ``None`` for no compression.
    index : bool, default None
        If ``True``, include the dataframe's index(es) in the file output. If
        ``False``, they will not be written to the file.
        If ``None``, similar to ``True`` the dataframe's index(es)
        will be saved. However, instead of being saved as values,
        the RangeIndex will be stored as a range in the metadata so it
        doesn't require much space and is faster. Other indexes will
        be included as columns in the file output.

        .. versionadded:: 0.24.0

    partition_cols : str or list, optional, default None
        Column names by which to partition the dataset.
        Columns are partitioned in the order they are given.
        Must be None if path is not a string.

        .. versionadded:: 0.24.0

    {storage_options}

        .. versionadded:: 1.2.0

    kwargs
        Additional keyword arguments passed to the engine

    Returns
    -------
    bytes if no path argument is provided else None
    """
    if isinstance(partition_cols, str):
        partition_cols = [partition_cols]
    impl = get_engine(engine)

    path_or_buf: FilePathOrBuffer = io.BytesIO() if path is None else path

    impl.write(
        df,
        path_or_buf,
        compression=compression,
        index=index,
        partition_cols=partition_cols,
        storage_options=storage_options,
        **kwargs,
    )

    if path is None:
        assert isinstance(path_or_buf, io.BytesIO)
        return path_or_buf.getvalue()
    else:
        return None


def read_parquet(
    path, engine: str = "auto", columns=None, use_nullable_dtypes: bool = False, **kwargs
):
    """
    Load a parquet object from the file path, returning a DataFrame.

    Parameters
    ----------
    path : str, path object or file-like object
        Any valid string path is acceptable. The string could be a URL.
        Valid URL schemes include http, ftp, s3, gs, and file. For file URLs,
        a host is expected. A local file could be:
        ``file://localhost/path/to/table.parquet``.
        A file URL can also be a path to a directory that contains multiple
        partitioned parquet files. Both pyarrow and fastparquet support
        paths to directories as well as file URLs. A directory path could be:
        ``file://localhost/path/to/tables`` or ``s3://bucket/partition_dir``.

        If you want to pass in a path object, pandas accepts any
        ``os.PathLike``.

        By file-like object, we refer to objects with a ``read()`` method,
        such as a file handle (e.g. via builtin ``open`` function)
        or ``StringIO``.
    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.
    columns : list, default=None
        If not None, only these columns will be read from the file.
    use_nullable_dtypes : bool, default False
        If True, use dtypes that use ``pd.NA`` as missing value indicator
        for the resulting DataFrame (only applicable for ``engine="pyarrow"``).
        As new dtypes are added that support ``pd.NA`` in the future, the
        output with this option will change to use those dtypes.
        Note: this is an experimental option, and behaviour (e.g. additional
        support dtypes) may change without notice.

        .. versionadded:: 1.2.0

    **kwargs
        Any additional kwargs are passed to the engine.

    Returns
    -------
    DataFrame
    """
    impl = get_engine(engine)
    return impl.read(
        path, columns=columns, use_nullable_dtypes=use_nullable_dtypes, **kwargs
    )