""" parquet compat """

from warnings import catch_warnings

from pandas.compat._optional import import_optional_dependency
from pandas.errors import AbstractMethodError

from pandas import DataFrame, get_option

from pandas.io.common import get_filepath_or_buffer, is_s3_url


def get_engine(engine):
    """ return our implementation """

    if engine == "auto":
        engine = get_option("io.parquet.engine")

    if engine == "auto":
        # try engines in this order
        try:
            return PyArrowImpl()
        except ImportError:
            pass

        try:
            return FastParquetImpl()
        except ImportError:
            pass

        raise ImportError(
            "Unable to find a usable engine; "
            "tried using: 'pyarrow', 'fastparquet'.\n"
            "pyarrow or fastparquet is required for parquet support"
        )

    if engine not in ("pyarrow", "fastparquet"):
        raise ValueError("engine must be one of 'pyarrow', 'fastparquet'")

    if engine == "pyarrow":
        return PyArrowImpl()
    elif engine == "fastparquet":
        return FastParquetImpl()


class BaseImpl:

    api = None  # module

    @staticmethod
    def validate_dataframe(df):

        if not isinstance(df, DataFrame):
            raise ValueError("to_parquet only supports IO with DataFrames")

        # must have string column names
        if df.columns.inferred_type not in {"string", "unicode", "empty"}:
            raise ValueError("parquet must have string column names")

        # index level names must be strings
        valid_names = all(
            isinstance(name, str) for name in df.index.names if name is not None
        )
        if not valid_names:
            raise ValueError("Index level names must be strings")

    def write(self, df, path, compression, **kwargs):
        raise AbstractMethodError(self)

    def read(self, path, columns=None, **kwargs):
        raise AbstractMethodError(self)


class PyArrowImpl(BaseImpl):
    def __init__(self):
        pyarrow = import_optional_dependency(
            "pyarrow", extra="pyarrow is required for parquet support."
        )
        import pyarrow.parquet  # noqa: F401

        self.api = pyarrow

    def write(
        self,
        df,
        path,
        compression="snappy",
        coerce_timestamps="ms",
        index=None,
        partition_cols=None,
        **kwargs
    ):
        self.validate_dataframe(df)
        path, _, _, _ = get_filepath_or_buffer(path, mode="wb")

        if index is None:
            from_pandas_kwargs = {}
        else:
            from_pandas_kwargs = {"preserve_index": index}

        table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
        if partition_cols is not None:
            self.api.parquet.write_to_dataset(
                table,
                path,
                compression=compression,
                coerce_timestamps=coerce_timestamps,
                partition_cols=partition_cols,
                **kwargs
            )
        else:
            self.api.parquet.write_table(
                table,
                path,
                compression=compression,
                coerce_timestamps=coerce_timestamps,
                **kwargs
            )

    def read(self, path, columns=None, **kwargs):
        path, _, _, should_close = get_filepath_or_buffer(path)

        kwargs["use_pandas_metadata"] = True
        result = self.api.parquet.read_table(
            path, columns=columns, **kwargs
        ).to_pandas()
        if should_close:
            try:
                path.close()
            except Exception:
                pass

        return result


class FastParquetImpl(BaseImpl):
    def __init__(self):
        # since pandas is a dependency of fastparquet
        # we need to import on first use
        fastparquet = import_optional_dependency(
            "fastparquet", extra="fastparquet is required for parquet support."
        )
        self.api = fastparquet

    def write(
        self, df, path, compression="snappy", index=None, partition_cols=None, **kwargs
    ):
        self.validate_dataframe(df)

        if "partition_on" in kwargs and partition_cols is not None:
            raise ValueError(
                "Cannot use both partition_on and "
                "partition_cols. Use partition_cols for "
                "partitioning data"
            )
        elif "partition_on" in kwargs:
            partition_cols = kwargs.pop("partition_on")

        if partition_cols is not None:
            kwargs["file_scheme"] = "hive"

        if is_s3_url(path):
            # path is s3:// so we need to open the s3file in 'wb' mode.
            path, _, _, _ = get_filepath_or_buffer(path, mode="wb")
            # And pass the opened s3file to the fastparquet internal impl.
            kwargs["open_with"] = lambda path, _: path
        else:
            path, _, _, _ = get_filepath_or_buffer(path)

        with catch_warnings(record=True):
            self.api.write(
                path,
                df,
                compression=compression,
                write_index=index,
                partition_on=partition_cols,
                **kwargs
            )

    def read(self, path, columns=None, **kwargs):
        if is_s3_url(path):
            from pandas.io.s3 import get_file_and_filesystem

            # When path is s3:// an S3File is returned.
            # We need to retain the filesystem object so that it is not
            # closed after to_pandas is called.
            s3, filesystem = get_file_and_filesystem(path)
            try:
                parquet_file = self.api.ParquetFile(s3, open_with=filesystem.open)
            finally:
                s3.close()
        else:
            path, _, _, _ = get_filepath_or_buffer(path)
            parquet_file = self.api.ParquetFile(path)

        return parquet_file.to_pandas(columns=columns, **kwargs)


def to_parquet(
    df,
    path,
    engine="auto",
    compression="snappy",
    index=None,
    partition_cols=None,
    **kwargs
):
    """
    Write a DataFrame to the parquet format.

    Parameters
    ----------
    path : str
        File path or Root Directory path. Will be used as Root Directory
        path while writing a partitioned dataset.

        .. versionchanged:: 0.24.0

    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.
    compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
        Name of the compression to use. Use ``None`` for no compression.
    index : bool, default None
        If ``True``, include the dataframe's index(es) in the file output.
        If ``False``, they will not be written to the file. If ``None``,
        the engine's default behavior will be used.

        .. versionadded:: 0.24.0

    partition_cols : list, optional, default None
        Column names by which to partition the dataset.
        Columns are partitioned in the order they are given.

        .. versionadded:: 0.24.0

    kwargs
        Additional keyword arguments passed to the engine
    """
    impl = get_engine(engine)
    return impl.write(
        df,
        path,
        compression=compression,
        index=index,
        partition_cols=partition_cols,
        **kwargs
    )


def read_parquet(path, engine="auto", columns=None, **kwargs):
    """
    Load a parquet object from the file path, returning a DataFrame.

    .. versionadded:: 0.21.0

    Parameters
    ----------
    path : str, path object or file-like object
        Any valid string path is acceptable. The string could be a URL. Valid
        URL schemes include http, ftp, s3, and file. For file URLs, a host is
        expected. A local file could be:
        ``file://localhost/path/to/table.parquet``.

        If you want to pass in a path object, pandas accepts any
        ``os.PathLike``.

        By file-like object, we refer to objects with a ``read()`` method,
        such as a file handler (e.g. via builtin ``open`` function)
        or ``StringIO``.
    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.
    columns : list, default=None
        If not None, only these columns will be read from the file.

        .. versionadded:: 0.21.1

    **kwargs
        Any additional kwargs are passed to the engine.

    Returns
    -------
    DataFrame
    """
    impl = get_engine(engine)
    return impl.read(path, columns=columns, **kwargs)
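The `get_engine` dispatch above tries each implementation in order and returns the first whose optional dependency imports cleanly. A minimal, self-contained sketch of that fallback pattern, using hypothetical stand-in factories (`missing_engine`, `stub_engine`) rather than the real `PyArrowImpl`/`FastParquetImpl` classes:

```python
# Standalone sketch of the try-each-engine fallback used by get_engine().
# The factory names below are hypothetical stand-ins, not pandas APIs.


def first_available_engine(factories):
    """Return the first engine whose constructor does not raise ImportError."""
    for factory in factories:
        try:
            return factory()
        except ImportError:
            # This implementation's optional dependency is missing; try next.
            continue
    raise ImportError(
        "Unable to find a usable engine; tried: "
        + ", ".join(f.__name__ for f in factories)
    )


def missing_engine():
    # Simulates an implementation whose optional dependency is absent.
    raise ImportError("dependency not installed")


def stub_engine():
    # Simulates an implementation that imports cleanly.
    return "stub"
```

`first_available_engine([missing_engine, stub_engine])` returns `"stub"`, and if every factory raises, the combined `ImportError` mirrors the "Unable to find a usable engine" message raised by `get_engine`.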