import os

import pyarrow as pa

from pyarrow.lib import (IpcWriteOptions, ReadStats, WriteStats,  # noqa
                         Message, MessageReader,
                         RecordBatchReader, _ReadPandasMixin,
                         MetadataVersion,
                         read_message, read_record_batch, read_schema,
                         read_tensor, write_tensor,
                         get_record_batch_size, get_tensor_size)
import pyarrow.lib as lib


class RecordBatchStreamReader(lib._RecordBatchStreamReader):
    """
    Reader for the Arrow streaming binary format.

    Parameters
    ----------
    source : bytes/buffer-like, pyarrow.NativeFile, or file-like Python object
        Either an in-memory buffer, or a readable file object.
    """

    def __init__(self, source):
        self._open(source)


_ipc_writer_class_doc = """\
Parameters
----------
sink : str, pyarrow.NativeFile, or file-like Python object
    Either a file path, or a writable file object.
schema : pyarrow.Schema
    The Arrow schema for data to be written to the file.
options : pyarrow.ipc.IpcWriteOptions
    Options for IPC serialization.

    If None, default values will be used: the legacy format will not
    be used unless overridden by setting the environment variable
    ARROW_PRE_0_15_IPC_FORMAT=1, and the V5 metadata version will be
    used unless overridden by setting the environment variable
    ARROW_PRE_1_0_METADATA_VERSION=1.
use_legacy_format : bool, default None
    Deprecated in favor of setting options. Cannot be provided with
    options.

    If None, False will be used unless this default is overridden by
    setting the environment variable ARROW_PRE_0_15_IPC_FORMAT=1"""


class RecordBatchStreamWriter(lib._RecordBatchStreamWriter):
    __doc__ = """Writer for the Arrow streaming binary format

{}""".format(_ipc_writer_class_doc)

    def __init__(self, sink, schema, *, use_legacy_format=None, options=None):
        options = _get_legacy_format_default(use_legacy_format, options)
        self._open(sink, schema, options=options)


class RecordBatchFileReader(lib._RecordBatchFileReader):
    """
    Class for reading Arrow record batch data from the Arrow binary file format

    Parameters
    ----------
    source : bytes/buffer-like, pyarrow.NativeFile, or file-like Python object
        Either an in-memory buffer, or a readable file object
    footer_offset : int, default None
        If the file is embedded in some larger file, this is the byte offset to
        the very end of the file data
    """

    def __init__(self, source, footer_offset=None):
        self._open(source, footer_offset=footer_offset)


class RecordBatchFileWriter(lib._RecordBatchFileWriter):
    __doc__ = """Writer to create the Arrow binary file format

{}""".format(_ipc_writer_class_doc)

    def __init__(self, sink, schema, *, use_legacy_format=None, options=None):
        options = _get_legacy_format_default(use_legacy_format, options)
        self._open(sink, schema, options=options)


def _get_legacy_format_default(use_legacy_format, options):
    if use_legacy_format is not None and options is not None:
        raise ValueError(
            "Can provide at most one of options and use_legacy_format")
    elif options:
        if not isinstance(options, IpcWriteOptions):
            raise TypeError("expected IpcWriteOptions, got {}"
                            .format(type(options)))
        return options

    metadata_version = MetadataVersion.V5
    if use_legacy_format is None:
        use_legacy_format = \
            bool(int(os.environ.get('ARROW_PRE_0_15_IPC_FORMAT', '0')))
    if bool(int(os.environ.get('ARROW_PRE_1_0_METADATA_VERSION', '0'))):
        metadata_version = MetadataVersion.V4
    return IpcWriteOptions(use_legacy_format=use_legacy_format,
                           metadata_version=metadata_version)


def new_stream(sink, schema, *, use_legacy_format=None, options=None):
    return RecordBatchStreamWriter(sink, schema,
                                   use_legacy_format=use_legacy_format,
                                   options=options)


new_stream.__doc__ = """\
Create an Arrow columnar IPC stream writer instance

{}""".format(_ipc_writer_class_doc)


def open_stream(source):
    """
    Create reader for Arrow streaming format.

    Parameters
    ----------
    source : bytes/buffer-like, pyarrow.NativeFile, or file-like Python object
        Either an in-memory buffer, or a readable file object.

    Returns
    -------
    reader : RecordBatchStreamReader
    """
    return RecordBatchStreamReader(source)


def new_file(sink, schema, *, use_legacy_format=None, options=None):
    return RecordBatchFileWriter(sink, schema,
                                 use_legacy_format=use_legacy_format,
                                 options=options)


new_file.__doc__ = """\
Create an Arrow columnar IPC file writer instance

{}""".format(_ipc_writer_class_doc)


def open_file(source, footer_offset=None):
    """
    Create reader for Arrow file format.

    Parameters
    ----------
    source : bytes/buffer-like, pyarrow.NativeFile, or file-like Python object
        Either an in-memory buffer, or a readable file object.
    footer_offset : int, default None
        If the file is embedded in some larger file, this is the byte offset to
        the very end of the file data.

    Returns
    -------
    reader : RecordBatchFileReader
    """
    return RecordBatchFileReader(source, footer_offset=footer_offset)


def serialize_pandas(df, *, nthreads=None, preserve_index=None):
    """
    Serialize a pandas DataFrame into a buffer protocol compatible object.

    Parameters
    ----------
    df : pandas.DataFrame
    nthreads : int, default None
        Number of threads to use for conversion to Arrow, default all CPUs.
    preserve_index : bool, default None
        The default of None will store the index as a column, except for
        RangeIndex which is stored as metadata only. If True, always
        preserve the pandas index data as a column. If False, no index
        information is saved and the result will have a default RangeIndex.

    Returns
    -------
    buf : buffer
        An object compatible with the buffer protocol.
    """
    batch = pa.RecordBatch.from_pandas(df, nthreads=nthreads,
                                       preserve_index=preserve_index)
    sink = pa.BufferOutputStream()
    with pa.RecordBatchStreamWriter(sink, batch.schema) as writer:
        writer.write_batch(batch)
    return sink.getvalue()


def deserialize_pandas(buf, *, use_threads=True):
    """Deserialize a buffer protocol compatible object into a pandas DataFrame.

    Parameters
    ----------
    buf : buffer
        An object compatible with the buffer protocol.
    use_threads: bool, default True
        Whether to parallelize the conversion using multiple threads.

    Returns
    -------
    df : pandas.DataFrame
    """
    buffer_reader = pa.BufferReader(buf)
    with pa.RecordBatchStreamReader(buffer_reader) as reader:
        table = reader.read_all()
    return table.to_pandas(use_threads=use_threads)