B ®@`rã@sædZddlmZddlmZmZddlZddlmZmZm Z m Z ddl Z ddl mZmZddlZddlmZddlmZddlmmmZdd lmZejeejd œd d „ZGd d„dƒZ Gdd„dƒZ!Gdd„deej"ƒZ#dS)a Read SAS7BDAT files Based on code written by Jared Hobbs: https://bitbucket.org/jaredhobbs/sas7bdat See also: https://github.com/BioStatMatt/sas7bdat Partial documentation of the file format: https://cran.r-project.org/package=sas7bdat/vignettes/sas7bdat.pdf Reference for binary data compression: http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Website/articles/CUJ/1992/9210/ross/ross.htm é)Úabc)ÚdatetimeÚ timedeltaN)ÚIOÚAnyÚUnionÚcast)ÚEmptyDataErrorÚOutOfBoundsDatetime)Ú get_handle)ÚParser)Ú ReaderBase)Ú sas_datetimesÚunitÚreturncCs^ytj||ddStk rX|dkr6| dd„¡S|dkrL| dd„¡Stdƒ‚YnXd S) aÉ Convert to Timestamp if possible, otherwise to datetime.datetime. SAS float64 lacks precision for more than ms resolution so the fit to datetime.datetime is ok. Parameters ---------- sas_datetimes : {Series, Sequence[float]} Dates or datetimes in SAS unit : {str} "d" if the floats represent dates, "s" for datetimes Returns ------- Series Series of datetime64 dtype or datetime.datetime. z 1960-01-01)rÚoriginÚscSstdddƒt|dS)Ni¨é)Úseconds)rr)Ú sas_float©rú:/tmp/pip-unpacked-wheel-q9tj5l6a/pandas/io/sas/sas7bdat.pyÚ8óz$_convert_datetimes..ÚdcSstdddƒt|dS)Ni¨r)Údays)rr)rrrrr<rzunit must be 'd' or 's'N)ÚpdÚ to_datetimer ÚapplyÚ ValueError)rrrrrÚ_convert_datetimes!s  r c@sBeZdZUeed<eed<eed<eed<eeeedœdd„ZdS) Ú_SubheaderPointerÚoffsetÚlengthÚ compressionÚptype)r"r#r$r%cCs||_||_||_||_dS)N)r"r#r$r%)Úselfr"r#r$r%rrrÚ__init__Hsz_SubheaderPointer.__init__N)Ú__name__Ú __module__Ú __qualname__ÚintÚ__annotations__r'rrrrr!Bs r!c@s†eZdZUeed<eeefed<eeefed<eeefed<eed<eed<eeeefeeefeeefeedœdd „Zd S) Ú_ColumnÚcol_idÚnameÚlabelÚformatÚctyper#)r.r/r0r1r2r#cCs(||_||_||_||_||_||_dS)N)r.r/r0r1r2r#)r&r.r/r0r1r2r#rrrr'Ws z_Column.__init__N) r(r)r*r+r,rÚstrÚbytesr'rrrrr-Os    r-c@seZdZdZdAdd„Zdd„Zdd „Zd d „Zd d „Zdd„Z dd„Z dd„Z e e e dœdd„Z e e dœdd„Zdd„Zdd„Zdd„Zd d!„Zd"d#„Ze e d$œd%d&„Zd'd(„Zd)d*„Zd+d,„Zd-d.„Zd/d0„Zd1d2„Zd3d4„Zd5d6„Zd7d8„Zd9d:„ZdBd;d<„Zd=d>„Z d?d@„Z!dS)CÚSAS7BDATReadera! Read SAS files in SAS7BDAT format. Parameters ---------- path_or_buf : path name or buffer Name of SAS file or file-like object pointing to SAS file contents. index : column identifier, defaults to None Column to use as index. convert_dates : boolean, defaults to True Attempt to convert dates to Pandas datetime values. Note that some rarely used SAS date formats may be unsupported. blank_missing : boolean, defaults to True Convert empty strings to missing values (SAS uses blanks to indicate missing character variables). chunksize : int, defaults to None Return SAS7BDATReader object for iterations, returns chunks with given number of lines. encoding : string, defaults to None String encoding. convert_text : bool, defaults to True If False, text variables are left as raw bytes. convert_header_text : bool, defaults to True If False, header text, including column names, are left as raw bytes. NTc CsÚ||_||_||_||_||_||_||_d|_d|_g|_ g|_ g|_ g|_ g|_ d|_g|_g|_g|_d|_d|_d|_t|ddd|_ttt|jjƒ|_y| ¡| ¡Wntk rÔ| ¡‚YnXdS)Nzlatin-1rrÚrbF)Zis_text)ÚindexÚ convert_datesÚ blank_missingÚ chunksizeÚencodingÚ convert_textÚconvert_header_textÚdefault_encodingr$Úcolumn_names_stringsÚ column_namesÚcolumn_formatsÚcolumnsÚ%_current_page_data_subheader_pointersÚ _cached_pageÚ_column_data_lengthsÚ_column_data_offsetsÚ _column_typesÚ_current_row_in_file_indexZ_current_row_on_page_indexr ÚhandlesrrrÚhandleÚ _path_or_bufÚ_get_propertiesÚ_parse_metadataÚ ExceptionÚclose) r&Z path_or_bufr7r8r9r:r;r<r=rrrr'†s:  zSAS7BDATReader.__init__cCstj|jtjdS)z5Return a numpy int64 array of the column data lengths)Údtype)ÚnpÚasarrayrEÚint64)r&rrrÚcolumn_data_lengths¶sz"SAS7BDATReader.column_data_lengthscCstj|jtjdS)z0Return a numpy int64 array of the column offsets)rP)rQrRrFrS)r&rrrÚcolumn_data_offsetsºsz"SAS7BDATReader.column_data_offsetscCstj|jt d¡dS)zj Returns a numpy character array of the column types: s (string) or d (double) ZS1)rP)rQrRrGrP)r&rrrÚ column_types¾szSAS7BDATReader.column_typescCs|j ¡dS)N)rIrO)r&rrrrOÅszSAS7BDATReader.closecCsú|j d¡|j d¡|_|jdttjƒ…tjkr|tj?¡|_@| tjA|tjB¡}| $d¡|_C|j&rò|jC '|j(pì|j)¡|_C| tjD|tjE¡}| $d¡|_F|j&r2|jF '|j(p,|j)¡|_F| tjG|tjH¡}| $d¡|_I|j&rr|jI '|j(pl|j)¡|_I| tjJ|tjK¡}| $d¡}t|ƒdkr¶| '|j(p®|j)¡|_Ln@| tjM|tjN¡}| $d¡|_L|j&rö|jL '|j(pð|j)¡|_LdS)Nri z'magic number mismatch (not a SAS file?))rrTéFéóú<ú>zunknown (code=ú)ó1Úunixó2ZwindowsÚunknowns i¨rr)rz*The SAS7BDAT file appears to be truncated.)OrKÚseekÚreadrDÚlenÚconstÚmagicrÚ _read_bytesZalign_1_offsetZalign_1_lengthZu64_byte_checker_valueZ align_2_valueÚU64Ú _int_lengthZpage_bit_offset_x64Ú_page_bit_offsetZsubheader_pointer_length_x64Ú_subheader_pointer_lengthZpage_bit_offset_x86Zsubheader_pointer_length_x86Zalign_2_offsetZalign_2_lengthZalign_1_checker_valueZendianness_offsetZendianness_lengthÚ byte_orderZencoding_offsetZencoding_lengthZencoding_namesÚ file_encodingZplatform_offsetZplatform_lengthÚplatformZdataset_offsetZdataset_lengthÚrstripr/r=Údecoder;r>Zfile_type_offsetZfile_type_lengthÚ file_typerÚ _read_floatZdate_created_offsetZdate_created_lengthrZ to_timedeltaZ date_createdZdate_modified_offsetZdate_modified_lengthZ date_modifiedÚ _read_intZheader_size_offsetZheader_size_lengthÚ header_lengthZpage_size_offsetZpage_size_lengthÚ _page_lengthZpage_count_offsetZpage_count_lengthZ _page_countZsas_release_offsetZsas_release_lengthZ sas_releaseZsas_server_type_offsetZsas_server_type_lengthZ server_typeZos_version_number_offsetZos_version_number_lengthÚ os_versionZos_name_offsetZos_name_lengthÚos_nameZos_maker_offsetZos_maker_length)r&Zalign1Zalign2ÚbufZ total_alignÚepochÚxrrrrLÈs°               zSAS7BDATReader._get_propertiescCs*|j|jp dd}|dkr&| ¡t‚|S)Nr)Únrows)rbr:rOÚ StopIteration)r&ÚdarrrÚ__next__Ks zSAS7BDATReader.__next__cCsJ|dkr| ¡tdƒ‚| ||¡}|dkr0dnd}t |j||¡dS)N)rXrWzinvalid float widthrXÚfrr)rOrrfÚstructÚunpackrk)r&r"ÚwidthrwÚfdrrrrqSs  zSAS7BDATReader._read_float)r"rrcCsP|dkr| ¡tdƒ‚| ||¡}dddddœ|}t |j||¡d}|S)N)rérXrWzinvalid int widthÚbÚhÚlÚqr)rOrrfrr€rk)r&r"rrwÚitZivrrrrr\s zSAS7BDATReader._read_int)r"r#cCs|jdkrX|j |¡|j |¡}t|ƒ|krT| ¡d|d›d|d›d}t|ƒ‚|S||t|jƒkrz| ¡tdƒ‚|j|||…SdS)NzUnable to read rz bytes from file position Ú.zThe cached page is too small.)rDrKrarbrcrOr)r&r"r#rwÚmsgrrrrfes    zSAS7BDATReader._read_bytescCsRd}xH|sL|j |j¡|_t|jƒdkr*Pt|jƒ|jkrBtdƒ‚| ¡}qWdS)NFrz2Failed to read a meta data page from the SAS file.)rKrbrtrDrcrÚ_process_page_meta)r&ÚdonerrrrMtszSAS7BDATReader._parse_metadatacCsV| ¡tjtjgtj}|j|kr,| ¡|jtj@}|jtjk}|pT|pT|jgkS)N) Ú_read_page_headerrdÚpage_meta_typeZ page_amd_typeÚpage_mix_typesÚ_current_page_typeÚ_process_page_metadataÚpage_data_typerC)r&ÚptÚ is_data_pageZ is_mix_pagerrrr‹~s   z!SAS7BDATReader._process_page_metacCsX|j}tj|}| |tj¡|_tj|}| |tj¡|_tj |}| |tj ¡|_ dS)N) rirdZpage_type_offsetrrZpage_type_lengthrZblock_count_offsetZblock_count_lengthZ_current_page_block_countZsubheader_count_offsetZsubheader_count_lengthÚ_current_page_subheaders_count)r&Ú bit_offsetZtxrrrr‹s   z SAS7BDATReader._read_page_headercCst|j}xht|jƒD]Z}| tj||¡}|jdkr4q|jtjkrBq|  |j ¡}|  ||j|j ¡}|  ||¡qWdS)Nr)riÚranger•Ú_process_subheader_pointersrdZsubheader_pointers_offsetr#r$Ztruncated_subheader_idÚ_read_subheader_signaturer"Ú_get_subheader_indexr%Ú_process_subheader)r&r–ÚiÚpointerÚsubheader_signatureÚsubheader_indexrrrr‘–s   z%SAS7BDATReader._process_page_metadatacCs`tj |¡}|dkr\|tjkp$|dk}|tjk}|jdkrL|rL|rLtjj}n| ¡t dƒ‚|S)NrrzUnknown subheader signature) rdZsubheader_signature_to_indexÚgetZcompressed_subheader_idZcompressed_subheader_typer$ÚSASIndexÚdata_subheader_indexrOr)r&Ú signaturer$r%r7Úf1Úf2rrrrš§s   z#SAS7BDATReader._get_subheader_index)r"Úsubheader_pointer_indexc Cst|j}|||}| ||j¡}||j7}| ||j¡}||j7}| |d¡}|d7}| |d¡}t||||ƒ} | S)Nr)rjrrrhr!) r&r"r¦Zsubheader_pointer_lengthZ total_offsetZsubheader_offsetZsubheader_lengthZsubheader_compressionZsubheader_typeryrrrr˜³s      z*SAS7BDATReader._process_subheader_pointerscCs| ||j¡}|S)N)rfrh)r&r"ržrrrr™Ész(SAS7BDATReader._read_subheader_signaturecCsÞ|j}|j}|tjjkr |j}n°|tjjkr4|j}nœ|tjjkrH|j }nˆ|tjj kr\|j }nt|tjj krp|j }n`|tjjkr„|j}nL|tjjkr˜|j}n8|tjjkr¬|j}n$|tjjkrÈ|j |¡dStdƒ‚|||ƒdS)Nzunknown subheader index)r"r#rdr¡Zrow_size_indexÚ_process_rowsize_subheaderZcolumn_size_indexÚ_process_columnsize_subheaderZcolumn_text_indexÚ_process_columntext_subheaderZcolumn_name_indexÚ_process_columnname_subheaderZcolumn_attributes_indexÚ#_process_columnattributes_subheaderZformat_and_label_indexÚ_process_format_subheaderZcolumn_list_indexÚ_process_columnlist_subheaderZsubheader_counts_indexÚ_process_subheader_countsr¢rCÚappendr)r&rŸrr"r#Ú processorrrrr›Ís.          z!SAS7BDATReader._process_subheadercCsÒ|j}|}|}|jr&|d7}|d7}n|d7}|d7}| |tj||¡|_| |tj||¡|_| |tj||¡|_ | |tj ||¡|_ tj |}| |||¡|_ | |d¡|_| |d¡|_dS)NiªiÂibizrƒ)rhrgrrrdZrow_length_offset_multiplierZ row_lengthZrow_count_offset_multiplierÚ row_countZcol_count_p1_multiplierÚ col_count_p1Zcol_count_p2_multiplierÚ col_count_p2Z'row_count_on_mix_page_offset_multiplierZ_mix_page_row_countÚ_lcsÚ_lcp)r&r"r#Úint_lenZ lcs_offsetZ lcp_offsetZmxrrrr§és(  z)SAS7BDATReader._process_rowsize_subheadercCsT|j}||7}| ||¡|_|j|j|jkrPtd|j›d|j›d|j›dƒdS)Nz Warning: column count mismatch (z + z != z) )rhrrÚ column_countr²r³Úprint)r&r"r#r¶rrrr¨s z,SAS7BDATReader._process_columnsize_subheadercCsdS)Nr)r&r"r#rrrr®sz(SAS7BDATReader._process_subheader_countsc CsÎ||j7}| |tj¡}| ||¡}|d|… d¡}|}|jrR| |jpN|j ¡}|j   |¡t |j ƒdkrÊd}xtj D]}||krz|}qzW||_||j8}|d} |jr´| d7} | | |j¡}| d¡}|dkrd|_|d} |jrò| d7} | | |j¡}|d|j…|_nŒ|tjkrV|d } |jr6| d7} | | |j¡}|d|j…|_nH|jdkržd|_|d} |jr€| d7} | | |j¡}|d|j…|_|jrÊt|d ƒrÊ|j |jpÄ|j ¡|_dS) Nrs rrérXóé é(Ú creator_proc)rhrrrdZtext_block_size_lengthrfrnr=ror;r>r?r¯rcZcompression_literalsr$rgrµr´r½Zrle_compressionÚhasattr) r&r"r#Ztext_block_sizerwZ cname_rawÚcnameZcompression_literalZclZoffset1rrrr©sX          z,SAS7BDATReader._process_columntext_subheaderc CsÌ|j}||7}|d|dd}x¤t|ƒD]˜}|tj|dtj}|tj|dtj}|tj|dtj}| |tj¡} | |tj ¡} | |tj ¡} |j | } |j   | | | | …¡q,WdS)Nrƒé rWr)rhr—rdZcolumn_name_pointer_lengthZ!column_name_text_subheader_offsetZcolumn_name_offset_offsetZcolumn_name_length_offsetrrZ!column_name_text_subheader_lengthZcolumn_name_offset_lengthZcolumn_name_length_lengthr?r@r¯) r&r"r#r¶Zcolumn_name_pointers_countrœZtext_subheaderZcol_name_offsetZcol_name_lengthÚidxÚ col_offsetZcol_lenZname_strrrrrªHs   z,SAS7BDATReader._process_columnname_subheaderc Csâ|j}|d|d|d}x¾t|ƒD]²}||tj||d}|d|tj||d}|d|tj||d}| ||¡} |j | ¡| |tj ¡} |j  | ¡| |tj ¡} |j  | dkrÔdnd¡q(WdS)NrƒrÀrWródós) rhr—rdZcolumn_data_offset_offsetZcolumn_data_length_offsetZcolumn_type_offsetrrrFr¯Zcolumn_data_length_lengthrEZcolumn_type_lengthrG) r&r"r#r¶Zcolumn_attributes_vectors_countrœZcol_data_offsetZ col_data_lenZ col_typesryrrrr«hs   z2SAS7BDATReader._process_columnattributes_subheadercCsdS)Nr)r&r"r#rrrr­‚sz,SAS7BDATReader._process_columnlist_subheadercCsl|j}|tjd|}|tjd|}|tjd|}|tjd|}|tjd|}|tjd|} | |tj ¡} t | t |j ƒdƒ} | |tj ¡} | |tj¡} | |tj¡}t |t |j ƒdƒ}| |tj¡}| | tj¡}|j |}||||…}|j | }|| | | …}t |jƒ}t||j||||j||j|ƒ}|j |¡|j |¡dS)Nér)rhrdZ)column_format_text_subheader_index_offsetZcolumn_format_offset_offsetZcolumn_format_length_offsetZ(column_label_text_subheader_index_offsetZcolumn_label_offset_offsetZcolumn_label_length_offsetrrZ)column_format_text_subheader_index_lengthÚminrcr?Zcolumn_format_offset_lengthZcolumn_format_length_lengthZ(column_label_text_subheader_index_lengthZcolumn_label_offset_lengthZcolumn_label_length_lengthrBr-r@rGrErAr¯)r&r"r#r¶Ztext_subheader_formatZcol_format_offsetZcol_format_lenZtext_subheader_labelZcol_label_offsetZ col_label_lenryZ format_idxZ format_startZ format_lenZ label_idxZ label_startZ label_lenZ label_namesZ column_labelZ format_namesZ column_formatZcurrent_column_numberÚcolrrrr¬†s@        z(SAS7BDATReader._process_format_subheadercCsî|dkr|jdk r|j}n|dkr(|j}t|jƒdkrF| ¡tdƒ‚|j|jkrVdS|j|j}||krn|}|j d¡}|j d¡}tj ||ft d|_ tj |d|ftj d|_d|_t|ƒ}| |¡| ¡}|jdk rê| |j¡}|S)NrzNo columns to parse from filerÃrÄ)rPrW)r:r±rcrGrOr rHÚcountrQÚemptyÚobjectÚ _string_chunkÚzerosZuint8Ú _byte_chunkÚ_current_row_in_chunk_indexr rbÚ_chunk_to_dataframer7Z set_index)r&rzÚmZndÚnsÚpÚrsltrrrrb·s.       zSAS7BDATReader.readcCs¸g|_|j |j¡|_t|jƒdkr(dSt|jƒ|jkrf| ¡dt|jƒd›d|jd›d}t|ƒ‚| ¡|j }|t j kr†|  ¡|t j @}t j gt j}|s´|j |kr´| ¡SdS)NrTz-failed to read complete page from file (read rz of z bytes)F)rCrKrbrtrDrcrOrrrrdrŽr‘r’rÚ_read_next_page)r&rŠZ page_typer”r“rrrrÔÙs"  zSAS7BDATReader._read_next_pagec Csœ|j}|j}t|||ƒ}tj|d}d\}}xft|jƒD]V}|j|}|j|dkrì|j|dd…fj |j dd||<t j ||t j d||<|jrâ|j|tjkrÀt||dƒ||<n"|j|tjkrât||dƒ||<|d7}q<|j|dkrx|j|dd…f||<|jrD|jdk rD||j |jp<|j¡||<|jrn||j ¡d k} t j|j| |f<|d7}q<| ¡td |j|›ƒ‚qr9rcÚnanÚlocrOr) r&ÚnrÐÚixrÓZjsZjbÚjr/ÚiirrrrÏòs8  $   z"SAS7BDATReader._chunk_to_dataframe)NTTNNTT)N)"r(r)r*Ú__doc__r'rTrUrVrOrLr}rqr+rrrfrMr‹rr‘ršr˜r™r›r§r¨r®r©rªr«r­r¬rbrÔrÏrrrrr5isJ '       4 1 "r5)$rÜÚ collectionsrrrrÚtypingrrrrZnumpyrQZ pandas.errorsr r ZpandasrZpandas.io.commonr Zpandas.io.sas._sasr Zpandas.io.sas.sas_constantsÚioZsasZ sas_constantsrdZpandas.io.sas.sasreaderr ZSeriesr3r r!r-ÚIteratorr5rrrrÚs    !