B ©[ð]J§ã @s˜dZddlmZddlZddlmZddlZddlZddlZddl Z ddl m Z ddl Z ddlmZddlmZddlmZmZdd lmZmZmZdd lmZmZmZmZmZmZm Z m!Z!dd l"m#Z#dd l$m%Z%dd l&m'Z'm(Z(m)Z)dZ*dZ+dZ,dZ-dZ.dZ/de+e,e-e.e/fZ0de+e-fZ1de+e-fZ2de+e-e,e.fZ3ee0ƒedddedddd\dd„ƒƒƒZ4d d!d"d#d$d%d&d'd(g Z5e d)d*d*¡Z6d+d,„Z7d-d.„Z8d/Z9Gd0d1„d1e:ƒZ;d2ZGd6d7„d7e:ƒZ?d8Z@d9d:„ZAGd;d<„d<ƒZBGd=d>„d>ƒZCGd?d@„d@ƒZDGdAdB„dBeDe'ƒZEdCdD„ZFdEdF„ZGdGdH„ZHdIdJ„ZIdKdL„ZJdMdN„ZKd]dPdQ„ZLGdRdS„dSeDƒZMdTdU„ZNdVdW„ZOGdXdY„dYƒZPGdZd[„d[eMƒZQdS)^a¯ Module contains tools for processing Stata files into DataFrames The StataReader below was originally written by Joe Presbrey as part of PyDTA. It has been extended and improved by Skipper Seabold from the Statsmodels project who also developed the StataWriter and was finally added to pandas in a once again improved version. You can find more information on http://presbrey.mit.edu/PyDTA and http://www.statsmodels.org/devel/ é)Ú OrderedDictN)ÚBytesIO)Ú relativedelta)Ú infer_dtype)Úmax_len_string_array)ÚAppenderÚdeprecate_kwarg)Ú ensure_objectÚis_categorical_dtypeÚis_datetime64_dtype)Ú CategoricalÚ DatetimeIndexÚNaTÚ TimestampÚconcatÚisnaÚ to_datetimeÚ to_timedelta)Ú DataFrame)ÚSeries)Ú BaseIteratorÚ_stringify_pathÚget_filepath_or_bufferz˜Version of given Stata file is not 104, 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), or 118 (Stata 14)zÚconvert_dates : boolean, defaults to True Convert date variables to DataFrame time values. convert_categoricals : boolean, defaults to True Read value labels and convert columns to Categorical/Factor variables.zcencoding : string, None or encoding Encoding used to parse the files. None defaults to latin-1.a6index_col : string, optional, default: None Column to set as index. convert_missing : boolean, defaults to False Flag indicating whether to convert missing values to their Stata representations. If False, missing values are replaced with nan. If True, columns containing missing values are returned with object data types and missing values are represented by StataMissingValue objects. preserve_dtypes : boolean, defaults to True Preserve Stata datatypes. If False, numeric data are upcast to pandas default types for foreign data (float64 or int64). columns : list or None Columns to retain. Columns will be returned in the given order. None returns all columns. order_categoricals : boolean, defaults to True Flag indicating whether converted categorical data are ordered.zzchunksize : int, default None Return StataReader object for iterations, returns chunks with given number of lines.z@iterator : boolean, default False Return StataReader object.aâ Read Stata file into DataFrame. Parameters ---------- filepath_or_buffer : str, path object or file-like object Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: ``file://localhost/path/to/table.dta``. If you want to pass in a path object, pandas accepts any ``os.PathLike``. By file-like object, we refer to objects with a ``read()`` method, such as a file handler (e.g. via builtin ``open`` function) or ``StringIO``. %s %s %s %s %s Returns ------- DataFrame or StataReader See Also -------- io.stata.StataReader : Low-level reader for Stata data files. DataFrame.to_stata: Export Stata data files. Examples -------- Read a Stata dta file: >>> df = pd.read_stata('filename.dta') Read a Stata dta file in 10,000 line chunks: >>> itr = pd.read_stata('filename.dta', chunksize=10000) >>> for chunk in itr: ... do_something(chunk) zÃRead observations from Stata file, converting them into a dataframe .. deprecated:: This is a legacy method. Use `read` in new code. Parameters ---------- %s %s Returns ------- DataFrame zÎReads observations from Stata file, converting them into a dataframe Parameters ---------- nrows : int Number of lines to read from data file, if None read whole file. %s %s Returns ------- DataFrame a.Class for reading Stata dta files. Parameters ---------- path_or_buf : path (string), buffer or path object string, path object (pathlib.Path or py._path.local.LocalPath) or object implementing a binary read() functions. .. versionadded:: 0.23.0 support for pathlib, py.path. %s %s %s %s Úencoding)Ú old_arg_nameÚ new_arg_nameÚindexÚ index_colTFc CsDt||||||||| d } | s"| r(| } nz |  ¡} Wd|  ¡X| S)N)Ú convert_datesÚconvert_categoricalsrÚconvert_missingÚpreserve_dtypesÚcolumnsÚorder_categoricalsÚ chunksize)Ú StataReaderÚreadÚclose) Zfilepath_or_bufferrrrrr r!r"r#r$ÚiteratorÚreaderÚdata©r+ú3/tmp/pip-install-svzetoqp/pandas/pandas/io/stata.pyÚ read_stataÉs   r-z%tcz%tCz%tdz%dz%twz%tmz%tqz%thz%tyi¨écsVtjjtjj‰‰tjt ddd¡j‰tjt ddd¡j‰ˆddd‰ˆddd‰‡‡fdd„}‡‡fdd „}‡‡‡‡fd d „}t |¡}d }| ¡r¶d }t |ƒ}d||<|  tj ¡}|  d¡rät }|} ||| dƒ} n`|  d¡rt d¡t |tjd} |rt| |<| S|  d¡r>t }|} ||| dƒ} n|  d¡rpt j|d} |dd} || | ƒ} nÔ|  d¡r¢t j|d} |dd} || | ƒ} n¢|  d¡rØt j|d} |ddd} || | ƒ} nl|  d¡rt j|d} |dd d} || | ƒ} n6|  d!¡r4|} t |¡} || | ƒ} ntd"j|d#ƒ‚|rRt| |<| S)$af Convert from SIF to datetime. http://www.stata.com/help.cgi?datetime Parameters ---------- dates : Series The Stata Internal Format date to convert to datetime according to fmt fmt : str The format to convert to. Can be, tc, td, tw, tm, tq, th, ty Returns Returns ------- converted : Series The converted dates Examples -------- >>> dates = pd.Series([52]) >>> _stata_elapsed_date_to_datetime_vec(dates , "%tw") 0 1961-01-01 dtype: datetime64[ns] Notes ----- datetime/c - tc milliseconds since 01jan1960 00:00:00.000, assuming 86,400 s/day datetime/C - tC - NOT IMPLEMENTED milliseconds since 01jan1960 00:00:00.000, adjusted for leap seconds date - td days since 01jan1960 (01jan1960 = 0) weekly date - tw weeks since 1960w1 This assumes 52 weeks in a year, then adds 7 * remainder of the weeks. The datetime value is the start of the week in terms of days in the year, not ISO calendar weeks. monthly date - tm months since 1960m1 quarterly date - tq quarters since 1960q1 half-yearly date - th half-years since 1960h1 yearly date - ty years since 0000 If you don't have pandas with datetime support, then you can't do milliseconds accurately. i¨r.éiiècsX| ¡ˆkr,| ¡ˆkr,td||ddSt|ddƒ}tdd„t||ƒDƒ|dSdS) zú Convert year and month to datetimes, using pandas vectorized versions when the date range falls within the range supported by pandas. Otherwise it falls back to a slower but more robust method using datetime. édz%Y%m)ÚformatrNcSsg|]\}}t ||d¡‘qS)r.)Údatetime)Ú.0ÚyÚmr+r+r,ú 9szX_stata_elapsed_date_to_datetime_vec..convert_year_month_safe..)r)ÚmaxÚminrÚgetattrrÚzip)ÚyearÚmonthr)ÚMAX_YEARÚMIN_YEARr+r,Úconvert_year_month_safe-s  zD_stata_elapsed_date_to_datetime_vec..convert_year_month_safecsd| ¡ˆdkr4| ¡ˆkr4t|ddt|ddSt|ddƒ}dd „t||ƒDƒ}t||d SdS) z{ Converts year (e.g. 1999) and days since the start of the year to a datetime or datetime64 Series r.z%Y)r1Úd)ÚunitrNcSs,g|]$\}}t |dd¡tt|ƒd‘qS)r.)Údays)r2rÚint)r3r4r@r+r+r,r6FszW_stata_elapsed_date_to_datetime_vec..convert_year_days_safe..)r)r7r8rrr9r:r)r;rBrÚvalue)r=r>r+r,Úconvert_year_days_safe<s  zC_stata_elapsed_date_to_datetime_vec..convert_year_days_safecs°t|ddƒ}|dkrL| ¡ˆks,| ¡ˆkr”‡fdd„|Dƒ}t||dSnH|dkrŒ| ¡ˆksl| ¡ˆkr”‡fdd„|Dƒ}t||dSntd ƒ‚tˆƒ‰t||d }ˆ|S) z¾ Convert base dates and deltas to datetimes, using pandas vectorized versions if the deltas satisfy restrictions required to be expressed as dates in pandas. rNr@csg|]}ˆtt|ƒd‘qS))rB)rrC)r3r@)Úbaser+r,r6TszS_stata_elapsed_date_to_datetime_vec..convert_delta_safe..)rÚmscs"g|]}ˆtt|ƒdd‘qS)iè)Ú microseconds)rrC)r3r@)rFr+r,r6Yszformat not understood)rA)r9r7r8rÚ ValueErrorrr)rFZdeltasrArÚvalues)Ú MAX_DAY_DELTAÚ MAX_MS_DELTAÚ MIN_DAY_DELTAÚ MIN_MS_DELTA)rFr,Úconvert_delta_safeKs  z?_stata_elapsed_date_to_datetime_vec..convert_delta_safeFTgð?)z%tcÚtcrG)z%tCÚtCz9Encountered %tC format. Leaving in Stata Internal Format.)Údtype)z%tdÚtdz%dr@r@)z%twÚtwé4é)z%tmÚtmé )z%tqÚtqéé)z%thÚthéé)z%tyÚtyzDate fmt {fmt} not understood)Úfmt)rr8r;r7r2rBÚnpÚisnanÚanyrÚastypeÚint64Ú startswithÚ stata_epochÚwarningsÚwarnÚobjectrZ ones_likerIr1)Údatesr`r?rErOZbad_locsZhas_bad_valuesZdata_colrFrGÚ conv_datesrBr;r<r+)rKrLr=rMrNr>r,Ú#_stata_elapsed_date_to_datetime_vecösj1                   rmcsò|j‰d‰ˆd‰d"‡‡‡fdd„ }t|ƒ}|j‰| ¡r`t|ƒ}t|ƒrXttƒ||<nt||<|dkr‚||dd}|jd}n>|d krœt  d ¡|}n$|d kr¾||dd}|jˆ}n|d krð||ddd }d|j tj |j d}nÐ|dkr"||dd}d|j tj |j d}nž|dkrX||dd}d|j tj |j dd}nh|dkr’||dd}d|j tj |j dk  tj¡}n.|dkr°||dd}|j }ntdj|dƒ‚t|tjd}t dd¡d }|||<t|ˆd!S)#aW Convert from datetime to SIF. http://www.stata.com/help.cgi?datetime Parameters ---------- dates : Series Series or array containing datetime.datetime or datetime64[ns] to convert to the Stata Internal Format given by fmt fmt : str The format to convert to. Can be, tc, td, tw, tm, tq, th, ty lž"R:ièFcs@i}t|jƒrŒ|r0|t}|j tj¡d|d<|s8|rVt|ƒ}|j|j|d<|d<|rŠ| tj¡t |ddd tj¡}|ˆ|d<n¨t |dd d kr,|rÎ|jt}‡fd d „}t  |¡}||ƒ|d<|r|  d d „¡}|jd|d<|j|dd|d<|r4dd „}t  |¡}||ƒ|d<nt dƒ‚t|ˆdS)NièÚdeltar;r<z%Y)r1rBF)Úskipnar2csˆ|jd|j|jS)Ni@B)rBÚsecondsrH)Úx)Ú US_PER_DAYr+r,Ú½ózJ_datetime_to_stata_elapsed_vec..parse_dates_safe..cSsd|j|jS)Nr0)r;r<)rqr+r+r,rsÁrtr0cSs|t |jdd¡jS)Nr.)r2r;rB)rqr+r+r,rsÅrtzZColumns containing dates must contain either datetime64, datetime.datetime or null values.)r)r rJrgrdrarer r;r<rrZ vectorizeÚapplyrIr)rkrnr;rBr@ÚfÚvZ year_month)Ú NS_PER_DAYrrrr+r,Úparse_dates_safe«s<        z8_datetime_to_stata_elapsed_vec..parse_dates_safe)z%tcrPT)rn)z%tCrQz'Stata Internal Format tC not supported.)z%tdrS)z%twrT)r;rBrUrV)z%tmrW)r;rXr.)z%tqrYrZr[)z%thr\r]r^)z%tyr_z-Format {fmt} is not a known Stata date format)r`)rRz||  ¡d ks(|| ¡d kr–|| tj¡||<qh|tjkr„||  ¡d ksn|| ¡dkr–|| tj¡||<qh|tjkr ||  ¡dkrÊ|| ¡dkrÊ|| tj¡||<n@|| tj¡||<||  ¡d ks|| ¡dkr–td}qh|tjtjfkrh||  ¡}t |¡rDtdj|dƒ‚|tjkrp||krp|| tj¡||<qh|tjkrh||krhtdj|||dƒ‚qhW|r¬t |t¡|S)a(Checks the dtypes of the columns of a pandas DataFrame for compatibility with the data types and ranges supported by Stata, and converts if necessary. Parameters ---------- data : DataFrame The DataFrame to check and convert Notes ----- Numeric columns in Stata must be one of int8, int16, int32, float32 or float64, with some additional value restrictions. int8 and int16 columns are checked for violations of the value restrictions and upcast if needed. int64 data is not usable in Stata, and so it is downcast to int32 whenever the value are in the int32 range, and sidecast to float64 when larger than this range. If the int64 values are outside of the range of those perfectly representable as float64 values, a warning is raised. bool columns are cast to int8. uint columns are converted to int of the same size if there is no loss in precision, otherwise are upcast to a larger type. uint64 is currently not supported since it is concerted to object in a DataFrame. Úz.)Úkeyrr.i}zaStata value labels for a single variable must have a combined length less than 32,000 characters.)rRérZ)ÚnameÚlabnameÚcatÚ categoriesÚlistr:raÚarangeÚlenÚ value_labelsÚsortrŽÚtext_lenÚoffrˆÚtxtÚnÚ isinstanceÚstrrhriÚvalue_label_mismatch_docr1rƒÚappendrIÚarray)ÚselfZcatarrayr›ÚvlÚcategoryr+r+r,Ú__init__›s6      zStataValueLabel.__init__cCs | |j¡S)z- Python 3 compatibility shim )ÚencodeÚ _encoding)rªÚsr+r+r,Ú_encodeÆszStataValueLabel._encodec Cs&||_tƒ}d}d}| t |d|j¡¡| t|jdd…dƒ¡}| |¡x"t dƒD]}| t d|¡¡qZW| t |d|j ¡¡| t |d|j ¡¡x$|j D]}| t |d|¡¡q¬Wx$|j D]} | t |d| ¡¡qÒWx"|jD]} | | | |¡¡qøW| d ¡| ¡S) a Parameters ---------- byteorder : str Byte order of the output encoding : str File encoding Returns ------- value_label : bytes Bytes containing the formatted value label úóÚiNé é!r[Úcr)r¯rÚwriter{Úpackržr±Ú _pad_bytesr™Úranger¤r¡r¢rˆr£Úseekr&) rªÚ byteorderrÚbioZ null_stringÚ null_byter™r´ÚoffsetrDÚtextr+r+r,Úgenerate_value_labelÌs&     z$StataValueLabel.generate_value_labelN)r€rr‚Ú__doc__r­r±rÂr+r+r+r,r•…s+r•c@sÜeZdZdZiZdZx@eD]8Zdee<x*eddƒD]Zde deƒeee<q2WqWdZ e   dd ¡d Z xpedƒD]dZe   d e ¡d Zdee<ed kr°eee deƒ7<e   de  d e¡¡d e Ze  de¡Z qtWd Ze   d d¡d Z xredƒD]fZe   de¡d Zdee<ed kr8eee deƒ7<e   d e  de¡¡d e Ze  d e¡ZqúWddde   d e ¡d e   de¡d dœZdd„Zedd„ddZedd„ddZdd„Zdd„Zd d!„Zed"d#„ƒZd$S)%ÚStataMissingValueaw An observation's missing value. Parameters ---------- value : int8, int16, int32, float32 or float64 The Stata missing value code Attributes ---------- string : string String representation of the Stata missing value value : int8, int16, int32, float32 or float64 The original encoded missing value Notes ----- More information: Integer missing values make the code '.', '.a', ..., '.z' to the ranges 101 ... 127 (for int8), 32741 ... 32767 (for int16) and 2147483621 ... 2147483647 (for int32). Missing values for floating point data types are more complex but the pattern is simple to discern from the following table. np.float32 missing values (float in Stata) 0000007f . 0008007f .a 0010007f .b ... 00c0007f .x 00c8007f .y 00d0007f .z np.float64 missing values (double in Stata) 000000000000e07f . 000000000001e07f .a 000000000002e07f .b ... 000000000018e07f .x 000000000019e07f .y 00000000001ae07f .z )éeiåiåÿÿÚ.r.éé`szz>The Stata representation of the missing value: '.', '.a'..'.z')ÚdoccCs|jS)N)rÊ)rªr+r+r,rs^rtz/The binary representation of the missing value.cCs|jS)N)Ústring)rªr+r+r,Ú__str__aszStataMissingValue.__str__cCsdj|j|dS)Nz {cls}({obj}))ÚclsÚobj)r1Ú __class__)rªr+r+r,Ú__repr__dszStataMissingValue.__repr__cCs$t||jƒo"|j|jko"|j|jkS)N)r¥rÓrÏrD)rªÚotherr+r+r,Ú__eq__hs  zStataMissingValue.__eq__cCsz|tjkr|jd}n`|tjkr,|jd}nJ|tjkrB|jd}n4|tjkrX|jd}n|tjkrn|jd}ntdƒ‚|S)Nr‹rrŽrrzzUnsupported dtype)rar‹ÚBASE_MISSING_VALUESrrŽrrzrI)rÑrRrDr+r+r,Úget_base_missing_valueos          z(StataMissingValue.get_base_missing_valueN)r€rr‚rÃrÌÚbasesÚbr»r´ÚchrZ float32_baser{r|Ú incrementrDr¹Ú int_valueZ float64_baser×r­ÚpropertyrÏrÐrÔrÖÚ classmethodrØr+r+r+r,rÄsP*   rÄc@seZdZdd„ZdS)Ú StataParserc Cs’ttttddƒdd„tddƒDƒƒƒdtjfdtjfdtjfdtjfd tj fgƒ|_ td tj fd tj fd tjfd tjfdtjfdtjfgƒ|_ ttdƒƒtdƒ|_ tddddddgƒ|_d}d}d}d}dddt t d|¡d¡t t d|¡d¡ft  t d |¡d¡t  t d |¡d¡fd!œ|_ddddd"œ|_d#d$d%t t dd&¡d¡t  t d d'¡d¡d!œ|_d(d)d*d+d,d-d.œ|_d/|_dS)0Nr.éõcSsg|]}dt|ƒ‘qS)Úa)r¦)r3r´r+r+r,r6“sz(StataParser.__init__..éûéüéýéþéÿi€iöÿi÷ÿiøÿiùÿiúÿZbhlfd)i€ÚQ)iöÿr@)i÷ÿrv)iøÿÚl)iùÿÚh)iúÿrÚsÿÿÿþsÿÿÿ~sÿÿÿÿÿÿïÿsÿÿÿÿÿÿß)iÿÿÿr0)i€ÿÿiä)i€iäÿÿzd?„Z$d@dA„Z%dBdC„Z&e'dDdE„ƒZ(dFdG„Z)dHdI„Z*‡Z+S)Mr%rN)rrrrTFc sætƒ ¡d|_||_||_||_||_||_||_||_ d|_ | |_ d|_ d|_ d|_d|_d|_d|_d|_d|_ttjƒ|_t|ƒ}t|tƒr¤t|ƒ\}} } } t|ttfƒrÀt|dƒ|_n| ¡} t | ƒ|_| !¡| "¡dS)Nr+FrÚrb)#Úsuperr­Ú col_sizesÚ_convert_datesÚ_convert_categoricalsÚ _index_colÚ_convert_missingÚ_preserve_dtypesÚ_columnsÚ_order_categoricalsr¯Ú _chunksizeZ_has_string_dataZ_missing_valuesÚ_can_read_value_labelsÚ_column_selector_setÚ_value_labels_readÚ _data_readÚ_dtypeÚ _lines_readÚ_set_endiannessÚsysr½Ú_native_byteorderrr¥r¦rÚbytesÚopenÚ path_or_bufr&rÚ _read_headerÚ _setup_dtype)rªr,rrrr r!r"r#rr$Ú_Z should_closeÚcontents)rÓr+r,r­s:    zStataReader.__init__cCs|S)z enter context manager r+)rªr+r+r,Ú __enter__NszStataReader.__enter__cCs | ¡dS)z exit context manager N)r')rªÚexc_typeÚ exc_valueÚ tracebackr+r+r,Ú__exit__RszStataReader.__exit__cCs(y|j ¡Wntk r"YnXdS)z close the handle if its open N)r,r'ÚIOError)rªr+r+r,r'VszStataReader.closecCs|jdkrd|_nd|_dS)zC Set string encoding which depends on file version évzlatin-1zutf-8N)Úformat_versionr¯)rªr+r+r,Ú _set_encoding]s zStataReader._set_encodingcsjˆj d¡}t d|¡ddkr,ˆ |¡n ˆ |¡tdd„ˆjDƒƒdkˆ_‡fdd„ˆjDƒˆ_ dS)Nr.r·ró.csg|]}ˆ |¡‘qSr+)Ú _calcsize)r3Útyp)rªr+r,r6ps) r,r&r{r|Ú_read_new_headerÚ_read_old_headerržÚtyplistZhas_string_datar)rªÚ first_charr+)rªr,r-fs    zStataReader._read_headercCsä|j d¡t|j d¡ƒ|_|jdkr0ttƒ‚| ¡|j d¡|j d¡dkrXdpZd|_|j d¡t  |jd |j d ¡¡d |_ |j d ¡|  ¡|_ |j d ¡|  ¡|_|j d¡| ¡|_|j d¡|j d¡|j d¡t  |jd|j d¡¡d d|_t  |jd|j d¡¡d d|_t  |jd|j d¡¡d d|_t  |jd|j d¡¡d d|_t  |jd|j d¡¡d d|_| ¡|_|j d¡t  |jd|j d¡¡d d|_t  |jd|j d¡¡d d |_t  |jd|j d¡¡d d|_| |j¡\|_|_|j |j¡| ¡|_ |j |j¡t  |jd|j d|j d |j d¡¡dd…|_!|j |j¡| "¡|_#|j |j¡| $¡|_%|j |j¡| &¡|_'dS)NrÇr[)éur7ésMSFú>ú<éÚHr]rrVé éér—rÉéé é r^érêr.éÿÿÿÿ)(r,r&rCr8rIÚ_version_errorr9r½r{r|ÚnvarÚ _get_nobsÚnobsÚ_get_data_labelÚ _data_labelÚ_get_time_stampÚ time_stampZ_seek_vartypesZ_seek_varnamesZ_seek_sortlistZ _seek_formatsÚ_seek_value_label_namesÚ_get_seek_variable_labelsZ_seek_variable_labelsÚ data_locationÚ seek_strlsÚseek_value_labelsÚ _get_dtypesr@Údtyplistr¼Ú _get_varlistÚvarlistÚsrtlistÚ _get_fmtlistÚfmtlistÚ _get_lbllistÚlbllistÚ_get_variable_labelsÚ_variable_labels)rªrAr+r+r,r>rsT              $$$$$  $$$    zStataReader._read_new_headercshˆj |¡‡fdd„tˆjƒDƒ}‡fdd„‰‡fdd„|Dƒ}‡fdd„‰‡fdd„|Dƒ}||fS)Ncs*g|]"}t ˆjdˆj d¡¡d‘qS)rGr]r)r{r|r½r,r&)r3r´)rªr+r,r6¿sz+StataReader._get_dtypes..cs>|dkr |Sy ˆj|Stk r8td |¡ƒ‚YnXdS)Niýz cannot convert stata types [{0}])rÚKeyErrorrIr1)r=)rªr+r,rvÃs  z"StataReader._get_dtypes..fcsg|] }ˆ|ƒ‘qSr+r+)r3rq)rvr+r,r6ËscsB|dkrt|ƒSy ˆj|Stk r<td |¡ƒ‚YnXdS)Niýz cannot convert stata dtype [{0}])r¦rrhrIr1)r=)rªr+r,rvÍs  csg|] }ˆ|ƒ‘qSr+r+)r3rq)rvr+r,r6Õs)r,r¼r»rQ)rªZ seek_vartypesZ raw_typlistr@r^r+)rvrªr,r]»s    zStataReader._get_dtypescs8ˆjdkrd‰nˆjdkrd‰‡‡fdd„tˆjƒDƒS)NrBr¶r7écsg|]}ˆ ˆj ˆ¡¡‘qSr+)Ú_decoder,r&)r3r´)rÚrªr+r,r6ßsz,StataReader._get_varlist..)r8r»rQ)rªr+)rÚrªr,r_Ùs   zStataReader._get_varlistcsNˆjdkrd‰n$ˆjdkr d‰nˆjdkr0d‰nd‰‡‡fdd „tˆjƒDƒS) Nr7é9éqé1éhrXrVcsg|]}ˆ ˆj ˆ¡¡‘qSr+)rjr,r&)r3r´)rÚrªr+r,r6ìsz,StataReader._get_fmtlist..)r8r»rQ)rªr+)rÚrªr,rbâs   zStataReader._get_fmtlistcs>ˆjdkrd‰nˆjdkr d‰nd‰‡‡fdd„tˆjƒDƒS)Nr7rirír¶rMcsg|]}ˆ ˆj ˆ¡¡‘qSr+)rjr,r&)r3r´)rÚrªr+r,r6ösz,StataReader._get_lbllist..)r8r»rQ)rªr+)rÚrªr,rdïs   zStataReader._get_lbllistcsdˆjdkr$‡fdd„tˆjƒDƒ}n<ˆjdkrH‡fdd„tˆjƒDƒ}n‡fdd„tˆjƒDƒ}|S)Nr7csg|]}ˆ ˆj d¡¡‘qS)iA)rjr,r&)r3r´)rªr+r,r6ûsz4StataReader._get_variable_labels..rìcsg|]}ˆ ˆj d¡¡‘qS)éQ)rjr,r&)r3r´)rªr+r,r6ÿscsg|]}ˆ ˆj d¡¡‘qS)rµ)rjr,r&)r3r´)rªr+r,r6s)r8r»rQ)rªZvlblistr+)rªr,rføs   z StataReader._get_variable_labelscCsJ|jdkr(t |jd|j d¡¡dSt |jd|j d¡¡dSdS)Nr7rèr—rÚIrZ)r8r{r|r½r,r&)rªr+r+r,rRs zStataReader._get_nobscCs |jdkr:t |jd|j d¡¡d}| |j |¡¡S|jdkrnt d|j d¡¡d}| |j |¡¡S|jdkrŠ| |j d ¡¡S| |j d ¡¡SdS) Nr7rGr]rrBrÚr.rìrorµ)r8r{r|r½r,r&rj)rªÚstrlenr+r+r,rT s   zStataReader._get_data_labelcCsŽ|jdkr4t d|j d¡¡d}|j |¡ d¡S|jdkrht d|j d¡¡d}| |j |¡¡S|jdkr„| |j d¡¡Stƒ‚dS) Nr7rÚr.rzutf-8rBrné)r8r{r|r,r&ÚdecoderjrI)rªrqr+r+r,rVs   zStataReader._get_time_stampcCsd|jdkr.|j d¡|jd|jddS|jdkrZt |jd|j d¡¡ddStƒ‚dS) NrBr—r¶éér7rÉr) r8r,r&rXrQr{r|r½rI)rªr+r+r,rY%s    "z%StataReader._get_seek_variable_labelsc st d|¡dˆ_ˆjdkr$ttƒ‚ˆ ¡t dˆj d¡¡ddkrLdpNdˆ_t dˆj d¡¡dˆ_ ˆj d¡t ˆjdˆj d¡¡dˆ_ ˆ  ¡ˆ_ ˆ  ¡ˆ_ˆ ¡ˆ_ˆjd krÚ‡fd d „tˆj ƒDƒ}n^ˆj ˆj ¡}tj|tjd }g}x:|D]2}|ˆjkr$| ˆj|¡n| |d ¡qWy‡fdd „|Dƒˆ_Wn4tk r„td d dd„|Dƒ¡¡ƒ‚YnXy‡fdd „|Dƒˆ_Wn4tk rÒtd d dd„|Dƒ¡¡ƒ‚YnXˆjd krü‡fdd „tˆj ƒDƒˆ_n‡fdd „tˆj ƒDƒˆ_t ˆjdˆj dˆj dˆj d¡¡dd…ˆ_ˆ ¡ˆ_ˆ ¡ˆ_ ˆ !¡ˆ_"ˆjdkrx†t ˆjdˆj d¡¡d}ˆjd krÄt ˆjdˆj d¡¡d}nt ˆjdˆj d¡¡d}|dkrîPˆj |¡qzWˆj #¡ˆ_$dS)NrÚr)rnrìríéorlérésr.rDrErGr]rícsg|]}tˆj d¡ƒ‘qS)r.)Úordr,r&)r3r´)rªr+r,r6Esz0StataReader._read_old_header..)rRécsg|]}ˆj|‘qSr+)r)r3r=)rªr+r,r6Qsz cannot convert stata types [{0}]ú,css|]}t|ƒVqdS)N)r¦)r3rqr+r+r,ú Usz/StataReader._read_old_header..csg|]}ˆj|‘qSr+)r)r3r=)rªr+r,r6Ysz!cannot convert stata dtypes [{0}]css|]}t|ƒVqdS)N)r¦)r3rqr+r+r,r|]scsg|]}ˆ ˆj d¡¡‘qS)r¶)rjr,r&)r3r´)rªr+r,r6cscsg|]}ˆ ˆj d¡¡‘qS)rM)rjr,r&)r3r´)rªr+r,r6gsrêrOrnr´rZ)%r{r|r8rIrPr9r,r&r½ÚfiletyperQrRrSrTrUrVrWr»raÚ frombufferrŒrr¨r@r1Újoinr^r`rarbrcrdrerfrgÚtellrZ)rªrAr@ÚbufZtyplistbÚtpZ data_typeÚdata_lenr+)rªr,r?1st &                zStataReader._read_old_headercCsŽ|jdk r|jSg}xbt|jƒD]T\}}||jkrV| dt|ƒ|j|j|f¡q | dt|ƒdt|ƒf¡q Wt |¡}||_|jS)z"Map between numpy and state dtypesNr°ÚS) r%Ú enumerater@rr¨r¦r½rarR)rªrRr´r=r+r+r,r.s  $" zStataReader._setup_dtypecCs t|ƒtkr|pt |j|¡S)N)r;rCr{Úcalcsizer½)rªr`r+r+r,r<szStataReader._calcsizecCsT| d¡d}y | |j¡Stk rNd}t |j|jdt¡| d¡SXdS)Nr³ra One or more strings in the dta file could not be decoded using {encoding}, and so the fallback encoding of latin-1 is being used. This can happen when a file has been incorrectly encoded by Stata or some other software. You should verify the string values returned are correct.)rzlatin-1)Ú partitionrsr¯ÚUnicodeDecodeErrorrhrir1ÚUnicodeWarning)rªr°Úmsgr+r+r,rj s zStataReader._decodec Cs|jr dS|jdkr&d|_tƒ|_dS|jdkr@|j |j¡n |j|jj }|j |j |¡d|_tƒ|_xŒ|jdkrŽ|j  d¡dkrŽP|j  d¡}|s P|jdkr¾|  |j  d¡¡}n|  |j  d¡¡}|j  d ¡t  |jd |j  d¡¡d }t  |jd |j  d¡¡d }tj|j  d|¡|jd |d }tj|j  d|¡|jd |d }t |¡}||}||}|j  |¡} tƒ|j|<xTt|ƒD]H} | |dkr¶|| dn|} |  | || | …¡|j||| <q˜W|jdkrr|j  d¡qrWd|_dS)NríTrBés|¡}|s–g}d}xn|D]f}||j1} | tj?tj@fkrDtjA} d}n | tjBtjCtjDfkrdtjE} d}| 3||| F| ¡f¡qW|r–t 5t6|ƒ¡}|dk r°| $| G|¡¡}|S)NrT)r"rB)rRrŒ)Z convert_dtypeF)rqÚreturncst‡fdd„tDƒƒS)Nc3s|]}ˆ |¡VqdS)N)rf)r3r`)rqr+r,r|žsz;StataReader.read..any_startswith..)rcÚ _date_formats)rqr+)rqr,Úany_startswithsz(StataReader.read..any_startswithcsg|] }ˆ|ƒ‘qSr+r+)r3rq)r r+r,r6 sz$StataReader.read..rí)HrSr!r$r'rr`rrrrrrrr8r#r—r%r&rŽr8rÚ StopIterationr,r¼rZrar~r&r½r)ZbyteswapÚ newbyteorderržZ from_recordsr"rZ set_indexÚ_do_select_columnsrIr:r@r;rCrurjÚ _insert_strlsÚwherer^rrRrjr¨rÚ from_dictrÚ_do_convert_missingr¦rŠrcrmÚ_do_convert_categoricalsrreZfloat16rrzr‹rrŽrerdÚpop)rªršrrrr r!r"r#rRZ max_read_lenÚread_lenrÀÚ read_linesr*Úixr‡r=Zcols_Zrequires_type_conversionÚdata_formattedr´ÚcolsZ retyped_dataÚconvertr+)r r,r&sÞ                     zStataReader.readcCsBi}xt|ƒD]ö\}}|j|}||jkr.q|j|\}}||} t | |k| |k¡} |  ¡sbq|rÌt | j¡} tj| | dd\} } t | tj d}xft| ƒD]&\}}t |ƒ}| | |k}||j |<q Wn2| j }|tjtjfkrètj}t | |d}tj|| <|||<qW|r>|j}t|ƒ}t| |jd¡|gdƒ}||}|S)NT)Zreturn_inverse)rRr.)r…r@rraÚ logical_orrcZargwhereZ_ndarray_valuesÚuniquerrjrÄÚilocrRrrzÚnanr"rrZdrop)rªr*r Ú replacementsr´Zcolnamer`ZnminZnmaxZseriesÚmissingZ missing_locZumissingZ umissing_locÚ replacementÚjZumr}ÚlocrRr"r+r+r,r§Äs<       zStataReader._do_convert_missingcsptˆdƒrtˆjƒdkr|SxNtˆjƒD]@\}}|dkr:q(‡fdd„|jdd…|fDƒ|jdd…|f<q(W|S)Nr”rrècsg|]}ˆjt|ƒ‘qSr+)r”r¦)r3Úk)rªr+r,r6òsz-StataReader._insert_strls..)Úhasattrržr”r…r@r²)rªr*r´r=r+)rªr,r¤ës2zStataReader._insert_strlsc CsÜ|jsÔt|ƒ}t|ƒt|ƒkr&tdƒ‚| |j¡}|rLtdd t|ƒ¡ƒ‚g}g}g}g}xX|D]P} |j | ¡} |  |j | ¡|  |j | ¡|  |j | ¡|  |j | ¡qbW||_ ||_ ||_ ||_ d|_||S)Nz"columns contains duplicate entrieszr]r_rbrdrfrRrTrVrYr?r.r<rjrr—rÚ_data_method_docr*r›rÚ_read_method_docr&r§r¤r£r¨rÞrÃrÄrŸÚ __classcell__r+r+)rÓr,r%sl  *  I      \5   '  -  r%cCs t|dƒr|dfSt|dƒdfS)a  Open a binary file or no-op if file-like. Parameters ---------- fname : string path, path object or buffer Returns ------- file : file-like object File object supporting write own : bool True if the file was created, otherwise False r¸FÚwbT)rºr+)Úfnamer+r+r,Ú_open_file_binary_writecs rËcCs4| ¡dkrdS| ¡dkr dStdj|dƒ‚dS)N)rEÚlittlerE)rDÚbigrDz"Endianness {endian} not understood)Úendian)ÚlowerrIr1)Ú endiannessr+r+r,r'xs   r'cCs|d|t|ƒS)zQ Take a char string and pads it with null bytes until it's length chars. r²)rž)r˜r•r+r+r,rºsrºcCs"|dkrtjStdj|dƒ‚dS)zK Convert from one of the stata date formats to a type in TYPE_MAP. )rPz%tcrSz%tdrTz%twrWz%tmrYz%tqr\z%thr_z%tyzFormat {fmt} not implemented)r`N)rarzÚNotImplementedErrorr1)r`r+r+r,Ú_convert_datetime_to_stata_typeˆsrÒcCszi}xp|D]h}|| d¡s,d||||<||krN| | |¡||i¡q t|tƒs`tdƒ‚| |||i¡q W|S)Nú%z0convert_dates key must be a column or an integer)rfÚupdaterr¥rCrI)rr`Znew_dictr–r+r+r,Ú_maybe_convert_to_int_keys¡s  rÕcCs~|jtjkr$tt|jƒƒ}t|dƒS|tjkr2dS|tjkr@dS|tj krNdS|tj kr\dS|tj krjdSt dj |dƒ‚d S) aõ Convert dtype types to stata types. Returns the byte of the given ordinal. See TYPE_MAP and comments for an explanation. This is also explained in the dta spec. 1 - 244 are strings of this length Pandas Stata 251 - for int8 byte 252 - for int16 int 253 - for int32 long 254 - for float32 float 255 - for double double If there are dates to convert, then dtype will already have the correct type inserted. r.rçrærårärãz Data type {dtype} not supported.)rRN)r;raÚobject_rr rJr7rzrrŽrr‹rÑr1)rRÚcolumnrŽr+r+r,Ú_dtype_to_stata_type¯s       rØrwcCsô|dkrd}n d}|rdS|jtjkržt|dd}|dksXt|ƒdksXtd j|jd ƒ‚tt |j ƒƒ}||krˆ|dkrzdStt |jƒ‚d t t |d ƒƒd S|tjkr¬dS|tjkrºdS|tjkrÈdS|tjksÜ|tjkràdStdj|dƒ‚dS)a¢ Map numpy dtype to stata's default format for this type. Not terribly important since users can change this in Stata. Semantics are object -> "%DDs" where DD is the length of the string. If not a string, raise ValueError float64 -> "%10.0g" float32 -> "%9.0g" int64 -> "%9.0g" int32 -> "%12.0g" int16 -> "%8.0g" int8 -> "%8.0g" strl -> "%9s" rBéôiýz%9sT)ro)rÏÚunicodera!Column `{col}` cannot be exported. Only string-like object arrays containing all strings or a mix of strings and None can be exported. Object arrays containing only null values are prohibited. Other object typescannot be exported and must first be converted to one of the supported types.)r‡rÓr.r°z%10.0gz%9.0gz%12.0gz%8.0gz Data type {dtype} not supported.)rRN)r;rarÖrržrIr1r˜rr rJÚexcessive_string_length_errorr¦r7rzrrŽr‹rrÑ)rRr×Ú dta_versionÚ force_strlÚ max_str_lenZinferred_dtyperŽr+r+r,Ú_dtype_to_default_stata_fmtÕs6      rßcseZdZdZdZedddd?‡fdd „ ƒZd d „Zd d „Zdd„Z dd„Z dd„Z dd„Z dd„Z dd„Zdd„Zdd„Zdd„Zd d!„Zd"d#„Zd$d%„Zd&d'„Zd@d(d)„Zd*d+„Zd,d-„Zd.d/„Zd0d1„Zd2d3„Zd4d5„Zd6d7„Zd8d9„Zd:d;„ZdAd=d>„Z ‡Z!S)BÚ StataWriteraÒ A class for writing Stata binary dta files Parameters ---------- fname : path (string), buffer or path object string, path object (pathlib.Path or py._path.local.LocalPath) or object implementing a binary write() functions. If using a buffer then the buffer will not be automatically closed after the file is written. .. versionadded:: 0.23.0 support for pathlib, py.path. data : DataFrame Input to save convert_dates : dict Dictionary mapping columns containing datetime types to stata internal format to use when writing the dates. Options are 'tc', 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer or a name. Datetime columns that do not have a conversion type specified will be converted to 'tc'. Raises NotImplementedError if a datetime column has timezone information write_index : bool Write the index to Stata dataset. encoding : str Default is latin-1. Only latin-1 and ascii are supported. byteorder : str Can be ">", "<", "little", or "big". default is `sys.byteorder` time_stamp : datetime A datetime to use as file creation date. Default is the current time data_label : str A label for the data set. Must be 80 characters or smaller. variable_labels : dict Dictionary containing columns as keys and variable labels as values. Each label must be 80 characters or smaller. .. versionadded:: 0.19.0 Returns ------- writer : StataWriter instance The StataWriter instance has a write_file method, which will write the file to the given `fname`. Raises ------ NotImplementedError * If datetimes contain timezone information ValueError * Columns listed in convert_dates are neither datetime64[ns] or datetime.datetime * Column dtype is not representable in Stata * Column listed in convert_dates is not in DataFrame * Categorical label contains more than 32,000 characters Examples -------- >>> data = pd.DataFrame([[1.0, 1]], columns=['a', 'b']) >>> writer = StataWriter('./data_file.dta', data) >>> writer.write_file() Or with dates >>> from datetime import datetime >>> data = pd.DataFrame([[datetime(2000,1,1)]], columns=['date']) >>> writer = StataWriter('./date_data_file.dta', data, {'date' : 'tw'}) >>> writer.write_file() rÙrN)rrTúlatin-1c sŠtƒ ¡|dkrin||_||_d|_||_||_| |_d|_|  |¡|dkrXt j }t |ƒ|_ t|ƒ|_tjtjtjdœ|_i|_dS)Nzlatin-1T)rårärã)rr­rÚ _write_indexr¯Ú _time_stamprUrgÚ _own_fileÚ_prepare_pandasr(r½r'Ú _byteorderrÚ_fnamerarŽrr‹Ztype_convertersÚ_converted_names) rªrÊr*rÚ write_indexrr½rWrÃrÄ)rÓr+r,r­Us    zStataWriter.__init__cCs|j | |jp|j¡¡dS)zS Helper to call encode before writing to file for Python 3 compat. N)Ú_filer¸r®r¯Z_default_encoding)rªZto_writer+r+r,Ú_writetszStataWriter._writec s‡fdd„ˆDƒ}||_g|_t|ƒs*ˆStj}g}xÚtˆ|ƒD]Ì\}}|rú|j tˆ|ƒ¡ˆ|jj j }|t j kr€t dƒ‚ˆ|jj j ¡}| ¡||ƒkrÚ|t jkr´t j}n|t jkrÆt j}nt j}t j||d}||ƒ||dk<| ||f¡q@| |ˆ|f¡q@Wt t|ƒ¡S)zxCheck for categorical columns, retain categorical information for Stata file and convert categorical data to intcsg|]}tˆ|ƒ‘qSr+)r )r3r‡)r*r+r,r6~sz5StataWriter._prepare_categoricals..zCIt is not possible to export int64-based categorical data to Stata.)rRrO)Ú _is_col_catÚ _value_labelsrcrÄrØr:r¨r•ršÚcodesrRrarerIrJÚcopyr7r‹rrŽrzr©rr¦r) rªr*Zis_catrØr­r‡Z col_is_catrRrJr+)r*r,Ú_prepare_categoricalszs4   z!StataWriter._prepare_categoricalscCs^xX|D]P}||j}|tjtjfkr|tjkr:|jd}n |jd}|| |¡||<qW|S)ztChecks floating point data columns for nans, and replaces these with the generic Stata for missing value (.)rvr@)rRrarrzrÌÚfillna)rªr*r·rRr¶r+r+r,Ú _replace_nans¢s     zStataWriter._replace_nanscCsdS)zNo-op, forward compatibilityNr+)rªr+r+r,Ú_update_strl_names±szStataWriter._update_strl_namesc Cs i}t|jƒ}|dd…}d}x$t|ƒD]\}}|}t|tƒsJt|ƒ}xP|D]H} | dksd| dkrP| dkst| dkrP| dks„| dkrP| d krP| | d ¡}qPW||jkr®d |}|ddkrÎ|ddkrÎd |}|dtt|ƒd ƒ…}||ks:xB|  |¡dkr0d t|ƒ|}|dtt|ƒd ƒ…}|d 7}qðW|||<|||<q*W||_|j rx:t ||ƒD],\} } | | kr`|j | |j | <|j | =q`W|rg} xV|  ¡D]J\}}y|  d ¡}Wnttfk rÔYnXd  ||¡} |  | ¡q¤Wt d | ¡¡} t | t¡||_| ¡|S)aÌ Checks column names to ensure that they are valid Stata column names. This includes checks for: * Non-string names * Stata keywords * Variables that start with numbers * Variables with names that are too long When an illegal variable name is detected, it is converted, and if dates are exported, the variable name is propagated to the date conversion dictionary NrÚAÚZrâÚzr‘Ú9r/rµr.zutf-8z{0} -> {1}z )rœr"r…r¥r¦Úreplacerr8ržrŒrr:Úitemsr®rˆÚAttributeErrorr1r¨Úinvalid_name_docrrhrir„rèró)rªr*Zconverted_namesr"Zoriginal_columnsZduplicate_var_idr·r˜Ú orig_namer·ÚoZconversion_warningrŠr’r+r+r,Ú_check_column_namesµs\            zStataWriter._check_column_namescCsRg|_g|_x@| ¡D]4\}}|j t|||ƒ¡|j t|||ƒ¡qWdS)N)r@rcrùr¨rßrØ)rªr*Údtypesr‡rRr+r+r,Ú_set_formats_and_types s z"StataWriter._set_formats_and_typescCs | ¡}|jr| ¡}| |¡}t|ƒ}| |¡}| |¡}|j\|_|_ ||_ |j   ¡|_ |j}x.|D]&}||jkrxqht||ƒrhd|j|<qhWt|j|j ƒ|_x*|jD] }t|j|ƒ}t |¡||<qªW| ||¡|jdk rx|jD]}|j||j|<qîWdS)NrP)rïrâZ reset_indexrþr”ròrðÚshaperSrQr*r"Útolistr`rÿrr rÕrÒrarRrrc)rªr*rÿr‡r–Únew_typer+r+r,rå s2           zStataWriter._prepare_pandasc Cst|jƒ\|_|_yŽ|j|j|jd| ¡| ¡|  ¡|  ¡|  ¡|  ¡|  ¡| ¡| ¡| ¡| ¡| ¡| ¡| ¡| ¡Wnptk r}zP| ¡y|jrÐt |j¡Wn(tk rút d |j¡t¡YnX|‚Wdd}~XYn X| ¡dS)N)rWrÃzSThis save was not successful but {0} could not be deleted. This file is not valid.)rËrçrêräÚ _write_headerrãrUÚ _write_mapÚ_write_variable_typesÚ_write_varnamesÚ_write_sortlistÚ_write_formatsÚ_write_value_label_namesÚ_write_variable_labelsÚ_write_expansion_fieldsÚ_write_characteristicsÚ _prepare_dataÚ _write_dataÚ _write_strlsÚ_write_value_labelsÚ_write_file_close_tagr˜Ú_closeÚosÚunlinkrhrir1ÚResourceWarning)rªÚexcr+r+r,Ú write_file> s<  zStataWriter.write_filecCs8y|j ¡Wntk r"YnX|jr4|j ¡dS)aA Close the file if it was created by the writer. If a buffer or file-like object was passed in, for example a GzipFile, then leave this file open for the caller to close. In either case, attempt to flush the file contents to ensure they are written to disk (if supported) N)rêÚflushrúrär')rªr+r+r,ra s zStataWriter._closecCsdS)zNo-op, future compatibilityNr+)rªr+r+r,rr szStataWriter._write_mapcCsdS)zNo-op, future compatibilityNr+)rªr+r+r,rv sz!StataWriter._write_file_close_tagcCsdS)zNo-op, future compatibilityNr+)rªr+r+r,r z sz"StataWriter._write_characteristicscCsdS)zNo-op, future compatibilityNr+)rªr+r+r,r~ szStataWriter._write_strlscCs| tddƒ¡dS)z"Write 5 zeros for expansion fieldsr…r‹N)rërº)rªr+r+r,r ‚ sz#StataWriter._write_expansion_fieldscCs,x&|jD]}|j | |j|j¡¡qWdS)N)rírêr¸rÂrær¯)rªr«r+r+r,r† s zStataWriter._write_value_labelsc CsT|j}|j t dd¡¡| |dkr*dp,d¡| d¡| d¡|j t |d|j¡dd…¡|j t |d |j¡dd …¡|dkrª|j | t d d ƒ¡¡n |j | t |dd …d ƒ¡¡|dkrÞt j   ¡}nt |t j ƒsòt d ƒ‚ddddddddddddg }dd„t|ƒDƒ}| d¡||j| d¡}|j | |¡¡dS)NrÚrwrDúúr²rêr]r´rZr…éPz"time_stamp should be datetime typeÚJanÚFebÚMarÚAprÚMayÚJunÚJulÚAugÚSepÚOctÚNovÚDeccSsi|]\}}||d“qS)r.r+)r3r´r<r+r+r,ú ³ sz-StataWriter._write_header..z%d z %Y %H:%M)rærêr¸r{r¹rërQrSÚ_null_terminaterºr2Únowr¥rIr…Ústrftimer<)rªrÃrWr½ÚmonthsÚ month_lookupÚtsr+r+r,rŠ s:  ""   zStataWriter._write_headercCs(x"|jD]}|j t d|¡¡qWdS)Nr’)r@rêr¸r{r¹)rªr=r+r+r,r» s z!StataWriter._write_variable_typescCs<x6|jD],}| |d¡}t|dd…dƒ}| |¡qWdS)NTrµr¶)r`r*rºrë)rªr˜r+r+r,r¿ s  zStataWriter._write_varnamescCs"tdd|jdƒ}| |¡dS)Nr…r]r.)rºrQrë)rªrar+r+r,rÇ szStataWriter._write_sortlistcCs$x|jD]}| t|dƒ¡qWdS)Nrm)rcrërº)rªr`r+r+r,r Ì s zStataWriter._write_formatscCsfx`t|jƒD]R}|j|rN|j|}| |d¡}t|dd…dƒ}| |¡q | tddƒ¡q WdS)NTrµr¶r…)r»rQrìr`r*rºrë)rªr´r˜r+r+r,r Ñ s    z$StataWriter._write_value_label_namescCs¬tddƒ}|jdkr6xt|jƒD]}| |¡q WdSxp|jD]f}||jkrš|j|}t|ƒdkrjtdƒ‚tdd„|Dƒƒ}|sˆtdƒ‚| t|dƒ¡q>| |¡q>WdS)Nr…rorz.Variable labels must be 80 characters or fewercss|]}t|ƒdkVqdS)éN)ry)r3r·r+r+r,r|í sz5StataWriter._write_variable_labels..zKVariable labels must contain only characters that can be encoded in Latin-1) rºrgr»rQrër*ržrIÚall)rªÚblankr´r‡rÂÚ is_latin1r+r+r,r Ý s"      z"StataWriter._write_variable_labelscCs|S)zNo-op, future compatibilityr+)rªr*r+r+r,Ú_convert_strlsø szStataWriter._convert_strlsc Cs|j}|j}|j}|jdk rRx4t|ƒD](\}}||kr&t|||j|ƒ||<q&W| |¡}i}|jtt j ƒk}x˜t|ƒD]Œ\}}||}||j krä||  d¡j t|fd||<dj|d} | ||<||j |j¡ | ¡||<qz||j} |sþ|  |j¡} | ||<qzW|jd|d|_dS)Nr…)ÚargszS{type})r;F)rZ column_dtypes)r*r@rr…r~rcr4rær'r(r½Ú_max_string_lengthrñrurºr1r¦r®r¯rdrRr¢Z to_records) rªr*r@rr´r‡rÿZnative_byteorderr=ÚstyperRr+r+r,rü s.       zStataWriter._prepare_datacCs|j}|j | ¡¡dS)N)r*rêr¸Útobytes)rªr*r+r+r,r szStataWriter._write_dataFcCs d}||7}|s| |j¡}|S)Nr²)r®r¯)rªr°Ú as_stringr¿r+r+r,r* s  zStataWriter._null_terminate)NTráNNNN)NN)F)"r€rr‚rÃr6rr­rërðròrórþrrårrrrr rr rrrrrr r r r4rrr*rÈr+r+)rÓr,ràsHC (Q1# 1  ràcCs’|rdS|jtjkr", "<", "little", or "big". default is `sys.byteorder` Notes ----- Supports creation of the StrL block of a dta file for dta versions 117, 118 and 119. These differ in how the GSO is stored. 118 and 119 store the GSO lookup value as a uint32 and a uint64, while 117 uses two uint32s. 118 and 119 also encode all strings as unicode which is required by the format. 117 uses 'latin-1' a fixed width encoding that extends the 7-bit ascii table with an additional 128 characters. rBNcCsž|dkrtdƒ‚||_||_||_tdƒ|_|dkr:tj}t|ƒ|_ d}d}d|_ |dkrjd}d}d |_ n|d krxd }nd }d dd||_ ||_ ||_ dS)N)rBr7éwz,Only dta versions 117, 118 and 119 supported))r…)rrrprèzutf-8rBrZzlatin-1r7r^r‹r]r—)rIZ_dta_verÚdfr"rÚ _gso_tabler(r½r'rær¯Ú_o_offetÚ _gso_o_typeÚ _gso_v_type)rªr>r"Úversionr½Z gso_v_typeZ gso_o_typeZo_sizer+r+r,r­z s,  zStataStrLWriter.__init__cCs|\}}||j|S)N)r@)rªr–rwrýr+r+r,Ú _convert_key• szStataStrLWriter._convert_keycs|j}|j}t|jƒ‰||j}‡fdd„|jDƒ}tj|jtjd}xŒt|  ¡ƒD]|\}\}}xnt|ƒD]b\} \} } || } | dkrŠdn| } |  | d¡} | dkrº| d|df} | || <|  | ¡||| f<qjWqTWx*t|jƒD]\}} |dd…|f|| <qàW||fS)aÿ Generates the GSO lookup table for the DataFRame Returns ------- gso_table : OrderedDict Ordered dictionary using the string found as keys and their lookup position (v,o) as values gso_df : DataFrame DataFrame where strl columns have been converted to (v,o) values Notes ----- Modifies the DataFrame in-place. The DataFrame returned encodes the (v,o) values as uint64s. The encoding depends on teh dta version, and can be expressed as enc = v + o * 2 ** (o_size * 8) so that v is stored in the lower bits and o is in the upper bits. o_size is * 117: 4 * 118: 6 * 119: 5 csg|]}|ˆ |¡f‘qSr+)r)r3r‡)r"r+r,r6» sz2StataStrLWriter.generate_table..)rRNr…r.) r?r>rœr"raÚemptyrr†r…ZiterrowsÚgetrD)rªÚ gso_tableZgso_dfÚselectedZ col_indexrÁrýÚidxÚrowr·r‡rwrˆr–r´r+)r"r,Úgenerate_table™ s$   zStataStrLWriter.generate_tablecCs | |j¡S)z- Python 3 compatibility shim )r®r¯)rªr°r+r+r,r±Í szStataStrLWriter._encodecCstƒ}tddƒ}t |jdd¡}t |jdd¡}|j|j}|j|j}|jd}x–| ¡D]Š\} } | dkrrq`| \} } | |¡| t || ¡¡| t || ¡¡| |¡t| dƒ} | t |t | ƒd ¡¡| | ¡| |¡q`W|  d¡|  ¡S) aî Generates the binary blob of GSOs that is written to the dta file. Parameters ---------- gso_table : OrderedDict Ordered dictionary (str, vo) Returns ------- gso : bytes Binary content of dta file to be placed between strl tags Notes ----- Output format depends on dta version. 117 uses two uint32s to express v and o while 118+ uses a uint32 for v and a uint64 for o. r”Úasciir’r“rrp)rrzutf-8r.) rr*r{r¹rærBrArùr¸ržr¼r&)rªrGr¾ZgsoZgso_typeÚnullZv_typeZo_typeZlen_typeZstrlZvorwrýZ utf8_stringr+r+r,Ú generate_blobÓ s*         zStataStrLWriter.generate_blob)rBN) r€rr‚rÃr­rDrKr±rNr+r+r+r,r<] s  4r<c sÐeZdZdZdZedddd0‡fdd „ ƒZed d „ƒZd d „Z d1dd„Z dd„Z dd„Z dd„Z dd„Zdd„Zdd„Zdd„Zdd„Zd d!„Zd"d#„Zd$d%„Zd&d'„Zd(d)„Zd*d+„Zd,d-„Zd.d/„Z‡ZS)2ÚStataWriter117aË A class for writing Stata binary dta files in Stata 13 format (117) .. versionadded:: 0.23.0 Parameters ---------- fname : path (string), buffer or path object string, path object (pathlib.Path or py._path.local.LocalPath) or object implementing a binary write() functions. If using a buffer then the buffer will not be automatically closed after the file is written. data : DataFrame Input to save convert_dates : dict Dictionary mapping columns containing datetime types to stata internal format to use when writing the dates. Options are 'tc', 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer or a name. Datetime columns that do not have a conversion type specified will be converted to 'tc'. Raises NotImplementedError if a datetime column has timezone information write_index : bool Write the index to Stata dataset. encoding : str Default is latin-1. Only latin-1 and ascii are supported. byteorder : str Can be ">", "<", "little", or "big". default is `sys.byteorder` time_stamp : datetime A datetime to use as file creation date. Default is the current time data_label : str A label for the data set. Must be 80 characters or smaller. variable_labels : dict Dictionary containing columns as keys and variable labels as values. Each label must be 80 characters or smaller. convert_strl : list List of columns names to convert to Stata StrL format. Columns with more than 2045 characters are automatically written as StrL. Smaller columns can be converted by including the column name. Using StrLs can reduce output file size when strings are longer than 8 characters, and either frequently repeated or sparse. Returns ------- writer : StataWriter117 instance The StataWriter117 instance has a write_file method, which will write the file to the given `fname`. Raises ------ NotImplementedError * If datetimes contain timezone information ValueError * Columns listed in convert_dates are neither datetime64[ns] or datetime.datetime * Column dtype is not representable in Stata * Column listed in convert_dates is not in DataFrame * Categorical label contains more than 32,000 characters Examples -------- >>> from pandas.io.stata import StataWriter117 >>> data = pd.DataFrame([[1.0, 1, 'a']], columns=['a', 'b', 'c']) >>> writer = StataWriter117('./data_file.dta', data) >>> writer.write_file() Or with long strings stored in strl format >>> data = pd.DataFrame([['A relatively long string'], [''], ['']], ... columns=['strls']) >>> writer = StataWriter117('./data_file_with_long_strings.dta', data, ... convert_strl=['strls']) >>> writer.write_file() iýrN)rrTúlatin-1c sF| dkr gn | dd…|_tƒj|||||||| dd|_d|_dS)N)r½rWrÃrÄ)Ú _convert_strlrr­Ú_mapÚ _strl_blob) rªrÊr*rrérr½rWrÃrÄZ convert_strl)rÓr+r,r­a szStataWriter117.__init__cCs<t|tƒrt|dƒ}td|ddƒ|td|ddƒS)zSurround val with zutf-8rErDzzutf-8Z117ÚreleaserDZMSFZLSFr½irGÚKrpÚNNrr…r’rÂz"time_stamp should be datetime typerrrr r!r"r#r$r%r&r'r(cSsi|]\}}||d“qS)r.r+)r3r´r<r+r+r,r)² sz0StataWriter117._write_header..z%d z %Y %H:%MóÚutf8Ú timestamprÚheader)rærêr¸r*rrUrQÚAssertionErrorr{r¹rSržr2r+r¥rIr…r,r<r¼r&) rªrÃrWr½r¾rÂZ label_lenr-r.r/r+r+r,rŠ sD     zStataWriter117._write_headercCs¤|jdkr:tdd|j ¡fdddddd d d d d ddfƒ|_|j |jd¡tƒ}x*|j ¡D]}| t  |j d|¡¡q^W| d¡|j |  |  ¡d¡¡dS)zµCalled twice during file write. The first populates the values in the map with 0s. The second call writes the final map locations when all blocks have been written.N)Z stata_datarÚmap)Úvariable_typesr)Úvarnamesr)Úsortlistr)Úformatsr)Úvalue_label_namesr)rÄr)Úcharacteristicsr)r*r)Ústrlsr)rŸr)Ústata_data_closer)z end-of-filerrèr) rRrrêr€r¼rrJr¸r{r¹rærUr&)rªr¾rˆr+r+r,r¾ s,    zStataWriter117._write_mapcCs^| d¡tƒ}x&|jD]}| t |jd|¡¡qW| d¡|j |  |  ¡d¡¡dS)Nr`rGr) rVrr@r¸r{r¹rær¼rêrUr&)rªr¾r=r+r+r,rÝ s    z$StataWriter117._write_variable_typescCsn| d¡tƒ}x6|jD],}| |d¡}t|dd…dƒ}| |¡qW| d¡|j | |  ¡d¡¡dS)NraTrµr¶r) rVrr`r*r;r¸r¼rêrUr&)rªr¾r˜r+r+r,rå s    zStataWriter117._write_varnamescCs,| d¡|j | d|jdd¡¡dS)Nrbsr.)rVrêr¸rUrQ)rªr+r+r,rï s zStataWriter117._write_sortlistcCsV| d¡tƒ}x|jD]}| t|dƒ¡qW| d¡|j | | ¡d¡¡dS)Nrcrmr) rVrrcr¸r;r¼rêrUr&)rªr¾r`r+r+r,r ó s    zStataWriter117._write_formatscCsŠ| d¡tƒ}xRt|jƒD]D}d}|j|r8|j|}| |d¡}t|dd…dƒ}| |¡qW|  d¡|j  |  |  ¡d¡¡dS)Nrdr…Trµr¶r) rVrr»rQrìr`r*r;r¸r¼rêrUr&)rªr¾r´r˜r+r+r,r û s     z'StataWriter117._write_value_label_namescCs| d¡tƒ}tddƒ}|jdkrhxt|jƒD]}| |¡q0W| d¡|j |  |  ¡d¡¡dSxp|j D]f}||jkrÌ|j|}t |ƒdkrœt dƒ‚tdd„|Dƒƒ}|sºt d ƒ‚| t|dƒ¡qp| |¡qpW| d¡|j |  |  ¡d¡¡dS) NrÄr…rorrz.Variable labels must be 80 characters or fewercss|]}t|ƒdkVqdS)r0N)ry)r3r·r+r+r,r| sz8StataWriter117._write_variable_labels..zKVariable labels must contain only characters that can be encoded in Latin-1)rVrr;rgr»rQr¸r¼rêrUr&r*ržrIr1)rªr¾r2r/r‡rÂr3r+r+r,r  s.         z%StataWriter117._write_variable_labelscCs"| d¡|j | dd¡¡dS)Nrert)rVrêr¸rU)rªr+r+r,r * s z%StataWriter117._write_characteristicscCs<| d¡|j}|j d¡|j | ¡¡|j d¡dS)Nr*ss)rVr*rêr¸r8)rªr*r+r+r,r. s   zStataWriter117._write_datacCs6| d¡d}|jdk r|j}|j | |d¡¡dS)Nrfrt)rVrSrêr¸rU)rªrfr+r+r,r5 s   zStataWriter117._write_strlscCsdS)zNo-op in dta 117+Nr+)rªr+r+r,r < sz&StataWriter117._write_expansion_fieldscCsl| d¡tƒ}x4|jD]*}| |j|j¡}| |d¡}| |¡qW| d¡|j  | |  ¡d¡¡dS)NrŸZlblr) rVrrírÂrær¯rUr¸r¼rêr&)rªr¾r«Zlabr+r+r,r@ s    z"StataWriter117._write_value_labelscCs*| d¡|j tddƒ¡| d¡dS)Nrgz zutf-8z end-of-file)rVrêr¸r*)rªr+r+r,rJ s z$StataWriter117._write_file_close_tagcCs<x6|j ¡D](\}}||jkr |j |¡}||j|<q WdS)ztUpdate column names for conversion to strl if they might have been changed to comply with Stata naming rulesN)rèrùrQr)rªÚorigÚnewrIr+r+r,róO s  z!StataWriter117._update_strl_namescsD‡fdd„t|ƒDƒ}|r@t||ƒ}| ¡\}}|}| |¡ˆ_|S)zUConvert columns to StrLs if either very large or in the convert_strl variablecs,g|]$\}}ˆj|dks$|ˆjkr|‘qS)i€)r@rQ)r3r´r‡)rªr+r,r6\ sz1StataWriter117._convert_strls..)r…r<rKrNrS)rªr*Z convert_colsZsswÚtabZnew_datar+)rªr,r4X s     zStataWriter117._convert_strlscCshg|_g|_xV| ¡D]J\}}||jk}t|||d|d}|j |¡|j t||||ƒ¡qWdS)NrB)rÜrÝ)r@rcrùrQrßr¨r:)rªr*rÿr‡rRrÝr`r+r+r,rh s  z%StataWriter117._set_formats_and_types)NTrPNNNNN)NN)r€rr‚rÃr6rr­Ú staticmethodrUrVrrrrrr r r r rrr rrrór4rrÈr+r+)rÓr,rO s<I   4 !  rO) TTNNFTNTNF)rwF)RrÃÚ collectionsrr2Úiorrr{r(rhZdateutil.relativedeltarZnumpyraZpandas._libs.librZpandas._libs.writersrZpandas.util._decoratorsrrZpandas.core.dtypes.commonr r r Zpandasr r rrrrrrZpandas.core.framerZpandas.core.seriesrZpandas.io.commonrrrrPZ_statafile_processing_params1Z_encoding_paramsZ_statafile_processing_params2Z_chunksize_paramsZ_iterator_paramsZ_read_stata_docrÆrÇrÅr-rŸrgrmr~rÛÚWarningrrrƒr§r„rûr”r•rÄràr%rËr'rºrÒrÕrØrßràr:r;r<rOr+r+r+r,Ú s¼     (   +  &f \}~U & 9 * 8