import os
import posixpath
import sys
import warnings

from pyarrow.util import implements, _DEPR_MSG
from pyarrow.filesystem import FileSystem
import pyarrow.lib as lib


class HadoopFileSystem(lib.HadoopFileSystem, FileSystem):
    """
    FileSystem interface for HDFS cluster.

    See pyarrow.hdfs.connect for full connection details.
    """

    def __init__(self, host="default", port=0, user=None, kerb_ticket=None,
                 driver='libhdfs', extra_conf=None):
        warnings.warn(
            _DEPR_MSG.format("hdfs.HadoopFileSystem", "2.0.0",
                             "fs.HadoopFileSystem"),
            DeprecationWarning, stacklevel=2)
        if driver == 'libhdfs':
            _maybe_set_hadoop_classpath()

        self._connect(host, port, user, kerb_ticket, extra_conf)

    def __reduce__(self):
        return (HadoopFileSystem, (self.host, self.port, self.user,
                                   self.kerb_ticket, self.extra_conf))

    def _isfilestore(self):
        """
        Return True if this is a Unix-style file store with directories.
        """
        return True

    @implements(FileSystem.isdir)
    def isdir(self, path):
        return super().isdir(path)

    @implements(FileSystem.isfile)
    def isfile(self, path):
        return super().isfile(path)

    @implements(FileSystem.delete)
    def delete(self, path, recursive=False):
        return super().delete(path, recursive)

    def mkdir(self, path, **kwargs):
        """
        Create directory in HDFS.

        Parameters
        ----------
        path : str
            Directory path to create, including any parent directories.

        Notes
        -----
        libhdfs does not support create_parents=False, so that option is
        ignored here.
        """
        return super().mkdir(path)

    @implements(FileSystem.rename)
    def rename(self, path, new_path):
        return super().rename(path, new_path)

    @implements(FileSystem.exists)
    def exists(self, path):
        return super().exists(path)

    def ls(self, path, detail=False):
        """
        Retrieve directory contents and metadata, if requested.

        Parameters
        ----------
        path : str
            HDFS path to retrieve contents of.
        detail : bool, default False
            If False, only return list of paths.

        Returns
        -------
        result : list of dicts (detail=True) or strings (detail=False)
        """
        return super().ls(path, detail)

    def walk(self, top_path):
        """
        Directory tree generator for HDFS, like os.walk.

        Parameters
        ----------
        top_path : str
            Root directory for tree traversal.

        Returns
        -------
        Generator yielding 3-tuple (dirpath, dirnames, filenames)
        """
        contents = self.ls(top_path, detail=True)

        directories, files = _libhdfs_walk_files_dirs(top_path, contents)
        yield top_path, directories, files
        for dirname in directories:
            yield from self.walk(self._path_join(top_path, dirname))


def _maybe_set_hadoop_classpath():
    import re

    if re.search(r'hadoop-common[^/]+\.jar', os.environ.get('CLASSPATH', '')):
        # Hadoop jars are already on the classpath; nothing to do.
        return

    if 'HADOOP_HOME' in os.environ:
        if sys.platform != 'win32':
            classpath = _derive_hadoop_classpath()
        else:
            hadoop_bin = '{}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
            classpath = _hadoop_classpath_glob(hadoop_bin)
    else:
        classpath = _hadoop_classpath_glob('hadoop')

    os.environ['CLASSPATH'] = classpath.decode('utf-8')


def _derive_hadoop_classpath():
    import subprocess

    # Equivalent to the shell pipeline:
    #   find -L $HADOOP_HOME -name '*.jar' | xargs echo | tr "' '" "':'"
    # which yields a colon-separated list of all Hadoop jars.
    find_args = ('find', '-L', os.environ['HADOOP_HOME'], '-name', '*.jar')
    find = subprocess.Popen(find_args, stdout=subprocess.PIPE)
    xargs_echo = subprocess.Popen(('xargs', 'echo'),
                                  stdin=find.stdout,
                                  stdout=subprocess.PIPE)
    jars = subprocess.check_output(('tr', "' '", "':'"),
                                   stdin=xargs_echo.stdout)
    hadoop_conf = os.environ["HADOOP_CONF_DIR"] \
        if "HADOOP_CONF_DIR" in os.environ \
        else os.environ["HADOOP_HOME"] + "/etc/hadoop"
    return (hadoop_conf + ":").encode("utf-8") + jars


def _hadoop_classpath_glob(hadoop_bin):
    import subprocess

    hadoop_classpath_args = (hadoop_bin, 'classpath', '--glob')
    return subprocess.check_output(hadoop_classpath_args)


def _libhdfs_walk_files_dirs(top_path, contents):
    files = []
    directories = []
    for c in contents:
        scrubbed_name = posixpath.split(c['name'])[1]
        if c['kind'] == 'file':
            files.append(scrubbed_name)
        else:
            directories.append(scrubbed_name)

    return directories, files


def connect(host="default", port=0, user=None, kerb_ticket=None,
            extra_conf=None):
    """
    Connect to an HDFS cluster.

    All parameters are optional and should only be set if the defaults need
    to be overridden.

    Authentication should be automatic if the HDFS cluster uses Kerberos.
    However, if a username is specified, then the ticket cache will likely
    be required.

    Parameters
    ----------
    host : str
        NameNode hostname. Set to "default" to use fs.defaultFS from
        core-site.xml.
    port : int
        NameNode port. Set to 0 for the default or for logical (HA) nodes.
    user : str, default None
        Username to use when connecting to HDFS; None implies the login
        user.
    kerb_ticket : str, default None
        Path to the Kerberos ticket cache.
    extra_conf : dict, default None
        Extra key/value configuration pairs; these override any
        hdfs-site.xml properties.

    Notes
    -----
    The first time you call this method, it will take longer than usual due
    to JNI spin-up time.

    Returns
    -------
    filesystem : HadoopFileSystem
    """
    warnings.warn(
        _DEPR_MSG.format("hdfs.connect", "2.0.0", "fs.HadoopFileSystem"),
        DeprecationWarning, stacklevel=2)
    return _connect(
        host=host, port=port, user=user, kerb_ticket=kerb_ticket,
        extra_conf=extra_conf)


def _connect(host="default", port=0, user=None, kerb_ticket=None,
             extra_conf=None):
    # Suppress the deprecation warning emitted by HadoopFileSystem.__init__;
    # connect() has already issued its own.
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        fs = HadoopFileSystem(host=host, port=port, user=user,
                              kerb_ticket=kerb_ticket,
                              extra_conf=extra_conf)
    return fs
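

# ---------------------------------------------------------------------------
# Usage sketch (illustrative, not part of the upstream module): a minimal
# example of the deprecated connect() API next to the pyarrow.fs
# replacement that the deprecation messages above point to. This assumes a
# reachable HDFS cluster configured via core-site.xml and a CLASSPATH that
# _maybe_set_hadoop_classpath() can derive; the '/tmp/example' path is
# hypothetical.

if __name__ == '__main__':
    # Legacy API: emits a DeprecationWarning and returns the wrapper class
    # defined above.
    legacy_fs = connect(host='default', port=0)
    print(legacy_fs.ls('/tmp/example'))

    # Replacement API (pyarrow >= 2.0): the pyarrow.fs filesystem layer.
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem('default')
    selector = fs.FileSelector('/tmp/example')
    for info in hdfs.get_file_info(selector):
        print(info.path, info.type)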