U C^1@sddlmZmZmZddlZddlZddlmZddlm Z ddl m Z ddl m Z mZedZedZGd d d e ZGd d d e ZdddZGdddeZdS))print_functiondivisionabsolute_importN)urlparse)AbstractFileSystem)AbstractBufferedFile)tokenizeDEFAULT_BLOCK_SIZEz%]*?\s+)?href=(["'])(.*?)\1z&(http[s]?://[-a-zA-Z0-9@:%_+.~#?&/=]+)c@sfeZdZdZdZdddZeddZdd d Zd d Z d dZ ddZ dddZ ddZ ddZdS)HTTPFileSystema2 Simple File-System for fetching data via HTTP(S) ``ls()`` is implemented by loading the parent page and doing a regex match on the result. If simple_link=True, anything of the form "http(s)://server.com/stuff?thing=other"; otherwise only links within HTML href tags will be used. /TNcKs<t||dk r|nt|_||_||_||_t|_ dS)a Parameters ---------- block_size: int Blocks to read bytes; if 0, will default to raw requests file-like objects instead of HTTPFile instances simple_links: bool If True, will consider both HTML tags and anything that looks like a URL; if False, will consider only the former. same_scheme: True When doing ls/glob, if this is True, only consider paths that have http/https matching the input URLs. size_policy: this argument is deprecated storage_options: key-value May be credentials, e.g., `{'auth': ('username', 'pword')}` or any other parameters passed on to requests N) r__init__r block_size simple_links same_schemakwargsrequestsSessionsession)selfrr Z same_scheme size_policyZstorage_optionsr?/tmp/pip-install-6_kvzl1k/fsspec/fsspec/implementations/http.pyr s  zHTTPFileSystem.__init__cCs|S)z7 For HTTP, we always want to keep the full URL r)clspathrrr_strip_protocol;szHTTPFileSystem._strip_protocolc Csn|jj|f|j}|jr2t|jt|j}n t|j}t}t |}|D]}t |t rf|d}| dr|j r|ddd|dddkr||n$|dd |ddr||qP| drt|dkr||jd|j|qP|dkrP|d|d|dgqP|sJ|drJ|j|dd d S|r^d d |DStt|SdS) Nhttp:rhttpsr z://)z..z../T)detailcSs&g|]}|d|drdnddqS)Nr directoryfilenamesizetype)endswith).0urrr as z%HTTPFileSystem.ls..)rgetrrex2findalltextexsetr isinstancetuple startswithrsplitaddreplacelenschemenetlocjoinrstriplstripr&lslistsorted)rurlrrlinksoutpartslrrrr<Cs8        " zHTTPFileSystem.lscCstj|f|j}||jSN)rr*rraise_for_statuscontent)rr?r@rrrcatlszHTTPFileSystem.catcCstdS)z7Make any intermediate directories to make path writableN)NotImplementedErrorrr?rrrmkdirsqszHTTPFileSystem.mkdirscCsP|j}d|d<z |jj|f|}||jWStjk rJYdSXdS)NTstreamF)rcopyrr*closeokr HTTPError)rrrr@rrrexistsus zHTTPFileSystem.existsrbc Ks|dkr t|dk r|n|j}|j}|||rTt|||j|f||d|Sd|d<|jj|f|}|d|j _ |j SdS)aMake a file-like object Parameters ---------- path: str Full URL with protocol mode: string must be "rb" block_size: int or None Bytes to download in one request; use instance value if None. If zero, will return a streaming Requests file-like instance. kwargs: key-value Any other parameters, passed to requests calls rRN)mode cache_optionsTrL) rIr rrMupdateHTTPFilerr*rFrawdecode_content) rrrSr Z autocommitrTrkwr@rrr_opens,   zHTTPFileSystem._opencCst||j|jS)z;Unique identifier; assume HTTP files are static, unchanging)rrprotocolrJrrrukeyszHTTPFileSystem.ukeyc Ksfd}dD]<}z"t||j|f|j}|r,WqVWqtk rBYqXq|dkrVt|||p^dddS)aHGet info of URL Tries to access location via HEAD, and then GET methods, but does not fetch the data. It is possible that the server does not supply any size information, in which case size will be given as None (and certain operations on the corresponding file will not work). F)headr*Nr!r") file_sizerr ExceptionFileNotFoundError)rr?rr$policyrrrinfos  zHTTPFileSystem.info)TNTN)T)rRNNN)__name__ __module__ __qualname____doc__sepr classmethodrr<rHrKrQrZr\rbrrrrr s(    )  -r cs@eZdZdZdfdd Zdfdd Zd d Zd d ZZS)rVa A file-like object pointing to a remove HTTP(S) resource Supports only reading, with read-ahead of a predermined block-size. In the case that the server does not supply the filesize, only reading of the complete file in one go is supported. Parameters ---------- url: str Full URL of the remote resource, including the protocol session: requests.Session or None All calls will be made within this session, to avoid restarting connections where the server allows this block_size: int or None The amount of read-ahead to do, in bytes. Default is 5MB, or the value configured for the FileSystem creating this file size: None or int If given, this is the size of the file in bytes, and we don't attempt to call the server to find the value. kwargs: all other key-values are passed to requests calls. NrRbytesc sv|dkrtd||_|dk r"|nt|_|dk rB||dd|_tjf||||||d| |jpl|j |j _dS)NrRzFile mode not supportedr!r")fsrrSr cache_typerT) rIr?rrrdetailssuperr r$ blocksizecache) rrjr?rr rSrkrTr$r __class__rrr s"  zHTTPFile.__init__cst|dkr|jdks2||jp|ks2|jr:|j|jkr:||jdkrV|dkrh|nt|j|j|}t|S)a5Read bytes from file Parameters ---------- length: int Read up to this many bytes. If negative, read all content to end of file. If the server has not supplied the filesize, attempting to read only part of the data will raise a ValueError. rN)locr$rn _fetch_allminrmread)rlengthrprrrvs     z HTTPFile.readcCsFt|jtsB|jj|jf|j}||j}t||_t ||_ dS)zRead whole file in one shot, without caching This is only called when position is still at zero, and read() is called without a byte-count. N) r0roAllBytesrr*r?rrFrGr6r$)rr@rBrrrrts   zHTTPFile._fetch_allc Cs|j}|di}d||df|d<|jj|jf|dd|}|jdkrTdS||jd krn|j}nd |j krt |j d }|||kr|j}nt d |||fnjd }g}|j d dD]J}|r| ||t|7}|||krt d|||fqqqd|}|S)a3Download a block of data The expectation is that the server returns only the requested bytes, with HTTP code 206. If this is not the case, we first check the headers, and then stream the output - if the data size is bigger than we requested, an exception is raised. headersz bytes=%i-%irZRangeT)ryrLiContent-Lengthz'Got more bytes (%i) than requested (%i)ri) chunk_sizez/Got more bytes so far (>%i) than requested (%i))rrMpoprr*r? status_coderFrGryint ValueError iter_contentappendr6r9) rstartendrryr@rBZclchunkrrr _fetch_range%s>          zHTTPFile._fetch_range)NNrRriNN)rr) rcrdrerfr rvrtr __classcell__rrrprrVs rVr]cKs|}|dd}|di}d|d<|p6t}|dkrX|j|fd|i|}n4|dkrd|d<|j|fd|i|}n td |d |jkrt|jd Sd |jkrt|jd  d d SdS)zCall HEAD on the server to get file size Default operation is to explicitly allow redirects and use encoding 'identity' (no compression) to get the true size of the target. allow_redirectsTryidentityzAccept-Encodingr]r*rLz+size_policy must be "head" or "get", got %sr|z Content-Ranger rN) rMr~r*rrr] TypeErrorryrr3)r?rrrarr]r@rrrr^Ts     r^c@s eZdZdZddZddZdS)rxz%Cache entire contents of a remote URLcCs ||_dSrEdata)rrrrrr oszAllBytes.__init__cCs|j||SrEr)rrrrrr_fetchrszAllBytes._fetchN)rcrdrerfr rrrrrrxlsrx)Nr]) __future__rrrrer urllib.parserZfsspecrZ fsspec.specrZ fsspec.utilsrr compiler.r+r rVr^objectrxrrrrs     ;