remotezip-0.12.3/

remotezip-0.12.3/LICENSE
MIT License

Copyright (c) 2018 Giuseppe Tribulato

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

remotezip-0.12.3/PKG-INFO
Metadata-Version: 2.1
Name: remotezip
Version: 0.12.3
Summary: Access zip file content hosted remotely without downloading the full file.
Home-page: https://github.com/gtsystem/python-remotezip
Author: Giuseppe Tribulato
Author-email: gtsystem@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Description-Content-Type: text/markdown
Provides-Extra: test
License-File: LICENSE

# remotezip

[![Build Status](https://travis-ci.org/gtsystem/python-remotezip.svg?branch=master)](https://travis-ci.org/gtsystem/python-remotezip)

This module provides a way to access single members of a zip file archive without downloading the full content from a remote web server. For this library to work, the web server hosting the archive needs to support the [range](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests) header.

## Installation

`pip install remotezip`

## Usage

### Initialization

`RemoteZip(url, ...)`

To download the content, this library relies on the `requests` module. The constructor interface matches that of the `requests.get` function.

* **url**: URL where the zip file is located *(required)*.
* **auth**: authentication credentials.
* **headers**: headers to pass to the request.
* **timeout**: timeout for the request.
* **verify**: enable/disable certificate verification or set custom certificates location.
* ...
Please look at the [requests](http://docs.python-requests.org/en/master/user/quickstart/#make-a-request) documentation for further usage details.
* **initial\_buffer\_size**: How much data (in bytes) to fetch during the first connection to download the zip file central directory. If your zip file contains a lot of files, it is a good idea to increase this parameter in order to avoid the need for further remote requests. *Default: 64 KiB (`64*1024` bytes)*.
* **session**: a custom session object to use for the request.
* **support_suffix_range**: You can set this attribute to `False` if the remote server doesn't support suffix range (negative offset). Notice that this option will use one more HEAD request to fetch the content length.

### Class Interface

`RemoteZip` is a subclass of the Python standard library class `zipfile.ZipFile`, so it supports all its read methods:

* `RemoteZip.close()`
* `RemoteZip.getinfo(name)`
* `RemoteZip.extract(member[, path[, pwd]])`
* `RemoteZip.extractall([path[, members[, pwd]]])`
* `RemoteZip.infolist()`
* `RemoteZip.namelist()`
* `RemoteZip.open(name[, mode[, pwd]])`
* `RemoteZip.printdir()`
* `RemoteZip.read(name[, pwd])`
* `RemoteZip.testzip()`
* `RemoteZip.filename`
* `RemoteZip.debug`
* `RemoteZip.comment`

Please look at the [zipfile](https://docs.python.org/3/library/zipfile.html#zipfile-objects) documentation for usage details.

**NOTE**:

* `extractall()` and `testzip()` need to access the full content of the archive. If you need these methods, downloading the whole archive first would probably be more efficient.
* `RemoteZip.open()` now supports seek operations when reading archive members. However, as the content is streamed and the DEFLATE format doesn't support seeking natively, any negative seek operation will result in a new remote request from the beginning of the member content. This is very inefficient; the recommendation is to use `RemoteZip.extract()` and then open and operate on the extracted file.
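
The extract-then-open recommendation in the note above can be tried with the standard library alone. Since `RemoteZip` subclasses `zipfile.ZipFile`, the sketch below uses a small in-memory archive as a stand-in for a remote one (an assumption, so that it runs without a server). The member name `data.txt` and the helper `extract_and_open` are hypothetical and not part of this module.

```python
import io
import tempfile
import zipfile


def extract_and_open(archive, member, directory):
    """Extract one member to `directory` and return a seekable binary file.

    `archive` is any zipfile.ZipFile; a RemoteZip instance is expected to
    behave the same way because it inherits extract() from ZipFile.
    """
    path = archive.extract(member, path=directory)
    return open(path, 'rb')


# Stand-in for a remote archive: a deflated zip built in memory.
raw = io.BytesIO()
with zipfile.ZipFile(raw, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('data.txt', b'0123456789' * 100)

with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(raw) as archive:
    with extract_and_open(archive, 'data.txt', tmp) as fh:
        fh.seek(500)
        tail = fh.read(10)
        fh.seek(0)  # backward seek on a local file, no new download needed
        head = fh.read(10)
```

With a real `RemoteZip`, the member would be downloaded once by `extract()`, and every later seek would hit the local copy.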

### Examples

#### List members in archive

Print all members of the archive:

```python
from remotezip import RemoteZip

with RemoteZip('http://.../myfile.zip') as zip:
    for zip_info in zip.infolist():
        print(zip_info.filename)
```

#### Download a member

The following example will extract the file `somefile.txt` from the archive stored at the URL `http://.../myfile.zip`.

```python
from remotezip import RemoteZip

with RemoteZip('http://.../myfile.zip') as zip:
    zip.extract('somefile.txt')
```

#### S3 example

If you are trying to download a member from a zip archive hosted on S3, you can use the [aws-requests-auth](https://github.com/DavidMuller/aws-requests-auth) library as follows:

```python
from hashlib import sha256

from aws_requests_auth.boto_utils import BotoAWSRequestsAuth
from remotezip import RemoteZip

auth = BotoAWSRequestsAuth(
    aws_host='s3-eu-west-1.amazonaws.com',
    aws_region='eu-west-1',
    aws_service='s3'
)
# Hash of an empty payload; sha256() needs bytes on Python 3
headers = {'x-amz-content-sha256': sha256(b'').hexdigest()}
url = "https://s3-eu-west-1.amazonaws.com/.../file.zip"

with RemoteZip(url, auth=auth, headers=headers) as z:
    z.extract('somefile.txt')
```

## Command line tool

A simple command line tool is included in this distribution.

```
usage: remotezip [-h] [-l] [-d DIR] url [filename [filename ...]]

Unzip remote files

positional arguments:
  url                Url of the zip archive
  filename           File to extract

optional arguments:
  -h, --help         show this help message and exit
  -l, --list         List files in the archive
  -d DIR, --dir DIR  Extract directory, default current directory
```

#### Example

```
$ remotezip -l "http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip"
  Length DateTime            Name
-------- ------------------- ------------------------
    2962 2008-07-30 13:58:46 Readme.txt
   24740 2008-07-30 12:16:46 TM_WORLD_BORDERS-0.3.dbf
     145 2008-03-12 13:11:54 TM_WORLD_BORDERS-0.3.prj
 6478464 2008-07-30 12:16:46 TM_WORLD_BORDERS-0.3.shp
    2068 2008-07-30 12:16:46 TM_WORLD_BORDERS-0.3.shx

$ remotezip "http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip" Readme.txt
Extracting Readme.txt...
```

## How it works

This module uses the `zipfile.ZipFile` class under the hood to decode the zip file format. The `ZipFile` class is initialized with a file-like object that transparently performs the remote queries.

The zip format is composed of the content of each compressed member followed by the central directory.

How many requests will this module perform to download a member?

* If the full archive content is smaller than **initial\_buffer\_size**, only one request will be needed.
* Normally two requests are needed, one to download the central directory and one to download the archive member.
* If the central directory is bigger than **initial\_buffer\_size**, a third request will be required.
* If negative seek operations are used in `ZipExtFile`, each of them will result in a new request.

## Alternative modules

There is a similar module available for Python: [pyremotezip](https://github.com/fcvarela/pyremotezip).
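
The request counts listed under *How it works* can be reasoned about offline. The sketch below is an approximation of those three rules, not the library's exact internals: it builds archives in memory, reads the central directory offset from `ZipFile.start_dir` (the same attribute `remotezip.py` uses) and checks whether the last `initial_buffer_size` bytes of the file cover the central directory. The helpers `estimate_requests` and `make_archive` are hypothetical names introduced for illustration.

```python
import io
import zipfile


def estimate_requests(archive_bytes, initial_buffer_size=64 * 1024):
    """Estimate the requests needed to fetch one member (no negative seeks)."""
    file_size = len(archive_bytes)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        central_dir_start = zf.start_dir  # offset where the central directory begins
    # The first request is assumed to fetch the last `initial_buffer_size` bytes.
    tail_start = max(0, file_size - initial_buffer_size)
    if tail_start == 0:
        return 1  # the whole archive fits in the initial buffer
    if central_dir_start >= tail_start:
        return 2  # central directory in the buffer, plus one member request
    return 3      # one more request for the rest of the central directory


def make_archive(n_members, payload):
    """Build an uncompressed zip with `n_members` copies of `payload`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_STORED) as zf:
        for i in range(n_members):
            zf.writestr('member_%d.bin' % i, payload)
    return buf.getvalue()


small = make_archive(2, b'x' * 100)      # archive smaller than the buffer
large = make_archive(2, b'x' * 200000)   # big members, tiny central directory
many = make_archive(3000, b'x')          # central directory larger than 64 KiB
```

For an archive like `many`, passing a larger `initial_buffer_size` to the estimate brings the count back down, which is the reason the parameter exists.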

remotezip-0.12.3/remotezip.egg-info/SOURCES.txt
LICENSE
README.md
remotezip.py
setup.py
remotezip.egg-info/PKG-INFO
remotezip.egg-info/SOURCES.txt
remotezip.egg-info/dependency_links.txt
remotezip.egg-info/entry_points.txt
remotezip.egg-info/requires.txt
remotezip.egg-info/top_level.txt

remotezip-0.12.3/remotezip.egg-info/dependency_links.txt


remotezip-0.12.3/remotezip.egg-info/entry_points.txt
[console_scripts]
remotezip = remotezip:main

remotezip-0.12.3/remotezip.egg-info/requires.txt
requests

[test]
requests_mock

remotezip-0.12.3/remotezip.egg-info/top_level.txt
remotezip

remotezip-0.12.3/remotezip.py
#!/usr/bin/env python
import io
import zipfile
from datetime import datetime
from itertools import tee

import requests

__all__ = ['RemoteIOError', 'RemoteZip']


class RemoteZipError(Exception):
    pass


class OutOfBound(RemoteZipError):
    pass


class RemoteIOError(RemoteZipError):
    pass


class RangeNotSupported(RemoteZipError):
    pass


class PartialBuffer:
    """An object with buffer-like interface but containing just a part of the data.
    The object allows seeking and reading as if the buffer contained the full data;
    however, any attempt to read data outside the partial data is going to fail
    with OutOfBound error.
    """
    def __init__(self, buffer, offset, size, stream):
        self.buffer = buffer if stream else io.BytesIO(buffer.read())
        self._offset = offset
        self._size = size
        self._position = offset
        self._stream = stream

    def __len__(self):
        """Returns the data size contained in the buffer"""
        return self._size

    def __repr__(self):
        return "<PartialBuffer off=%s size=%s stream=%s>" % (self._offset, self._size, self._stream)

    def read(self, size=0):
        """Read data from the buffer from the current position"""
        if size == 0:
            size = self._offset + self._size - self._position

        content = self.buffer.read(size)
        self._position = self._offset + self.buffer.tell()
        return content

    def close(self):
        """Ensure memory and connections are closed"""
        if not self.buffer.closed:
            self.buffer.close()
            if hasattr(self.buffer, 'release_conn'):
                self.buffer.release_conn()

    def tell(self):
        """Returns the current position on the virtual buffer"""
        return self._position

    def seek(self, offset, whence):
        """Change the position on the virtual buffer"""
        if whence == 2:
            self._position = self._size + self._offset + offset
        elif whence == 0:
            self._position = offset
        else:
            self._position += offset

        relative_position = self._position - self._offset

        if relative_position < 0 or relative_position >= self._size:
            raise OutOfBound("Position out of buffer bound")

        if self._stream:
            buff_pos = self.buffer.tell()
            if relative_position < buff_pos:
                raise OutOfBound("Negative seek not supported")

            skip_bytes = relative_position - buff_pos
            if skip_bytes == 0:
                return self._position
            self.buffer.read(skip_bytes)
        else:
            self.buffer.seek(relative_position)

        return self._position


class RemoteIO(io.IOBase):
    """Exposes a file-like interface for zip files hosted remotely.
    It requires the remote server to support the Range header."""
    def __init__(self, fetch_fun, initial_buffer_size=64*1024):
        self._fetch_fun = fetch_fun
        self._initial_buffer_size = initial_buffer_size
        self.buffer = None
        self._file_size = None
        self._seek_succeeded = False
        self._member_position_to_size = None
        self._last_member_pos = None

    def set_position_to_size(self, position_to_size):
        self._member_position_to_size = position_to_size

    def read(self, size=0):
        position = self.tell()
        if size == 0:
            size = self._file_size - position

        if not self._seek_succeeded:
            if self._member_position_to_size is None:
                fetch_size = size
                stream = False
            else:
                try:
                    fetch_size = self._member_position_to_size[position]
                    self._last_member_pos = position
                except KeyError:
                    if self._last_member_pos and self._last_member_pos < position:
                        fetch_size = self._member_position_to_size[self._last_member_pos]
                        fetch_size -= (position - self._last_member_pos)
                    else:
                        raise OutOfBound("Attempt to seek outside boundary of current zip member")
                stream = True

            self._seek_succeeded = True
            self.buffer.close()
            self.buffer = self._fetch_fun((position, position + fetch_size - 1), stream=stream)

        return self.buffer.read(size)

    def seekable(self):
        return True

    def seek(self, offset, whence=0):
        if whence == 2 and self._file_size is None:
            size = self._initial_buffer_size
            self.buffer = self._fetch_fun((-size, None), stream=False)
            self._file_size = len(self.buffer) + self.buffer.tell()

        try:
            pos = self.buffer.seek(offset, whence)
            self._seek_succeeded = True
            return pos
        except OutOfBound:
            self._seek_succeeded = False
            return self.tell()   # we ignore the issue here, we will check if buffer is fine during read

    def tell(self):
        return self.buffer.tell()

    def close(self):
        if self.buffer:
            self.buffer.close()
            self.buffer = None


class RemoteFetcher:
    """Represent a remote file to be fetched in parts"""

    def __init__(self, url, session=None, support_suffix_range=True, **kwargs):
        self._kwargs = kwargs
        self._url = url
        self._session = session
        self._support_suffix_range = support_suffix_range

    @staticmethod
    def parse_range_header(content_range_header):
        range = content_range_header[6:].split("/")[0]
        if range.startswith("-"):
            return int(range), None
        range_min, range_max = range.split("-")
        return int(range_min), int(range_max) if range_max else None

    @staticmethod
    def build_range_header(range_min, range_max):
        if range_max is None:
            return "bytes=%s%s" % (range_min, '' if range_min < 0 else '-')
        return "bytes=%s-%s" % (range_min, range_max)

    def _request(self, kwargs):
        if self._session:
            res = self._session.get(self._url, stream=True, **kwargs)
        else:
            res = requests.get(self._url, stream=True, **kwargs)
        res.raise_for_status()
        if 'Content-Range' not in res.headers:
            raise RangeNotSupported("The server doesn't support range requests")
        return res.raw, res.headers['Content-Range']

    def prepare_request(self, data_range=None):
        kwargs = dict(self._kwargs)
        kwargs['headers'] = headers = dict(kwargs.get('headers', {}))
        if data_range is not None:
            headers['Range'] = self.build_range_header(*data_range)
        return kwargs

    def get_file_size(self):
        if self._session:
            res = self._session.head(self._url, **self.prepare_request())
        else:
            res = requests.head(self._url, **self.prepare_request())
        try:
            res.raise_for_status()
            return int(res.headers['Content-Length'])
        except IOError as e:
            raise RemoteIOError(str(e))
        except KeyError:
            raise RemoteZipError("Cannot get file size: Content-Length header missing")

    def fetch(self, data_range, stream=False):
        """Fetch a part of a remote file"""
        # Handle the case suffix range request is not supported. Fixes #15
        if data_range[0] < 0 and data_range[1] is None and not self._support_suffix_range:
            size = self.get_file_size()
            data_range = (max(0, size + data_range[0]), size - 1)

        kwargs = self.prepare_request(data_range)
        try:
            res, range_header = self._request(kwargs)
            range_min, range_max = self.parse_range_header(range_header)
            return PartialBuffer(res, range_min, range_max - range_min + 1, stream)
        except IOError as e:
            raise RemoteIOError(str(e))


def pairwise(iterable):
    # pairwise('ABCDEFG') --> AB BC CD DE EF FG
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)


class RemoteZip(zipfile.ZipFile):
    def __init__(self, url, initial_buffer_size=64*1024, session=None, fetcher=RemoteFetcher, support_suffix_range=True, **kwargs):
        fetcher = fetcher(url, session, support_suffix_range=support_suffix_range, **kwargs)

        rio = RemoteIO(fetcher.fetch, initial_buffer_size)
        super(RemoteZip, self).__init__(rio)
        rio.set_position_to_size(self._get_position_to_size())

    def _get_position_to_size(self):
        ilist = [info.header_offset for info in self.infolist()]
        if len(ilist) == 0:
            return {}
        ilist.sort()
        ilist.append(self.start_dir)
        return {a: b-a for a, b in pairwise(ilist)}

    def size(self):
        return self.fp._file_size if self.fp else 0


def _list_files(url, support_suffix_range, filenames):
    with RemoteZip(url, headers={'User-Agent': 'remotezip'}, support_suffix_range=support_suffix_range) as zip:
        if len(filenames) == 0:
            filenames = zip.namelist()
        data = []
        for fname in filenames:
            zinfo = zip.getinfo(fname)
            dt = datetime(*zinfo.date_time)
            data.append((zinfo.file_size, dt.strftime('%Y-%m-%d %H:%M:%S'), zinfo.filename))
        _printTable(data, ('Length', 'DateTime', 'Name'), '><<')


def _printTable(data, header, align):
    # get max col width & prepare formatting string
    col_w = [len(col) for col in header]
    for row in data:
        col_w = [max(w, len(str(x))) for w, x in zip(col_w, row)]
    fmt = ' '.join('{{:{}{}}}'.format(a, w) for w, a in zip(col_w, align + '<' * 99))
    # print table
    print(fmt.format(*header).rstrip())
    print(fmt.format(*['-' * w for w in col_w]))
    for row in data:
        print(fmt.format(*row).rstrip())
    print()


def _extract_files(url, support_suffix_range, filenames, path):
    with RemoteZip(url, support_suffix_range=support_suffix_range) as zip:
        if len(filenames) == 0:
            filenames = zip.namelist()
        for fname in filenames:
            print('Extracting {0}...'.format(fname))
            zip.extract(fname, path=path)


def main():
    import argparse
    import os
    parser = argparse.ArgumentParser(description="Unzip remote files")
    parser.add_argument('url', help='Url of the zip archive')
    parser.add_argument('filename', nargs='*', help='File to extract')
    parser.add_argument('-l', '--list', action='store_true', help='List files in the archive')
    parser.add_argument('-d', '--dir', default=os.getcwd(), help='Extract directory, default current directory')
    parser.add_argument('--disable-suffix-range-support', action='store_true', help='Use when remote server does not support suffix range (negative offset)')

    args = parser.parse_args()
    support_suffix_range = not args.disable_suffix_range_support
    if args.list:
        _list_files(args.url, support_suffix_range, args.filename)
    else:
        _extract_files(args.url, support_suffix_range, args.filename, args.dir)


if __name__ == "__main__":
    main()

remotezip-0.12.3/setup.cfg
[egg_info]
tag_build =
tag_date = 0

remotezip-0.12.3/setup.py
from setuptools import setup

with open("README.md") as f:
    description = f.read()

setup(
    name='remotezip',
    version='0.12.3',
    author='Giuseppe Tribulato',
    author_email='gtsystem@gmail.com',
    py_modules=['remotezip'],
    url='https://github.com/gtsystem/python-remotezip',
    license='MIT',
    description='Access zip file content hosted remotely without downloading the full file.',
    long_description=description,
    long_description_content_type="text/markdown",
    install_requires=["requests"],
    extras_require={
        "test": ["requests_mock"],
    },
    entry_points={
        'console_scripts': ['remotezip = remotezip:main']
    },
    test_suite='test_remotezip',
    classifiers=[
        'Intended Audience :: Developers',
        'Development Status :: 5 - Production/Stable',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
        'Programming Language :: Python :: 3.8',
        'Programming Language :: Python :: 3.9',
        'Programming Language :: Python :: 3.10'
    ]
)