././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1701984661.446989 reprounzip-1.3/0000775000175000017500000000000014534434625013176 5ustar00remramremram././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1642103173.0 reprounzip-1.3/LICENSE.txt0000644000175000017500000000274114170100605015003 0ustar00remramremramCopyright (C) 2014, New York University All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1519237331.0 reprounzip-1.3/MANIFEST.in0000664000175000017500000000004713243334323014723 0ustar00remramremraminclude README.rst include LICENSE.txt ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1701984661.446989 reprounzip-1.3/PKG-INFO0000644000175000017500000000573214534434625014300 0ustar00remramremramMetadata-Version: 2.1 Name: reprounzip Version: 1.3 Summary: Linux tool enabling reproducible experiments (unpacker) Home-page: https://www.reprozip.org/ Author: Remi Rampin, Fernando Chirigati, Dennis Shasha, Juliana Freire Author-email: reprozip@nyu.edu Maintainer: Remi Rampin Maintainer-email: remi@rampin.org License: BSD-3-Clause Project-URL: Documentation, https://docs.reprozip.org/ Project-URL: Examples, https://examples.reprozip.org/ Project-URL: Source, https://github.com/VIDA-NYU/reprozip Project-URL: Bug Tracker, https://github.com/VIDA-NYU/reprozip/issues Project-URL: Chat, https://riot.im/app/#/room/#reprozip:matrix.org Project-URL: Changelog, https://github.com/VIDA-NYU/reprozip/blob/1.x/CHANGELOG.md Keywords: reprozip,reprounzip,reproducibility,provenance,vida,nyu Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Science/Research Classifier: License :: OSI Approved :: BSD License Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Topic :: Scientific/Engineering Classifier: Topic :: System :: Archiving License-File: LICENSE.txt Requires-Dist: PyYAML Requires-Dist: rpaths>=0.8 Requires-Dist: usagestats>=0.3 Requires-Dist: requests Requires-Dist: distro Requires-Dist: pyelftools Provides-Extra: all Requires-Dist: reprounzip-vagrant>=1.0; extra == "all" Requires-Dist: reprounzip-docker>=1.0; extra == "all" Requires-Dist: reprounzip-vistrails>=1.0; extra == "all" ReproZip project ================ `ReproZip `__ is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science. It tracks operating system calls and creates a bundle that contains all the binaries, files and dependencies required to run a given command on the author's computational environment (packing step). A reviewer can then extract the experiment in his environment to reproduce the results (unpacking step). reprounzip ---------- This is the component responsible for the unpacking step on Linux distributions. Please refer to `reprozip `__, `reprounzip-vagrant `_, and `reprounzip-docker `_ for other components and plugins. A GUI is available at `reprounzip-qt `_. Additional Information ---------------------- For more detailed information, please refer to our `website `_, as well as to our `documentation `_. ReproZip is currently being developed at `NYU `_. The team includes: * `Fernando Chirigati `_ * `Juliana Freire `_ * `Remi Rampin `_ * `Dennis Shasha `_ * `Vicky Rampin `_ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1682655966.0 reprounzip-1.3/README.rst0000664000175000017500000000300414422645336014661 0ustar00remramremramReproZip project ================ `ReproZip `__ is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science. It tracks operating system calls and creates a bundle that contains all the binaries, files and dependencies required to run a given command on the author's computational environment (packing step). A reviewer can then extract the experiment in his environment to reproduce the results (unpacking step). reprounzip ---------- This is the component responsible for the unpacking step on Linux distributions. Please refer to `reprozip `__, `reprounzip-vagrant `_, and `reprounzip-docker `_ for other components and plugins. A GUI is available at `reprounzip-qt `_. Additional Information ---------------------- For more detailed information, please refer to our `website `_, as well as to our `documentation `_. ReproZip is currently being developed at `NYU `_. The team includes: * `Fernando Chirigati `_ * `Juliana Freire `_ * `Remi Rampin `_ * `Dennis Shasha `_ * `Vicky Rampin `_ ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1701984661.438989 reprounzip-1.3/reprounzip/0000775000175000017500000000000014534434625015413 5ustar00remramremram././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/__init__.py0000664000175000017500000000032014505567114017515 0ustar00remramremramtry: # pragma: no cover __import__('pkg_resources').declare_namespace(__name__) except ImportError: # pragma: no cover from pkgutil import extend_path __path__ = extend_path(__path__, __name__) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/common.py0000664000175000017500000007734614505567114017274 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. # This file is shared: # reprozip/reprozip/common.py # reprounzip/reprounzip/common.py """Common functions between reprozip and reprounzip. This module contains functions that are specific to the reprozip software and its data formats, but that are shared between the reprozip and reprounzip packages. Because the packages can be installed separately, these functions are in a separate module which is duplicated between the packages. As long as these are small in number, they are not worth putting in a separate package that reprozip and reprounzip would both depend on. """ from __future__ import division, print_function, unicode_literals import atexit import contextlib import copy from datetime import datetime from distutils.version import LooseVersion import functools import gzip import logging import logging.handlers import os from rpaths import PosixPath, Path import sys import tarfile import usagestats import yaml import zipfile from .utils import iteritems, itervalues, unicode_, stderr, UniqueNames, \ escape, optional_return_type, isodatetime, hsize, join_root, copyfile logger = logging.getLogger(__name__.split('.', 1)[0]) FILE_READ = 0x01 FILE_WRITE = 0x02 FILE_WDIR = 0x04 FILE_STAT = 0x08 FILE_LINK = 0x10 FILE_SOCKET = 0x20 class File(object): """A file, used at some point during the experiment. """ comment = None def __init__(self, path, size=None): self.path = path self.size = size def __eq__(self, other): return (isinstance(other, File) and self.path == other.path) def __ne__(self, other): return not self.__eq__(other) def __hash__(self): return hash(self.path) class Package(object): """A distribution package, containing a set of files. """ def __init__(self, name, version, files=None, packfiles=True, size=None): self.name = name self.version = version self.files = list(files) if files is not None else [] self.packfiles = packfiles self.size = size def __eq__(self, other): return (isinstance(other, Package) and self.name == other.name and self.version == other.version) def __ne__(self, other): return not self.__eq__(other) def add_file(self, file_): self.files.append(file_) def __unicode__(self): return '%s (%s)' % (self.name, self.version) __str__ = __unicode__ # Pack format history: # 1: used by reprozip 0.2 through 0.7. Single tar.gz file, metadata under # METADATA/, data under DATA/ # 2: pack is usually not compressed, metadata under METADATA/, data in another # DATA.tar.gz (files inside it still have the DATA/ prefix for ease-of-use # in unpackers) # # Pack metadata history: # 0.2: used by reprozip 0.2 # 0.2.1: # config: comments directories as such in config # trace database: adds executed_files.workingdir, adds processes.exitcode # data: packs dynamic linkers # 0.3: # config: don't list missing (unpacked) files in config # trace database: adds opened_files.is_directory # 0.3.1: no change # 0.3.2: no change # 0.4: # config: adds input_files, output_files, lists parent directories # 0.4.1: no change # 0.5: no change # 0.6: no change # 0.7: # moves input_files and output_files from run to global scope # adds processes.is_thread column to trace database # 0.8: adds 'id' field to run class RPZPack(object): """Encapsulates operations on the RPZ pack format. """ data = zip = tar = None def __init__(self, pack): self.pack = Path(pack) if self._open_tar(): pass elif self._open_zip(): pass else: raise ValueError("File doesn't appear to be an RPZ pack") def _open_tar(self): try: self.tar = tarfile.open(str(self.pack), 'r:*') except tarfile.TarError: return False try: f = self.tar.extractfile('METADATA/version') except KeyError: raise ValueError("Invalid ReproZip file") version = f.read() f.close() if version.startswith(b'REPROZIP VERSION '): try: version = int(version[17:].rstrip()) except ValueError: version = None if version in (1, 2): self.version = version self.data_prefix = PosixPath(b'DATA') else: raise ValueError( "Unknown format version %r (maybe you should upgrade " "reprounzip? I only know versions 1 and 2" % version) else: raise ValueError("File doesn't appear to be an RPZ pack") if self.version == 1: self.data = self.tar elif version == 2: self.data = tarfile.open( fileobj=self.tar.extractfile('DATA.tar.gz'), mode='r:*') else: assert False return True def _open_zip(self): try: self.zip = zipfile.ZipFile(str(self.pack)) except zipfile.BadZipfile: return False try: f = self.zip.open('METADATA/version') except KeyError: raise ValueError("Invalid ReproZip file") version = f.read() f.close() if version.startswith(b'REPROZIP VERSION '): try: version = int(version[17:].rstrip()) except ValueError: version = None if version == 1: raise ValueError("Format version 1 is not accepted for ZIP") elif version == 2: self.version = 2 self.data_prefix = PosixPath(b'DATA') else: raise ValueError( "Unknown format version %r (maybe you should upgrade " "reprounzip? I only know versions 1 and 2" % version) else: raise ValueError("File doesn't appear to be an RPZ pack") if sys.version_info < (3, 7): # zip.open() doesn't return a seekable file object before 3.6 # Extract to a temporary file instead fd, temporary_data = Path.tempfile( prefix='reprounzip_data_', suffix='.zip', ) os.close(fd) self._extract_file('DATA.tar.gz', temporary_data) self.data = tarfile.open(str(temporary_data), mode='r:*') atexit.register(os.remove, temporary_data.path) else: self.data = tarfile.open(fileobj=self.zip.open('DATA.tar.gz'), mode='r:*') return True def remove_data_prefix(self, path): if not isinstance(path, PosixPath): path = PosixPath(path) components = path.components[1:] if not components: return path.__class__('') return path.__class__(*components) def open_config(self): """Gets the configuration file. """ if self.tar is not None: return self.tar.extractfile('METADATA/config.yml') else: return self.zip.open('METADATA/config.yml') def extract_config(self, target): """Extracts the config to the specified path. It is up to the caller to remove that file once done. """ self._extract_file('METADATA/config.yml', target) def _extract_file(self, name, target): if self.tar is not None: member = copy.copy(self.tar.getmember(name)) member.name = str(target.components[-1]) self.tar.extract(member, path=str(Path.cwd() / target.parent)) else: member = copy.copy(self.zip.getinfo(name)) member.filename = str(target.components[-1]) self.zip.extract(member, path=str(Path.cwd() / target.parent)) target.chmod(0o644) assert target.is_file() def _extract_file_gz(self, name, target): if self.tar is not None: f_in = self.tar.extractfile(name) else: f_in = self.zip.open(name) f_in_gz = gzip.open(f_in) f_out = target.open('wb') try: chunk = f_in_gz.read(4096) while len(chunk) == 4096: f_out.write(chunk) chunk = f_in_gz.read(4096) if chunk: f_out.write(chunk) finally: f_out.close() f_in_gz.close() f_in.close() target.chmod(0o644) @contextlib.contextmanager def with_config(self): """Context manager that extracts the config to a temporary file. """ fd, tmp = Path.tempfile(prefix='reprounzip_') os.close(fd) self.extract_config(tmp) yield tmp tmp.remove() def extract_trace(self, target): """Extracts the trace database to the specified path. It is up to the caller to remove that file once done. """ target = Path(target) if self.version == 2: try: if self.tar is not None: self.tar.getmember('METADATA/trace.sqlite3.gz') else: self.zip.getinfo('METADATA/trace.sqlite3.gz') except KeyError: pass else: self._extract_file_gz('METADATA/trace.sqlite3.gz', target) return elif self.version != 2: assert False self._extract_file('METADATA/trace.sqlite3', target) @contextlib.contextmanager def with_trace(self): """Context manager extracting the trace database to a temporary file. """ fd, tmp = Path.tempfile(prefix='reprounzip_') os.close(fd) self.extract_trace(tmp) yield tmp tmp.remove() def list_data(self): """Returns tarfile.TarInfo objects for all the data paths. """ return [copy.copy(m) for m in self.data.getmembers() if m.name.startswith('DATA/')] def data_filenames(self): """Returns a set of filenames for all the data paths. Those paths begin with a slash / and the 'DATA' prefix has been removed. """ return set(PosixPath(m.name[4:]) for m in self.data.getmembers() if m.name.startswith('DATA/')) def get_data(self, path): """Returns a tarfile.TarInfo object for the data path. Raises KeyError if no such path exists. """ path = PosixPath(path) path = join_root(PosixPath(b'DATA'), path) return copy.copy(self.data.getmember(path)) def extract_data(self, root, members): """Extracts the given members from the data tarball. The members must come from get_data(). """ # Check for CVE-2007-4559 abs_root = root.absolute() for member in members: member_path = (root / member.name).absolute() if not member_path.lies_under(abs_root): raise ValueError("Invalid path in data tar") self.data.extractall(str(root), members) def copy_data_tar(self, target): """Copies the file in which the data lies to the specified destination. """ if self.tar is not None: if self.version == 1: self.pack.copyfile(target) elif self.version == 2: with target.open('wb') as fp: data = self.tar.extractfile('DATA.tar.gz') copyfile(data, fp) data.close() else: with target.open('wb') as fp: data = self.zip.open('DATA.tar.gz') copyfile(data, fp) data.close() def extensions(self): """Get a list of extensions present in this pack. """ extensions = set() if self.tar is not None: for m in self.tar.getmembers(): if m.name.startswith('EXTENSIONS/'): name = m.name[11:] if '/' in name: name = name[:name.index('/')] if name: extensions.add(name) else: for m in self.zip.infolist(): if m.filename.startswith('EXTENSIONS/'): name = m.filename[11:] if '/' in name: name = name[:name.index('/')] if name: extensions.add(name) return extensions def close(self): if self.data is not self.tar: self.data.close() if self.tar is not None: self.tar.close() elif self.zip is not None: self.zip.close() self.data = self.zip = self.tar = None class InvalidConfig(ValueError): """Configuration file is invalid. """ def read_files(files, File=File): if files is None: return [] return [File(PosixPath(f)) for f in files] def read_packages(packages, File=File, Package=Package): if packages is None: return [] new_pkgs = [] for pkg in packages: pkg['files'] = read_files(pkg['files'], File) new_pkgs.append(Package(**pkg)) return new_pkgs Config = optional_return_type(['runs', 'packages', 'other_files'], ['inputs_outputs', 'additional_patterns', 'format_version']) @functools.total_ordering class InputOutputFile(object): def __init__(self, path, read_runs, write_runs): self.path = path self.read_runs = read_runs self.write_runs = write_runs def __eq__(self, other): return ((self.path, self.read_runs, self.write_runs) == (other.path, other.read_runs, other.write_runs)) def __lt__(self, other): return self.path < other.path def __repr__(self): return "" % ( self.path, self.read_runs, self.write_runs) def load_iofiles(config, runs): """Loads the inputs_outputs part of the configuration. This tests for duplicates, merge the lists of executions, and optionally loads from the runs for reprozip < 0.7 compatibility. """ files_list = config.get('inputs_outputs') or [] # reprozip < 0.7 compatibility: read input_files and output_files from runs if 'inputs_outputs' not in config: for i, run in enumerate(runs): for rkey, wkey in (('input_files', 'read_by_runs'), ('output_files', 'written_by_runs')): for k, p in iteritems(run.pop(rkey, {})): files_list.append({'name': k, 'path': p, wkey: [i]}) files = {} # name:str: InputOutputFile paths = {} # path:PosixPath: name:str required_keys = set(['name', 'path']) optional_keys = set(['read_by_runs', 'written_by_runs']) uniquenames = UniqueNames() for i, f in enumerate(files_list): keys = set(f) if (not keys.issubset(required_keys | optional_keys) or not keys.issuperset(required_keys)): raise InvalidConfig("File #%d has invalid keys") name = f['name'] path = PosixPath(f['path']) readers = sorted(f.get('read_by_runs', [])) writers = sorted(f.get('written_by_runs', [])) if ( not isinstance(readers, (tuple, list)) or not all(isinstance(e, int) for e in readers) ): raise InvalidConfig("read_by_runs should be a list of integers") if ( not isinstance(writers, (tuple, list)) or not all(isinstance(e, int) for e in writers) ): raise InvalidConfig("written_by_runs should be a list of integers") if name in files: if files[name].path != path: old_name, name = name, uniquenames(name) logger.warning("File name appears multiple times: %s\n" "Using name %s instead", old_name, name) else: uniquenames.insert(name) if path in paths: if paths[path] == name: logger.warning("File appears multiple times: %s", name) else: logger.warning("Two files have the same path (but different " "names): %s, %s\nUsing name %s", name, paths[path], paths[path]) name = paths[path] files[name].read_runs.update(readers) files[name].write_runs.update(writers) else: paths[path] = name files[name] = InputOutputFile(path, readers, writers) return files def load_config(filename, canonical, File=File, Package=Package): """Loads a YAML configuration file. `File` and `Package` parameters can be used to override the classes that will be used to hold files and distribution packages; useful during the packing step. `canonical` indicates whether a canonical configuration file is expected (in which case the ``additional_patterns`` section is not accepted). Note that this changes the number of returned values of this function. """ with filename.open(encoding='utf-8') as fp: config = yaml.safe_load(fp) ver = LooseVersion(config['version']) keys_ = set(config) if 'version' not in keys_: raise InvalidConfig("Missing version") # Accepts versions from 0.2 to 0.8 inclusive elif not LooseVersion('0.2') <= ver < LooseVersion('0.9'): pkgname = (__package__ or __name__).split('.', 1)[0] raise InvalidConfig("Loading configuration file in unknown format %s; " "this probably means that you should upgrade " "%s" % (ver, pkgname)) unknown_keys = keys_ - set(['pack_id', 'version', 'runs', 'inputs_outputs', 'packages', 'other_files', 'additional_patterns', # Deprecated 'input_files', 'output_files']) if unknown_keys: logger.warning("Unrecognized sections in configuration: %s", ', '.join(unknown_keys)) runs = config.get('runs') or [] packages = read_packages(config.get('packages'), File, Package) other_files = read_files(config.get('other_files'), File) inputs_outputs = load_iofiles(config, runs) # reprozip < 0.7 compatibility: set inputs/outputs on runs (for plugins) for i, run in enumerate(runs): run['input_files'] = dict((n, f.path) for n, f in iteritems(inputs_outputs) if i in f.read_runs) run['output_files'] = dict((n, f.path) for n, f in iteritems(inputs_outputs) if i in f.write_runs) # reprozip < 0.8 compatibility: assign IDs to runs for i, run in enumerate(runs): if run.get('id') is None: run['id'] = "run%d" % i record_usage_package(runs, packages, other_files, inputs_outputs, pack_id=config.get('pack_id')) kwargs = {'format_version': ver, 'inputs_outputs': inputs_outputs} if canonical: if 'additional_patterns' in config: raise InvalidConfig("Canonical configuration file shouldn't have " "additional_patterns key anymore") else: kwargs['additional_patterns'] = config.get('additional_patterns') or [] return Config(runs, packages, other_files, **kwargs) def write_file(fp, fi, indent=0): fp.write("%s - \"%s\"%s\n" % ( " " * indent, escape(unicode_(fi.path)), ' # %s' % fi.comment if fi.comment is not None else '')) def write_package(fp, pkg, indent=0): indent_str = " " * indent fp.write("%s - name: \"%s\"\n" % (indent_str, escape(pkg.name))) fp.write("%s version: \"%s\"\n" % (indent_str, escape(pkg.version))) if pkg.size is not None: fp.write("%s size: %d\n" % (indent_str, pkg.size)) fp.write("%s packfiles: %s\n" % (indent_str, 'true' if pkg.packfiles else 'false')) fp.write("%s files:\n" "%s # Total files used: %s\n" % ( indent_str, indent_str, hsize(sum(fi.size for fi in pkg.files if fi.size is not None)))) if pkg.size is not None: fp.write("%s # Installed package size: %s\n" % ( indent_str, hsize(pkg.size))) for fi in sorted(pkg.files, key=lambda fi_: fi_.path): write_file(fp, fi, indent + 1) def save_config(filename, runs, packages, other_files, reprozip_version, inputs_outputs=None, canonical=False, pack_id=None): """Saves the configuration to a YAML file. `canonical` indicates whether this is a canonical configuration file (no ``additional_patterns`` section). """ dump = lambda x: yaml.safe_dump(x, encoding='utf-8', allow_unicode=True) with filename.open('w', encoding='utf-8', newline='\n') as fp: # Writes preamble fp.write("""\ # ReproZip configuration file # This file was generated by reprozip {version} at {date} {what} # Run info{pack_id} version: "{format!s}" """.format(pack_id=(('\npack_id: "%s"' % pack_id) if pack_id is not None else ''), version=escape(reprozip_version), format='0.8', date=isodatetime(), what=("# It was generated by the packer and you shouldn't need to " "edit it" if canonical else "# You might want to edit this file before running the " "packer\n# See 'reprozip pack -h' for help"))) fp.write("runs:\n") for i, run in enumerate(runs): # Remove reprozip < 0.7 compatibility fields run = dict((k, v) for k, v in iteritems(run) if k not in ('input_files', 'output_files')) fp.write("# Run %d\n" % i) fp.write(dump([run]).decode('utf-8')) fp.write("\n") fp.write("""\ # Input and output files # Inputs are files that are only read by a run; reprounzip can replace these # files on demand to run the experiment with custom data. # Outputs are files that are generated by a run; reprounzip can extract these # files from the experiment on demand, for the user to examine. # The name field is the identifier the user will use to access these files. inputs_outputs:""") for n, f in iteritems(inputs_outputs): fp.write("""\ - name: {name} path: {path} written_by_runs: {writers} read_by_runs: {readers}""".format(name=n, path=unicode_(f.path), readers=repr(f.read_runs), writers=repr(f.write_runs))) fp.write("""\ # Files to pack # All the files below were used by the program; they will be included in the # generated package # These files come from packages; we can thus choose not to include them, as it # will simply be possible to install that package on the destination system # They are included anyway by default packages: """) # Writes files for pkg in sorted(packages, key=lambda p: p.name): write_package(fp, pkg) fp.write("""\ # These files do not appear to come with an installed package -- you probably # want them packed other_files: """) for f in sorted(other_files, key=lambda fi: fi.path): write_file(fp, f) if not canonical: fp.write("""\ # If you want to include additional files in the pack, you can list additional # patterns of files that will be included additional_patterns: # Examples: # - /etc/apache2/** # Everything under apache2/ # - /var/log/apache2/*.log # Log files directly under apache2/ # - /var/lib/lxc/*/rootfs/home/**/*.py # All Python files of all users in # # that container """) class LoggingDateFormatter(logging.Formatter): """Formatter that puts milliseconds in the timestamp. """ converter = datetime.fromtimestamp def formatTime(self, record, datefmt=None): ct = self.converter(record.created) t = ct.strftime("%H:%M:%S") s = "%s.%03d" % (t, record.msecs) return s def setup_logging(tag, verbosity): """Sets up the logging module. """ levels = [logging.CRITICAL, logging.WARNING, logging.INFO, logging.DEBUG] console_level = levels[min(verbosity, 3)] file_level = logging.INFO min_level = min(console_level, file_level) # Create formatter fmt = "[%s] %%(asctime)s %%(levelname)s: %%(message)s" % tag formatter = LoggingDateFormatter(fmt) # Console logger handler = logging.StreamHandler() handler.setLevel(console_level) handler.setFormatter(formatter) # Set up logger rootlogger = logging.root rootlogger.setLevel(min_level) rootlogger.addHandler(handler) # File logger if os.environ.get('REPROZIP_NO_LOGFILE', '').lower() in ('', 'false', '0', 'off'): dotrpz = Path('~/.reprozip').expand_user() try: if not dotrpz.is_dir(): dotrpz.mkdir() filehandler = logging.handlers.RotatingFileHandler( str(dotrpz / 'log'), mode='a', delay=False, maxBytes=400000, backupCount=5) except (IOError, OSError): logger.warning("Couldn't create log file %s", dotrpz / 'log') else: filehandler.setFormatter(formatter) filehandler.setLevel(file_level) rootlogger.addHandler(filehandler) filehandler.emit(logging.root.makeRecord( __name__.split('.', 1)[0], logging.INFO, "(log start)", 0, "Log opened %s %s", (datetime.now().strftime("%Y-%m-%d"), sys.argv), None)) logging.getLogger('urllib3').setLevel(logging.INFO) _usage_report = None def setup_usage_report(name, version): """Sets up the usagestats module. """ global _usage_report certificate_file = get_reprozip_ca_certificate() _usage_report = usagestats.Stats( '~/.reprozip/usage_stats', usagestats.Prompt(enable='%s usage_report --enable' % name, disable='%s usage_report --disable' % name), os.environ.get('REPROZIP_USAGE_URL', 'https://stats.reprozip.org/'), version='%s %s' % (name, version), unique_user_id=True, env_var='REPROZIP_USAGE_STATS', ssl_verify=certificate_file.path) try: os.getcwd().encode('ascii') except (UnicodeEncodeError, UnicodeDecodeError): record_usage(cwd_ascii=False) else: record_usage(cwd_ascii=True) def enable_usage_report(enable): """Enables or disables usage reporting. """ if enable: _usage_report.enable_reporting() stderr.write("Thank you, usage reports will be sent automatically " "from now on.\n") else: _usage_report.disable_reporting() stderr.write("Usage reports will not be collected nor sent.\n") def record_usage(**kwargs): """Records some info in the current usage report. """ if _usage_report is not None: _usage_report.note(kwargs) def record_usage_package(runs, packages, other_files, inputs_outputs, pack_id=None): """Records the info on some pack file into the current usage report. """ if _usage_report is None: return for run in runs: record_usage(argv0=run['argv'][0]) record_usage(pack_id=pack_id or '', nb_packages=len(packages), nb_package_files=sum(len(pkg.files) for pkg in packages), packed_packages=sum(1 for pkg in packages if pkg.packfiles), nb_other_files=len(other_files), nb_input_outputs_files=len(inputs_outputs), nb_input_files=sum(1 for f in itervalues(inputs_outputs) if f.read_runs), nb_output_files=sum(1 for f in itervalues(inputs_outputs) if f.write_runs)) def submit_usage_report(**kwargs): """Submits the current usage report to the usagestats server. """ _usage_report.submit(kwargs, usagestats.OPERATING_SYSTEM, usagestats.SESSION_TIME, usagestats.PYTHON_VERSION) def get_reprozip_ca_certificate(): """Gets the ReproZip CA certificate filename. """ fd, certificate_file = Path.tempfile(prefix='rpz_stats_ca_', suffix='.pem') with certificate_file.open('wb') as fp: fp.write(usage_report_ca) os.close(fd) atexit.register(os.remove, certificate_file.path) return certificate_file usage_report_ca = b'''\ -----BEGIN CERTIFICATE----- MIIDzzCCAregAwIBAgIJAMmlcDnTidBEMA0GCSqGSIb3DQEBCwUAMH4xCzAJBgNV BAYTAlVTMREwDwYDVQQIDAhOZXcgWW9yazERMA8GA1UEBwwITmV3IFlvcmsxDDAK BgNVBAoMA05ZVTERMA8GA1UEAwwIUmVwcm9aaXAxKDAmBgkqhkiG9w0BCQEWGXJl cHJvemlwLWRldkB2Z2MucG9seS5lZHUwHhcNMTQxMTA3MDUxOTA5WhcNMjQxMTA0 MDUxOTA5WjB+MQswCQYDVQQGEwJVUzERMA8GA1UECAwITmV3IFlvcmsxETAPBgNV BAcMCE5ldyBZb3JrMQwwCgYDVQQKDANOWVUxETAPBgNVBAMMCFJlcHJvWmlwMSgw JgYJKoZIhvcNAQkBFhlyZXByb3ppcC1kZXZAdmdjLnBvbHkuZWR1MIIBIjANBgkq hkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA1fuTW2snrVji51vGVl9hXAAZbNJ+dxG+ /LOOxZrF2f1RRNy8YWpeCfGbsZqiIEjorBv8lvdd9P+tD3M5sh9L0zQPU9dFvDb+ OOrV0jx59hbK3QcCQju3YFuAtD1lu8TBIPgGEab0eJhLVIX+XU5cYXrfoBmwCpN/ 1wXWkUhN91ZVMA0ylATAxTpnoNuMKzfTxT8pyOWajiTskYkKmVBAxgYJQe1YDFA8 fglBNkQuHqP8jgYAniEBCAPZRMMq8WpOtyFx+L9LX9/WcHtAQyDPPb9M81KKgPQq urtCqtuDKxuqcX9zg4/O8l4nZ50pwaJjbH4kMW/wnLzTPvzZCPtJYQIDAQABo1Aw TjAdBgNVHQ4EFgQUJjhDDOup4P0cdrAVq1F9ap3yTj8wHwYDVR0jBBgwFoAUJjhD DOup4P0cdrAVq1F9ap3yTj8wDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQsFAAOC AQEAeKpTiy2WYPqevHseTCJDIL44zghDJ9w5JmECOhFgPXR9Hl5Nh9S1j4qHBs4G cn8d1p2+8tgcJpNAysjuSl4/MM6hQNecW0QVqvJDQGPn33bruMB4DYRT5du1Zpz1 YIKRjGU7Of3CycOCbaT50VZHhEd5GS2Lvg41ngxtsE8JKnvPuim92dnCutD0beV+ 4TEvoleIi/K4AZWIaekIyqazd0c7eQjgSclNGgePcdbaxIo0u6tmdTYk3RNzo99t DCfXxuMMg3wo5pbqG+MvTdECaLwt14zWU259z8JX0BoeVG32kHlt2eUpm5PCfxqc dYuwZmAXksp0T0cWo0DnjJKRGQ== -----END CERTIFICATE----- -----BEGIN CERTIFICATE----- MIIDRDCCAiygAwIBAgIUXaa8P7qR4c0P51hCDIqj4GUbG/owDQYJKoZIhvcNAQEL BQAwEzERMA8GA1UEAwwIUmVwcm9aaXAwIBcNMjEwNDI5MjEwNTUzWhgPMjEyMTA0 MjkyMTA1NTNaMBMxETAPBgNVBAMMCFJlcHJvWmlwMIIBIjANBgkqhkiG9w0BAQEF AAOCAQ8AMIIBCgKCAQEA3udPriZ8kziQE+OyLVozJFDSZTe8RLlpFsu/ZQjSnIh1 TsENMMu1lwv0GVEpT/EbtD5ORtZzwYQ7Vuh+IO4TQDhA5KvyJD2gZW8hE4txkkQd yI5vSj0iiViA80tKB7FSDLsvz9iiDxShYHJI947gswbaLmampHIXD/Rjjs7+hmL5 RRS5NL8vCp2/2QVj5wnJupa5O2l2T1M6S/SyFcAgBMM/FhDsaA/yf4NPcOG6gFuO b5mYz2ERSf4v9mRL+G3r6YULYGZWS5ThY0QoZ0lYt2nlthzwfftazrJ9+yfYBkoJ K6Ug8UGtyOb5m3mK00c4wS7/wzuGgLMszkE0nE9SfwIDAQABo4GNMIGKMB0GA1Ud DgQWBBSqrIPVnO5vkHj9ImGvOr38r4rcNjBOBgNVHSMERzBFgBSqrIPVnO5vkHj9 ImGvOr38r4rcNqEXpBUwEzERMA8GA1UEAwwIUmVwcm9aaXCCFF2mvD+6keHND+dY QgyKo+BlGxv6MAwGA1UdEwQFMAMBAf8wCwYDVR0PBAQDAgEGMA0GCSqGSIb3DQEB CwUAA4IBAQC2g8yX1c5JutH/qAKUVvqSwP2KBm3IyOjdbN7cvnwn0gMkwEj88j7p dKhfO0Kfp/N4iKj1YK7PBXfrdlYhxINSbfPSVS3A9XWi8pJwiwgBfjrrACRMhBsv HAQPnkXnaEJrQm//k8s4etT25JYaPgXS8t+dgVS0iqonYlpCB1XkE0gw1kLNCW5F 1SimUehJ5bZrJYGgo6kp44F12kPMzNHvk50Sf2p3nm2d9/BNbbJQxUBKt9n+i6Ir xGGDWfg5F+BLKURWkoGBnnLPqkRxlkaGvk6QpIAfD1h99fTyuWUno3+NoQ7hS952 yWbmqwbavTIyG+D+kfhbGFEdRxLF5BeK -----END CERTIFICATE----- ''' ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984163.0 reprounzip-1.3/reprounzip/main.py0000664000175000017500000001213114534433643016706 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Entry point for the reprounzip utility. This contains :func:`~reprounzip.reprounzip.main`, which is the entry point declared to setuptools. It is also callable directly. It dispatchs to plugins registered through pkg_resources as entry point ``reprounzip.unpackers``. """ from __future__ import division, print_function, unicode_literals if __name__ == '__main__': # noqa from reprounzip.main import main main() import argparse import locale import logging from pkg_resources import iter_entry_points import sys import traceback from reprounzip.common import setup_logging, \ setup_usage_report, enable_usage_report, \ submit_usage_report, record_usage from reprounzip import signals from reprounzip.unpackers.common import UsageError from reprounzip.utils import stderr __version__ = '1.3' logger = logging.getLogger('reprounzip') unpackers = {} def get_plugins(entry_point_name): for entry_point in iter_entry_points(entry_point_name): try: func = entry_point.load() except Exception: print("Plugin %s from %s %s failed to initialize!" % ( entry_point.name, entry_point.dist.project_name, entry_point.dist.version), file=sys.stderr) traceback.print_exc(file=sys.stderr) continue name = entry_point.name # Docstring is used as description (used for detailed help) descr = func.__doc__.strip() # First line of docstring is the help (used for general help) descr_1 = descr.split('\n', 1)[0] yield name, func, descr, descr_1 class RPUZArgumentParser(argparse.ArgumentParser): def error(self, message): sys.stderr.write('error: %s\n' % message) self.print_help(sys.stderr) sys.exit(2) def usage_report(args): if bool(args.enable) == bool(args.disable): logger.critical("What do you want to do?") raise UsageError enable_usage_report(args.enable) sys.exit(0) def main(): """Entry point when called on the command-line. """ # Locale try: locale.setlocale(locale.LC_ALL, '') except locale.Error as e: stderr.write("Couldn't set locale: %s\n" % e) # Parses command-line # General options def add_options(opts): opts.add_argument('--version', action='version', version="reprounzip version %s" % __version__) # Loads plugins for name, func, descr, descr_1 in get_plugins('reprounzip.plugins'): func() parser = RPUZArgumentParser( description="reprounzip is the ReproZip component responsible for " "unpacking and reproducing an experiment previously " "packed with reprozip", epilog="Please report issues to reprozip@nyu.edu") add_options(parser) parser.add_argument('-v', '--verbose', action='count', default=1, dest='verbosity', help="augments verbosity level") subparsers = parser.add_subparsers(title="subcommands", metavar='') # usage_report subcommand parser_stats = subparsers.add_parser( 'usage_report', help="Enables or disables anonymous usage reports") add_options(parser_stats) parser_stats.add_argument('--enable', action='store_true') parser_stats.add_argument('--disable', action='store_true') parser_stats.set_defaults(func=usage_report) # Loads unpackers for name, func, descr, descr_1 in get_plugins('reprounzip.unpackers'): plugin_parser = subparsers.add_parser( name, help=descr_1, description=descr, formatter_class=argparse.RawDescriptionHelpFormatter) add_options(plugin_parser) info = func(plugin_parser) plugin_parser.set_defaults(selected_unpacker=name) if info is None: info = {} unpackers[name] = info signals.pre_parse_args(parser=parser, subparsers=subparsers) args = parser.parse_args() signals.post_parse_args(args=args) if getattr(args, 'func', None) is None: parser.print_help(sys.stderr) sys.exit(2) signals.unpacker = getattr(args, 'selected_unpacker', None) setup_logging('REPROUNZIP', args.verbosity) setup_usage_report('reprounzip', __version__) if hasattr(args, 'selected_unpacker'): record_usage(unpacker=args.selected_unpacker) signals.pre_setup.subscribe(lambda **kw: record_usage(setup=True)) signals.pre_run.subscribe(lambda **kw: record_usage(run=True)) try: try: args.func(args) except UsageError: raise except Exception as e: signals.application_finishing(reason=e) submit_usage_report(result=type(e).__name__) raise else: signals.application_finishing(reason=None) except UsageError: parser.print_help(sys.stderr) sys.exit(2) submit_usage_report(result='success') sys.exit(0) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684165137.0 reprounzip-1.3/reprounzip/orderedset.py0000664000175000017500000000565314430451021020116 0ustar00remramremram# From http://code.activestate.com/recipes/576694/ # With added update() # Copyright (C) 2009 Raymond Hettinger # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. try: from collections.abc import MutableSet except ImportError: from collections import MutableSet class OrderedSet(MutableSet): def __init__(self, iterable=None): self.end = end = [] end += [None, end, end] # sentinel node for doubly linked list self.map = {} # key --> [key, prev, next_] if iterable is not None: self |= iterable def __len__(self): return len(self.map) def __contains__(self, key): return key in self.map def add(self, key): if key not in self.map: end = self.end curr = end[1] curr[2] = end[1] = self.map[key] = [key, curr, end] def discard(self, key): if key in self.map: key, prev, next_ = self.map.pop(key) prev[2] = next_ next_[1] = prev def __iter__(self): end = self.end curr = end[2] while curr is not end: yield curr[0] curr = curr[2] def __reversed__(self): end = self.end curr = end[1] while curr is not end: yield curr[0] curr = curr[1] def pop(self, last=True): if not self: raise KeyError('set is empty') key = self.end[1][0] if last else self.end[2][0] self.discard(key) return key def __repr__(self): if not self: return '%s()' % (self.__class__.__name__,) return '%s(%r)' % (self.__class__.__name__, list(self)) def __eq__(self, other): if isinstance(other, OrderedSet): return len(self) == len(other) and list(self) == list(other) return set(self) == set(other) def update(self, iterable): for key in iterable: self.add(key) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/pack_info.py0000664000175000017500000003437014505567114017723 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Entry point for the reprounzip utility. This contains :func:`~reprounzip.reprounzip.main`, which is the entry point declared to setuptools. It is also callable directly. It dispatchs to plugins registered through pkg_resources as entry point ``reprounzip.unpackers``. """ from __future__ import division, print_function, unicode_literals import argparse import distro import json import logging import platform from rpaths import Path import sys from reprounzip.common import RPZPack, load_config as load_config_file from reprounzip.main import unpackers from reprounzip.unpackers.common import load_config, COMPAT_OK, COMPAT_MAYBE, \ COMPAT_NO, UsageError, shell_escape, metadata_read from reprounzip.utils import iteritems, itervalues, unicode_, hsize logger = logging.getLogger('reprounzip') def get_package_info(pack, read_data=False): """Get information about a package. """ runs, packages, other_files = config = load_config(pack) inputs_outputs = config.inputs_outputs information = {} if read_data: total_size = 0 total_paths = 0 files = 0 dirs = 0 symlinks = 0 hardlinks = 0 others = 0 rpz_pack = RPZPack(pack) for m in rpz_pack.list_data(): total_size += m.size total_paths += 1 if m.isfile(): files += 1 elif m.isdir(): dirs += 1 elif m.issym(): symlinks += 1 elif hasattr(m, 'islnk') and m.islnk(): hardlinks += 1 else: others += 1 rpz_pack.close() information['pack'] = { 'total_size': total_size, 'total_paths': total_paths, 'files': files, 'dirs': dirs, 'symlinks': symlinks, 'hardlinks': hardlinks, 'others': others, } total_paths = 0 packed_packages_files = 0 unpacked_packages_files = 0 packed_packages = 0 for package in packages: nb = len(package.files) total_paths += nb if package.packfiles: packed_packages_files += nb packed_packages += 1 else: unpacked_packages_files += nb nb = len(other_files) total_paths += nb information['meta'] = { 'total_paths': total_paths, 'packed_packages_files': packed_packages_files, 'unpacked_packages_files': unpacked_packages_files, 'packages': len(packages), 'packed_packages': packed_packages, 'packed_paths': packed_packages_files + nb, } if runs: architecture = runs[0]['architecture'] if any(r['architecture'] != architecture for r in runs): logger.warning("Runs have different architectures") information['meta']['architecture'] = architecture distribution = runs[0]['distribution'] if any(r['distribution'] != distribution for r in runs): logger.warning("Runs have different distributions") information['meta']['distribution'] = distribution information['runs'] = [ dict((k, run[k]) for k in ['id', 'binary', 'argv', 'environ', 'workingdir', 'signal', 'exitcode'] if k in run) for run in runs] information['inputs_outputs'] = { name: {'path': str(iofile.path), 'read_runs': iofile.read_runs, 'write_runs': iofile.write_runs} for name, iofile in iteritems(inputs_outputs)} rpz_pack = RPZPack(pack) information['extensions'] = sorted(rpz_pack.extensions()) # Unpacker compatibility unpacker_status = {} for name, upk in iteritems(unpackers): if 'test_compatibility' in upk: compat = upk['test_compatibility'] if callable(compat): compat = compat(pack, config=config) if isinstance(compat, (tuple, list)): compat, msg = compat else: msg = None unpacker_status.setdefault(compat, []).append((name, msg)) else: unpacker_status.setdefault(None, []).append((name, None)) information['unpacker_status'] = unpacker_status return information def _print_package_info(pack, info, verbosity=1): print("Pack file: %s" % pack) print("\n----- Pack information -----") print("Compressed size: %s" % hsize(pack.size())) info_pack = info.get('pack') if info_pack: if 'total_size' in info_pack: print("Unpacked size: %s" % hsize(info_pack['total_size'])) if 'total_paths' in info_pack: print("Total packed paths: %d" % info_pack['total_paths']) if verbosity >= 3: print(" Files: %d" % info_pack['files']) print(" Directories: %d" % info_pack['dirs']) if info_pack.get('symlinks'): print(" Symbolic links: %d" % info_pack['symlinks']) if info_pack.get('hardlinks'): print(" Hard links: %d" % info_pack['hardlinks']) if info_pack.get('others'): print(" Unknown (what!?): %d" % info_pack['others']) print("\n----- Metadata -----") info_meta = info['meta'] if verbosity >= 3: print("Total paths: %d" % info_meta['total_paths']) print("Listed packed paths: %d" % info_meta['packed_paths']) if info_meta.get('packages'): print("Total software packages: %d" % info_meta['packages']) print("Packed software packages: %d" % info_meta['packed_packages']) if verbosity >= 3: print("Files from packed software packages: %d" % info_meta['packed_packages_files']) print("Files from unpacked software packages: %d" % info_meta['unpacked_packages_files']) if 'architecture' in info_meta: print("Architecture: %s (current: %s)" % (info_meta['architecture'], platform.machine().lower())) if 'distribution' in info_meta: distribution = ' '.join(t for t in info_meta['distribution'] if t) current_distribution = [distro.id(), distro.version()] current_distribution = ' '.join(t for t in current_distribution if t) print("Distribution: %s (current: %s)" % ( distribution, current_distribution or "(not Linux)")) if 'runs' in info: runs = info['runs'] print("Runs (%d):" % len(runs)) for run in runs: cmdline = ' '.join(shell_escape(a) for a in run['argv']) if len(runs) == 1 and run['id'] == "run0": print(" %s" % cmdline) else: print(" %s: %s" % (run['id'], cmdline)) if verbosity >= 2: print(" wd: %s" % run['workingdir']) if 'signal' in run: print(" signal: %d" % run['signal']) else: print(" exitcode: %d" % run['exitcode']) inputs_outputs = info.get('inputs_outputs') if inputs_outputs: if verbosity < 2: print("Inputs/outputs files (%d): %s" % ( len(inputs_outputs), ", ".join(sorted(inputs_outputs)))) else: print("Inputs/outputs files (%d):" % len(inputs_outputs)) for name, f in sorted(iteritems(inputs_outputs)): t = [] if f['read_runs']: t.append("in") if f['write_runs']: t.append("out") print(" %s (%s): %s" % (name, ' '.join(t), f['path'])) extensions = info.get('extensions') if extensions: print("\n----- Extensions -----") for name in extensions: print(name) unpacker_status = info.get('unpacker_status') if unpacker_status: print("\n----- Unpackers -----") for s, n in [(COMPAT_OK, "Compatible"), (COMPAT_MAYBE, "Unknown"), (COMPAT_NO, "Incompatible")]: if s != COMPAT_OK and verbosity < 2: continue if s not in unpacker_status: continue upks = unpacker_status[s] print("%s (%d):" % (n, len(upks))) for upk_name, msg in upks: if msg is not None: print(" %s (%s)" % (upk_name, msg)) else: print(" %s" % upk_name) def print_info(args): """Writes out some information about a pack file. """ pack = Path(args.pack[0]) info = get_package_info(pack, read_data=args.json or args.verbosity >= 2) if args.json: json.dump(info, sys.stdout, indent=2) sys.stdout.write('\n') else: _print_package_info(pack, info, args.verbosity) def showfiles(args): """Writes out the input and output files. Works both for a pack file and for an extracted directory. """ def parse_run(runs, s): for i, run in enumerate(runs): if run['id'] == s: return i try: r = int(s) except ValueError: logger.critical("Error: Unknown run %s", s) raise UsageError if r < 0 or r >= len(runs): logger.critical("Error: Expected 0 <= run <= %d, got %d", len(runs) - 1, r) sys.exit(1) return r show_inputs = args.input or not args.output show_outputs = args.output or not args.input def file_filter(fio): if file_filter.run is None: return ((show_inputs and fio.read_runs) or (show_outputs and fio.write_runs)) else: return ((show_inputs and file_filter.run in fio.read_runs) or (show_outputs and file_filter.run in fio.write_runs)) file_filter.run = None pack = Path(args.pack[0]) if not pack.exists(): logger.critical("Pack or directory %s does not exist", pack) sys.exit(1) if pack.is_dir(): # Reads info from an unpacked directory config = load_config_file(pack / 'config.yml', canonical=True) # Filter files by run if args.run is not None: file_filter.run = parse_run(config.runs, args.run) # The '.reprounzip' file is a pickled dictionary, it contains the name # of the files that replaced each input file (if upload was used) unpacked_info = metadata_read(pack, None) assigned_input_files = unpacked_info.get('input_files', {}) if show_inputs: shown = False for input_name, f in sorted(iteritems(config.inputs_outputs)): if f.read_runs and file_filter(f): if not shown: print("Input files:") shown = True if args.verbosity >= 2: print(" %s (%s)" % (input_name, f.path)) else: print(" %s" % input_name) assigned = assigned_input_files.get(input_name) if assigned is None: assigned = "(original)" elif assigned is False: assigned = "(not created)" elif assigned is True: assigned = "(generated)" else: assert isinstance(assigned, (bytes, unicode_)) print(" %s" % assigned) if not shown: print("Input files: none") if show_outputs: shown = False for output_name, f in sorted(iteritems(config.inputs_outputs)): if f.write_runs and file_filter(f): if not shown: print("Output files:") shown = True if args.verbosity >= 2: print(" %s (%s)" % (output_name, f.path)) else: print(" %s" % output_name) if not shown: print("Output files: none") else: # pack.is_file() # Reads info from a pack file config = load_config(pack) # Filter files by run if args.run is not None: file_filter.run = parse_run(config.runs, args.run) if any(f.read_runs for f in itervalues(config.inputs_outputs)): print("Input files:") for input_name, f in sorted(iteritems(config.inputs_outputs)): if f.read_runs and file_filter(f): if args.verbosity >= 2: print(" %s (%s)" % (input_name, f.path)) else: print(" %s" % input_name) else: print("Input files: none") if any(f.write_runs for f in itervalues(config.inputs_outputs)): print("Output files:") for output_name, f in sorted(iteritems(config.inputs_outputs)): if f.write_runs and file_filter(f): if args.verbosity >= 2: print(" %s (%s)" % (output_name, f.path)) else: print(" %s" % output_name) else: print("Output files: none") def setup_info(parser, **kwargs): """Prints out some information about a pack """ parser.add_argument('pack', nargs=1, help="Pack to read") parser.add_argument('--json', action='store_true', default=False) parser.set_defaults(func=print_info) def setup_showfiles(parser, **kwargs): """Prints out input and output file names """ parser.add_argument('pack', nargs=1, help="Pack or directory to read from") parser.add_argument('run', nargs=argparse.OPTIONAL, help="Run whose input and output files will be listed") parser.add_argument('--input', action='store_true', help="Only show input files") parser.add_argument('--output', action='store_true', help="Only show output files") parser.set_defaults(func=showfiles) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/parameters.py0000664000175000017500000007112514505567114020134 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Retrieve parameters from online source. Most unpackers require some parameters that are likely to change on a different schedule from ReproZip's releases. To account for that, ReproZip downloads a "parameter file", which is just a JSON with a bunch of parameters. In there you will find things like the address of some binaries that are downloaded from the web (rpzsudo and busybox), and the name of Vagrant boxes and Docker images for various operating systems. """ from __future__ import division, print_function, unicode_literals from distutils.version import LooseVersion import json import logging import os from reprounzip.common import get_reprozip_ca_certificate from reprounzip.utils import download_file logger = logging.getLogger('reprounzip') parameters = None def update_parameters(): """Try to download a new version of the parameter file. """ global parameters if parameters is not None: return url = 'https://stats.reprozip.org/parameters/' env_var = os.environ.get('REPROZIP_PARAMETERS') if env_var and ( env_var.startswith('http://') or env_var.startswith('https://')): # This is only used for testing # Note that this still expects the ReproZip CA url = env_var elif env_var not in (None, '', '1', 'on', 'enabled', 'yes', 'true'): parameters = _bundled_parameters return try: from reprounzip.main import __version__ as version filename = download_file( '%s%s' % (url, version), None, cachename='parameters.json', ssl_verify=str(get_reprozip_ca_certificate())) except Exception: logger.warning("Can't download parameters.json, using bundled " "parameters") else: try: with filename.open() as fp: parameters = json.load(fp) except ValueError: logger.info("Downloaded parameters.json doesn't load, using " "bundled parameters") try: filename.remove() except OSError: pass else: ver = LooseVersion(parameters.get('version', '1.0')) if LooseVersion('1.0') <= ver < LooseVersion('1.1'): return else: logger.info("parameters.json has incompatible version %s, " "using bundled parameters", ver) parameters = _bundled_parameters def get_parameter(section): """Get a parameter from the downloaded or default parameter file. """ if parameters is None: update_parameters() return parameters.get(section, None) _bundled_parameters = { "busybox_url": { "x86_64": "https://s3.amazonaws.com/reprozip-files/busybox-x86_64", "i686": "https://s3.amazonaws.com/reprozip-files/busybox-i686" }, "rpzsudo_url": { "x86_64": "https://github.com/remram44/static-sudo/releases/download/" "current/rpzsudo-x86_64", "i686": "https://github.com/remram44/static-sudo/releases/download/" "current/rpzsudo-i686" }, "rpztar_url": { "x86_64": "https://github.com/remram44/rpztar/releases/download/" "v1/rpztar-x86_64", "i686": "https://github.com/remram44/rpztar/releases/download/" "v1/rpztar-i686" }, "docker_images": { "default": "debian", "images": { "ubuntu": { "versions": [ { "version": "^12\\.04$", "distribution": "ubuntu", "image": "ubuntu:12.04", "name": "Ubuntu 12.04 'Precise'" }, { "version": "^14\\.04$", "distribution": "ubuntu", "image": "ubuntu:14.04", "name": "Ubuntu 14.04 'Trusty'" }, { "version": "^14\\.10$", "distribution": "ubuntu", "image": "ubuntu:14.10", "name": "Ubuntu 14.10 'Utopic'" }, { "version": "^15\\.04$", "distribution": "ubuntu", "image": "ubuntu:15.04", "name": "Ubuntu 15.04 'Vivid'" }, { "version": "^15\\.10$", "distribution": "ubuntu", "image": "ubuntu:15.10", "name": "Ubuntu 15.10 'Wily'" }, { "version": "^16\\.04$", "distribution": "ubuntu", "image": "ubuntu:16.04", "name": "Ubuntu 16.04 'Xenial'" }, { "version": "^16\\.10$", "distribution": "ubuntu", "image": "ubuntu:16.10", "name": "Ubuntu 16.10 'Yakkety'" }, { "version": "^17\\.04$", "distribution": "ubuntu", "image": "ubuntu:17.04", "name": "Ubuntu 17.04 'Zesty'" }, { "version": "^17\\.10$", "distribution": "ubuntu", "image": "ubuntu:17.10", "name": "Ubuntu 17.10 'Artful'" }, { "version": "^18\\.04$", "distribution": "ubuntu", "image": "ubuntu:18.04", "name": "Ubuntu 18.04 'Bionic'" }, { "version": "^18\\.10$", "distribution": "ubuntu", "image": "ubuntu:18.10", "name": "Ubuntu 18.10 'Cosmic'" }, { "version": "^19\\.04$", "distribution": "ubuntu", "image": "ubuntu:19.04", "name": "Ubuntu 19.04 'Disco'" } ], "default": { "distribution": "ubuntu", "image": "ubuntu:19.04", "name": "Ubuntu 19.04 'Disco'" } }, "debian": { "versions": [ { "version": "^(6(\\.|$))|(squeeze)", "distribution": "debian", "image": "debian:squeeze", "name": "Debian 6 'Squeeze'" }, { "version": "^(7(\\.|$))|(wheezy)", "distribution": "debian", "image": "debian:wheezy", "name": "Debian 7 'Wheezy'" }, { "version": "^(8(\\.|$))|(jessie)", "distribution": "debian", "image": "debian:jessie", "name": "Debian 8 'Jessie'" }, { "version": "^(9(\\.|$))|(stretch)", "distribution": "debian", "image": "debian:stretch", "name": "Debian 9 'Stretch'" }, { "version": "^(10(\\.|$))|(buster)", "distribution": "debian", "image": "debian:buster", "name": "Debian 10 'Buster'" } ], "default": { "distribution": "debian", "image": "debian:stretch", "name": "Debian 9 'Stretch'" } }, "centos": { "versions": [ { "version": "^5(\\.|$)", "distribution": "centos", "image": "centos:centos5", "name": "CentOS 5" }, { "version": "^6(\\.|$)", "distribution": "centos", "image": "centos:centos6", "name": "CentOS 6" }, { "version": "^7(\\.|$)", "distribution": "centos", "image": "centos:centos7", "name": "CentOS 7" } ], "default": { "distribution": "centos", "image": "centos:centos7", "name": "CentOS 7" } }, "centos linux": { "versions": [ { "version": "^5(\\.|$)", "distribution": "centos", "image": "centos:centos5", "name": "CentOS 5" }, { "version": "^6(\\.|$)", "distribution": "centos", "image": "centos:centos6", "name": "CentOS 6" }, { "version": "^7(\\.|$)", "distribution": "centos", "image": "centos:centos7", "name": "CentOS 7" } ], "default": { "distribution": "centos", "image": "centos:centos7", "name": "CentOS 7" } }, "fedora": { "versions": [ { "version": "^20$", "distribution": "fedora", "image": "fedora:20", "name": "Fedora 20" }, # Fedora 21-24 omitted because they don't include tar { "version": "^25$", "distribution": "fedora", "image": "fedora:25", "name": "Fedora 25" }, { "version": "^26$", "distribution": "fedora", "image": "fedora:26", "name": "Fedora 26" }, { "version": "^27$", "distribution": "fedora", "image": "fedora:27", "name": "Fedora 27" }, { "version": "^28$", "distribution": "fedora", "image": "fedora:28", "name": "Fedora 28" }, { "version": "^29$", "distribution": "fedora", "image": "fedora:29", "name": "Fedora 29" } ], "default": { "distribution": "fedora", "image": "fedora:29", "name": "Fedora 29" } } } }, "vagrant_boxes": { "default": "debian", "boxes": { "ubuntu": { "versions": [ { "version": "^12\\.04$", "distribution": "ubuntu", "architectures": { "i686": "hashicorp/precise32", "x86_64": "hashicorp/precise64" }, "name": "Ubuntu 12.04 'Precise'" }, { "version": "^14\\.04$", "distribution": "ubuntu", "architectures": { "i686": "ubuntu/trusty32", "x86_64": "ubuntu/trusty64" }, "name": "Ubuntu 14.04 'Trusty'" }, { "version": "^15\\.04$", "distribution": "ubuntu", "architectures": { "i686": "bento/ubuntu-15.04-i386", "x86_64": "bento/ubuntu-15.04" }, "name": "Ubuntu 15.04 'Vivid'" }, { "version": "^15\\.10$", "distribution": "ubuntu", "architectures": { "i686": "bento/ubuntu-15.10-i386", "x86_64": "bento/ubuntu-15.10" }, "name": "Ubuntu 15.10 'Wily'" }, { "version": "^16\\.04$", "distribution": "ubuntu", "architectures": { "i686": "bento/ubuntu-16.04-i386", "x86_64": "bento/ubuntu-16.04" }, "name": "Ubuntu 16.04 'Xenial'" }, { "version": "^16\\.10$", "distribution": "ubuntu", "architectures": { "i686": "bento/ubuntu-16.10-i386", "x86_64": "bento/ubuntu-16.10" }, "name": "Ubuntu 16.10 'Yakkety'" }, { "version": "^17\\.04$", "distribution": "ubuntu", "architectures": { "i686": "bento/ubuntu-17.04-i386", "x86_64": "bento/ubuntu-17.04" }, "name": "Ubuntu 17.04 'Zesty'" }, { "version": "^17\\.10$", "distribution": "ubuntu", "architectures": { "i686": "bento/ubuntu-17.10-i386", "x86_64": "bento/ubuntu-17.10" }, "name": "Ubuntu 17.10 'Artful'" }, { "version": "^18\\.04$", "distribution": "ubuntu", "architectures": { "x86_64": "bento/ubuntu-18.04" }, "name": "Ubuntu 18.04 'Bionic'" }, { "version": "^18\\.10$", "distribution": "ubuntu", "architectures": { "x86_64": "bento/ubuntu-18.10" }, "name": "Ubuntu 18.10 'Cosmic'" }, { "version": "^19\\.04$", "distribution": "ubuntu", "architectures": { "x86_64": "bento/ubuntu-19.04" }, "name": "Ubuntu 19.04 'Disco'" }, { "version": "^19\\.10", "distribution": "ubuntu", "architectures": { "x86_64": "bento/ubuntu-19.10" }, "name": "Ubuntu 19.10 'Eoan'" }, { "version": "^20\\.04$", "distribution": "ubuntu", "architectures": { "x86_64": "bento/ubuntu-20.04" }, "name": "Ubuntu 20.04 'Focal'" }, { "version": "^20\\.10", "distribution": "ubuntu", "architectures": { "x86_64": "bento/ubuntu-20.10" }, "name": "Ubuntu 20.10 'Groovy'" } ], "default": { "distribution": "ubuntu", "architectures": { "i686": "bento/ubuntu-17.04-i386", "x86_64": "bento/ubuntu-20.04" }, "name": "Ubuntu" } }, "debian": { "versions": [ { "version": "^(7(\\.|$))|(wheezy)", "distribution": "debian", "architectures": { "i686": "bento/debian-7.11-i386", "x86_64": "bento/debian-7" }, "name": "Debian 7 'Wheezy'" }, { "version": "^(8(\\.|$))|(jessie)", "distribution": "debian", "architectures": { "i686": "remram/debian-8-i386", "x86_64": "remram/debian-8-amd64" }, "name": "Debian 8 'Jessie'" }, { "version": "^(9(\\.|$))|(stretch)", "distribution": "debian", "architectures": { "i686": "remram/debian-9-i386", "x86_64": "remram/debian-9-amd64" }, "name": "Debian 9 'Stretch'" }, { "version": "^(10(\\.|$))|(buster)", "distribution": "debian", "architectures": { "i686": "remram/debian-10-i386", "x86_64": "bento/debian-10" }, "name": "Debian 10 'Buster'" } ], "default": { "distribution": "debian", "architectures": { "i686": "remram/debian-10-i386", "x86_64": "bento/debian-10" }, "name": "Debian 10 'Buster'" } }, "centos": { "versions": [ { "version": "^5\\.", "distribution": "centos", "architectures": { "i686": "bento/centos-5.11-i386", "x86_64": "bento/centos-5.11" }, "name": "CentOS 5.11" }, { "version": "^6\\.", "distribution": "centos", "architectures": { "i686": "bento/centos-6.10-i386", "x86_64": "bento/centos-6.10" }, "name": "CentOS 6.10" }, { "version": "^7\\.", "distribution": "centos", "architectures": { "i686": "remram/centos-7-i386", "x86_64": "bento/centos-7.6" }, "name": "CentOS 7.6" }, { "version": "^8\\.", "distribution": "centos", "architectures": { "x86_64": "bento/centos-8" }, "name": "CentOS 8" } ], "default": { "distribution": "centos", "architectures": { "i686": "remram/centos-7-i386", "x86_64": "bento/centos-8" }, "name": "CentOS" } }, "centos linux": { "versions": [ { "version": "^5\\.", "distribution": "centos", "architectures": { "i686": "bento/centos-5.11-i386", "x86_64": "bento/centos-5.11" }, "name": "CentOS 5.11" }, { "version": "^6\\.", "distribution": "centos", "architectures": { "i686": "bento/centos-6.10-i386", "x86_64": "bento/centos-6.10" }, "name": "CentOS 6.10" }, { "version": "^7\\.", "distribution": "centos", "architectures": { "x86_64": "bento/centos-7.6" }, "name": "CentOS 7.6" }, { "version": "^8\\.", "distribution": "centos", "architectures": { "x86_64": "bento/centos-8" }, "name": "CentOS 8" } ], "default": { "distribution": "centos", "architectures": { "i686": "remram/centos-7-i386", "x86_64": "bento/centos-8" }, "name": "CentOS" } }, "fedora": { "versions": [ { "version": "^22$", "distribution": "fedora", "architectures": { "i686": "remram/fedora-22-i386", "x86_64": "remram/fedora-22-amd64" }, "name": "Fedora 22" }, { "version": "^23$", "distribution": "fedora", "architectures": { "i686": "remram/fedora-23-i386", "x86_64": "remram/fedora-23-amd64" }, "name": "Fedora 23" }, { "version": "^24$", "distribution": "fedora", "architectures": { "i686": "remram/fedora-24-i386", "x86_64": "remram/fedora-24-amd64" }, "name": "Fedora 24" }, { "version": "^25$", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-25" }, "name": "Fedora 25" }, { "version": "^26$", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-26" }, "name": "Fedora 26" }, { "version": "^27$", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-27" }, "name": "Fedora 27" }, { "version": "^28$", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-28" }, "name": "Fedora 28" }, { "version": "^29$", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-29" }, "name": "Fedora 29" }, { "version": "^30", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-30" }, "name": "Fedora 30" }, { "version": "^31", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-31" }, "name": "Fedora 31" }, { "version": "^32", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-32" }, "name": "Fedora 32" }, { "version": "^33", "distribution": "fedora", "architectures": { "x86_64": "bento/fedora-33" }, "name": "Fedora 33" } ], "default": { "distribution": "fedora", "architectures": { "i686": "remram/fedora-24-i386", "x86_64": "bento/fedora-33" }, "name": "Fedora" } } } }, "vagrant_boxes_x": { "default": "debian", "boxes": { "ubuntu": { "versions": [ { "version": "^16\\.04$", "distribution": "ubuntu", "architectures": { "i686": "remram/ubuntu-1604-amd64-x", "x86_64": "remram/ubuntu-1604-amd64-x" }, "name": "Ubuntu 16.04 'Xenial'" } ], "default": { "distribution": "ubuntu", "architectures": { "i686": "remram/ubuntu-1604-amd64-x", "x86_64": "remram/ubuntu-1604-amd64-x" }, "name": "Ubuntu 16.04 'Xenial'" } }, "debian": { "versions": [ { "version": "^(8(\\.|$))|(jessie)", "distribution": "debian", "architectures": { "i686": "remram/debian-8-amd64-x", "x86_64": "remram/debian-8-amd64-x" }, "name": "Debian 8 'Jessie'" } ], "default": { "distribution": "debian", "architectures": { "i686": "remram/debian-8-amd64-x", "x86_64": "remram/debian-8-amd64-x" }, "name": "Debian 8 'Jessie'" } } } } } ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1701984661.4429889 reprounzip-1.3/reprounzip/plugins/0000775000175000017500000000000014534434625017074 5ustar00remramremram././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/plugins/__init__.py0000664000175000017500000000032014505567114021176 0ustar00remramremramtry: # pragma: no cover __import__('pkg_resources').declare_namespace(__name__) except ImportError: # pragma: no cover from pkgutil import extend_path __path__ = extend_path(__path__, __name__) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/signals.py0000664000175000017500000001067714505567114017436 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Signal system. Emitting and subscribing to these signals is the framework for the plugin infrastructure. """ from __future__ import division, print_function, unicode_literals import traceback import warnings from reprounzip.utils import irange, iteritems class SignalWarning(UserWarning): """Warning from the Signal class. Mainly useful for testing (to turn these to errors), however a 'signal:' prefix is actually used in the messages because of Python bug 22543 http://bugs.python.org/issue22543 """ class Signal(object): """A signal, with its set of arguments. This holds the expected parameters that the signal expects, in several categories: * `expected_args` are the arguments of the signals that must be set. Trying to emit the signal without these will show a warning and won't touch the listeners. Listeners can rely on these being set. * `new_args` are new arguments that listeners cannot yet rely on but that emitters should try to pass in. Missing arguments doesn't show a warning yet but might in the future. * `old_args` are arguments that you might still pass in but that you should move away from; they will show a warning stating their deprecation. Listeners can subscribe to a signal, and may be any callable hashable object. """ REQUIRED, OPTIONAL, DEPRECATED = irange(3) def __init__(self, expected_args=[], new_args=[], old_args=[]): self._args = {} self._args.update((arg, Signal.REQUIRED) for arg in expected_args) self._args.update((arg, Signal.OPTIONAL) for arg in new_args) self._args.update((arg, Signal.DEPRECATED) for arg in old_args) if (len(expected_args) + len(new_args) + len(old_args) != len(self._args)): raise ValueError("Repeated argument names") self._listeners = set() def __call__(self, **kwargs): info = {} for arg, argtype in iteritems(self._args): if argtype == Signal.REQUIRED: try: info[arg] = kwargs.pop(arg) except KeyError: warnings.warn("signal: Missing required argument %s; " "signal ignored" % arg, category=SignalWarning, stacklevel=2) return else: if arg in kwargs: info[arg] = kwargs.pop(arg) if argtype == Signal.DEPRECATED: warnings.warn( "signal: Argument %s is deprecated" % arg, category=SignalWarning, stacklevel=2) if kwargs: arg = next(iter(kwargs)) warnings.warn( "signal: Unexpected argument %s; signal ignored" % arg, category=SignalWarning, stacklevel=2) return for listener in self._listeners: try: listener(**info) except Exception: traceback.print_exc() warnings.warn("signal: Got an exception calling a signal", category=SignalWarning) def subscribe(self, func): """Adds the given callable to the listeners. It must be callable and hashable (it will be put in a set). It will be called with the signals' arguments as keywords. Because new parameters might be introduced, it should accept these by using:: def my_listener(param1, param2, **kwargs_): """ if not callable(func): raise TypeError("%r object is not callable" % type(func)) self._listeners.add(func) def unsubscribe(self, func): """Removes the given callable from the listeners. If the listener wasn't subscribed, does nothing. """ self._listeners.discard(func) pre_setup = Signal(['target', 'pack']) post_setup = Signal(['target'], ['pack']) pre_destroy = Signal(['target']) post_destroy = Signal(['target']) pre_run = Signal(['target']) post_run = Signal(['target', 'retcode']) pre_parse_args = Signal(['parser', 'subparsers']) post_parse_args = Signal(['args']) application_finishing = Signal(['reason']) unpacker = None ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1701984661.4429889 reprounzip-1.3/reprounzip/unpackers/0000775000175000017500000000000014534434625017406 5ustar00remramremram././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/unpackers/__init__.py0000664000175000017500000000032014505567114021510 0ustar00remramremramtry: # pragma: no cover __import__('pkg_resources').declare_namespace(__name__) except ImportError: # pragma: no cover from pkgutil import extend_path __path__ = extend_path(__path__, __name__) ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1701984661.446989 reprounzip-1.3/reprounzip/unpackers/common/0000775000175000017500000000000014534434625020676 5ustar00remramremram././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/unpackers/common/__init__.py0000664000175000017500000000315614505567114023012 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Utility functions for unpacker plugins. This contains functions related to shell scripts, package managers, and the pack files. """ from __future__ import division, print_function, unicode_literals from reprounzip.utils import join_root from reprounzip.unpackers.common.misc import UsageError, \ COMPAT_OK, COMPAT_NO, COMPAT_MAYBE, \ composite_action, target_must_exist, unique_names, \ make_unique_name, shell_escape, load_config, busybox_url, sudo_url, \ rpztar_url, \ FileUploader, FileDownloader, get_runs, add_environment_options, \ fixup_environment, interruptible_call, \ metadata_read, metadata_write, metadata_initial_iofiles, \ metadata_update_run, parse_ports from reprounzip.unpackers.common.packages import THIS_DISTRIBUTION, \ PKG_NOT_INSTALLED, CantFindInstaller, select_installer __all__ = ['THIS_DISTRIBUTION', 'PKG_NOT_INSTALLED', 'select_installer', 'COMPAT_OK', 'COMPAT_NO', 'COMPAT_MAYBE', 'UsageError', 'CantFindInstaller', 'composite_action', 'target_must_exist', 'unique_names', 'make_unique_name', 'shell_escape', 'load_config', 'busybox_url', 'sudo_url', 'rpztar_url', 'join_root', 'FileUploader', 'FileDownloader', 'get_runs', 'add_environment_options', 'fixup_environment', 'interruptible_call', 'metadata_read', 'metadata_write', 'metadata_initial_iofiles', 'metadata_update_run', 'parse_ports'] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/unpackers/common/misc.py0000664000175000017500000005165214505567114022212 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Miscellaneous utilities for unpacker plugins. """ from __future__ import division, print_function, unicode_literals import copy import functools import logging import itertools import os import pickle import random import re import warnings from rpaths import PosixPath, Path import signal import subprocess import sys import tarfile import reprounzip.common from reprounzip.common import RPZPack from reprounzip.parameters import get_parameter from reprounzip.utils import PY3, irange, iteritems, itervalues, \ stdout_bytes, unicode_, join_root, copyfile logger = logging.getLogger('reprounzip') COMPAT_OK = 0 COMPAT_NO = 1 COMPAT_MAYBE = 2 class UsageError(Exception): def __init__(self, msg="Invalid command-line"): Exception.__init__(self, msg) def composite_action(*functions): """Makes an action that just calls several other actions in sequence. Useful to implement ``myplugin setup`` in terms of ``myplugin setup/part1`` and ``myplugin setup/part2``: simply use ``act1n2 = composite_action(act1, act2)``. """ def wrapper(args): for function in functions: function(args) return wrapper def target_must_exist(func): """Decorator that checks that ``args.target`` exists. """ @functools.wraps(func) def wrapper(args): target = Path(args.target[0]) if not target.is_dir(): logger.critical("Error: Target directory doesn't exist") raise UsageError return func(args) return wrapper def unique_names(): """Generates unique sequences of bytes. """ characters = (b"abcdefghijklmnopqrstuvwxyz" b"0123456789") characters = [characters[i:i + 1] for i in irange(len(characters))] rng = random.Random() while True: letters = [rng.choice(characters) for i in irange(10)] yield b''.join(letters) unique_names = unique_names() def make_unique_name(prefix): """Makes a unique (random) bytestring name, starting with the given prefix. """ assert isinstance(prefix, bytes) return prefix + next(unique_names) safe_shell_chars = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz" "0123456789" "-+=/:.,%_") def shell_escape(s): r"""Given bl"a, returns "bl\\"a". """ if isinstance(s, bytes): s = s.decode('utf-8') if not s or any(c not in safe_shell_chars for c in s): return '"%s"' % (s.replace('\\', '\\\\') .replace('"', '\\"') .replace('`', '\\`') .replace('$', '\\$')) else: return s def load_config(pack): """Utility method loading the YAML configuration from inside a pack file. Decompresses the config.yml file from the tarball to a temporary file then loads it. Note that decompressing a single file is inefficient, thus calling this method can be slow. """ rpz_pack = RPZPack(pack) with rpz_pack.with_config() as configfile: return reprounzip.common.load_config(configfile, canonical=True) def busybox_url(arch): """Gets the correct URL for the busybox binary given the architecture. """ return get_parameter('busybox_url')[arch] def sudo_url(arch): """Gets the correct URL for the rpzsudo binary given the architecture. """ return get_parameter('rpzsudo_url')[arch] def rpztar_url(arch): """Gets the correct URL for the rpztar binary given the architecture. """ return get_parameter('rpztar_url')[arch] class FileUploader(object): """Common logic for 'upload' commands. """ data_tgz = 'data.tgz' def __init__(self, target, input_files, files): self.target = target self.input_files = input_files self.run(files) def run(self, files): reprounzip.common.record_usage(upload_files=len(files)) inputs_outputs = self.get_config().inputs_outputs # No argument: list all the input files and exit if not files: print("Input files:") for input_name in sorted(n for n, f in iteritems(inputs_outputs) if f.read_runs): assigned = self.input_files.get(input_name) if assigned is None: assigned = "(original)" elif assigned is False: assigned = "(not created)" elif assigned is True: assigned = "(generated)" else: assert isinstance(assigned, (bytes, unicode_)) print(" %s: %s" % (input_name, assigned)) return self.prepare_upload(files) try: # Upload files for filespec in files: filespec_split = filespec.rsplit(':', 1) if len(filespec_split) != 2: logger.critical("Invalid file specification: %r", filespec) sys.exit(1) local_path, input_name = filespec_split try: input_path = inputs_outputs[input_name].path except KeyError: logger.critical("Invalid input file: %r", input_name) sys.exit(1) temp = None if not local_path: # Restore original file from pack logger.debug("Restoring input file %s", input_path) fd, temp = Path.tempfile(prefix='reprozip_input_') os.close(fd) local_path = self.extract_original_input(input_name, input_path, temp) if local_path is None: temp.remove() logger.warning("No original packed, can't restore " "input file %s", input_name) continue else: local_path = Path(local_path) logger.debug("Uploading file %s to %s", local_path, input_path) if not local_path.exists(): logger.critical("Local file %s doesn't exist", local_path) sys.exit(1) self.upload_file(local_path, input_path) if temp is not None: temp.remove() self.input_files.pop(input_name, None) else: self.input_files[input_name] = local_path.absolute().path finally: self.finalize() def get_config(self): return reprounzip.common.load_config(self.target / 'config.yml', canonical=True) def prepare_upload(self, files): pass def extract_original_input(self, input_name, input_path, temp): tar = tarfile.open(str(self.target / self.data_tgz), 'r:*') try: member = tar.getmember(str(join_root(PosixPath('DATA'), input_path))) except KeyError: return None member = copy.copy(member) member.name = str(temp.components[-1]) tar.extract(member, str(temp.parent)) tar.close() return temp def upload_file(self, local_path, input_path): raise NotImplementedError def finalize(self): pass class FileDownloader(object): """Common logic for 'download' commands. """ def __init__(self, target, files, all_=False): self.target = target self.run(files, all_) def run(self, files, all_): reprounzip.common.record_usage(download_files=len(files)) inputs_outputs = self.get_config().inputs_outputs # No argument: list all the output files and exit if not (all_ or files): print("Output files:") for output_name in sorted(n for n, f in iteritems(inputs_outputs) if f.write_runs): print(" %s" % output_name) return # Parse the name[:path] syntax resolved_files = [] all_files = set(n for n, f in iteritems(inputs_outputs) if f.write_runs) for filespec in files: filespec_split = filespec.split(':', 1) if len(filespec_split) == 1: output_name = local_path = filespec elif len(filespec_split) == 2: output_name, local_path = filespec_split else: logger.critical("Invalid file specification: %r", filespec) sys.exit(1) local_path = Path(local_path) if local_path else None all_files.discard(output_name) resolved_files.append((output_name, local_path)) # If all_ is set, add all the files that weren't explicitely named if all_: for output_name in all_files: resolved_files.append((output_name, Path(output_name))) self.prepare_download(resolved_files) success = True try: # Download files for output_name, local_path in resolved_files: try: remote_path = inputs_outputs[output_name].path except KeyError: logger.critical("Invalid output file: %r", output_name) sys.exit(1) logger.debug("Downloading file %s", remote_path) if local_path is None: ret = self.download_and_print(remote_path) else: ret = self.download(remote_path, local_path) if ret is None: ret = True warnings.warn("download() returned None instead of " "True/False, assuming True", category=DeprecationWarning) if not ret: success = False if not success: sys.exit(1) finally: self.finalize() def get_config(self): return reprounzip.common.load_config(self.target / 'config.yml', canonical=True) def prepare_download(self, files): pass def download_and_print(self, remote_path): # Download to temporary file fd, temp = Path.tempfile(prefix='reprozip_output_') os.close(fd) download_status = self.download(remote_path, temp) if download_status is not None and not download_status: return False # Output to stdout with temp.open('rb') as fp: copyfile(fp, stdout_bytes) temp.remove() return True def download(self, remote_path, local_path): raise NotImplementedError def finalize(self): pass def get_runs(runs, selected_runs, cmdline): """Selects which run(s) to execute based on parts of the command-line. Will return an iterable of run numbers. Might also fail loudly or exit after printing the original command-line. """ name_map = dict((r['id'], i) for i, r in enumerate(runs) if 'id' in r) run_list = [] def parse_run(s): try: r = int(s) except ValueError: logger.critical("Error: Unknown run %s", s) raise UsageError if r < 0 or r >= len(runs): logger.critical("Error: Expected 0 <= run <= %d, got %d", len(runs) - 1, r) sys.exit(1) return r if selected_runs is None: run_list = list(irange(len(runs))) else: for run_item in selected_runs.split(','): run_item = run_item.strip() if run_item in name_map: run_list.append(name_map[run_item]) continue sep = run_item.find('-') if sep == -1: run_list.append(parse_run(run_item)) else: if sep > 0: first = parse_run(run_item[:sep]) else: first = 0 if sep + 1 < len(run_item): last = parse_run(run_item[sep + 1:]) else: last = len(runs) - 1 if last < first: logger.critical("Error: Last run number should be " "greater than the first") sys.exit(1) run_list.extend(irange(first, last + 1)) # --cmdline without arguments: display the original command-line if cmdline == []: print("Original command-lines:") for run in run_list: print(' '.join(shell_escape(arg) for arg in runs[run]['argv'])) sys.exit(0) return run_list def add_environment_options(parser): parser.add_argument('--pass-env', action='append', default=[], help="Environment variable to pass through from the " "host (value from the original machine will be " "overridden; can be passed multiple times)") parser.add_argument('--set-env', action='append', default=[], help="Environment variable to set (value from the " "original machine will be ignored; can be passed " "multiple times)") def fixup_environment(environ, args): if not (args.pass_env or args.set_env): return environ environ = dict(environ) regexes = [re.compile(pattern + '$') for pattern in args.pass_env] for var in os.environ: if any(regex.match(var) for regex in regexes): environ[var] = os.environ[var] for var in args.set_env: if '=' in var: var, value = var.split('=', 1) environ[var] = value else: environ.pop(var, None) return environ if PY3: def pty_spawn(*args, **kwargs): import pty return pty.spawn(*args, **kwargs) else: def pty_spawn(argv): """Version of pty.spawn() for PY2, that returns the exit code. This works around https://bugs.python.org/issue2489. """ logger.info("Using builtin pty.spawn()") import pty import tty if isinstance(argv, bytes): argv = (argv,) pid, master_fd = pty.fork() if pid == pty.CHILD: os.execlp(argv[0], *argv) try: mode = tty.tcgetattr(pty.STDIN_FILENO) tty.setraw(pty.STDIN_FILENO) restore = 1 except tty.error: # This is the same as termios.error restore = 0 try: pty._copy(master_fd, pty._read, pty._read) except (IOError, OSError): if restore: tty.tcsetattr(pty.STDIN_FILENO, tty.TCSAFLUSH, mode) os.close(master_fd) return os.waitpid(pid, 0)[1] def interruptible_call(cmd, **kwargs): assert signal.getsignal(signal.SIGINT) == signal.default_int_handler proc = [None] def _sigint_handler(signum, frame): if proc[0] is not None: try: proc[0].send_signal(signum) except OSError: pass signal.signal(signal.SIGINT, _sigint_handler) try: if kwargs.pop('request_tty', False): try: import pty # noqa: F401 except ImportError: pass else: if hasattr(sys.stdin, 'isatty') and not sys.stdin.isatty(): logger.info("We need a tty and we are not attached to " "one. Opening pty...") if kwargs.pop('shell', False): if not isinstance(cmd, (str, unicode_)): raise TypeError("shell=True but cmd is not a " "string") cmd = ['/bin/sh', '-c', cmd] res = pty_spawn(cmd) return res >> 8 - (res & 0xFF) proc[0] = subprocess.Popen(cmd, **kwargs) return proc[0].wait() finally: signal.signal(signal.SIGINT, signal.default_int_handler) def metadata_read(path, type_): """Read the unpacker-specific metadata from an unpacked directory. :param path: The unpacked directory; `.reprounzip` will be appended to get the name of the pickle file. :param type_: The name of the unpacker, to check for consistency. Unpackers need to store some specific information, along with the status of the input files. This is done in a consistent way so that showfiles can access it (and because duplicating code is not necessary here). It's a simple pickled dictionary under path / '.reprounzip'. The 'input_files' key stores the status of the input files. If you change it, don't forget to call `metadata_write` to write it to disk again. """ filename = path / '.reprounzip' if not filename.exists(): logger.critical("Required metadata missing, did you point this " "command at the directory you created using the " "'setup' command?") raise UsageError with filename.open('rb') as fp: dct = pickle.load(fp) if type_ is not None and dct['unpacker'] != type_: logger.critical("Wrong unpacker used: %s != %s", dct['unpacker'], type_) raise UsageError return dct def metadata_write(path, dct, type_): """Write the unpacker-specific metadata in an unpacked directory. :param path: The unpacked directory; `.reprounzip` will be appended to get the name of the pickle file. :param type_: The name of the unpacker, that is written to the pickle file under the key 'unpacker'. :param dct: The dictionary with the info to write to the file. """ filename = path / '.reprounzip' to_write = {'unpacker': type_} to_write.update(dct) with filename.open('wb') as fp: pickle.dump(to_write, fp, 2) def metadata_initial_iofiles(config, dct=None): """Add the initial state of the {in/out}put files to the unpacker metadata. :param config: The configuration as returned by `load_config()`, which will be used to list the input and output files and to determine which ones have been packed (and therefore exist initially). The `input_files` key contains a dict mapping the name to either: * None (or inexistent): original file and exists * False: doesn't exist (wasn't packed) * True: has been generated by one of the run since the experiment was unpacked * basestring: the user uploaded a file with this path, and no run has overwritten it yet """ if dct is None: dct = {} path2iofile = {f.path: n for n, f in iteritems(config.inputs_outputs)} def packed_files(): yield config.other_files for pkg in config.packages: if pkg.packfiles: yield pkg.files for f in itertools.chain.from_iterable(packed_files()): f = f.path path2iofile.pop(f, None) dct['input_files'] = dict((n, False) for n in itervalues(path2iofile)) return dct def metadata_update_run(config, dct, runs): """Update the unpacker metadata after some runs have executed. :param runs: An iterable of run numbers that were probably executed. This maintains a crude idea of the status of input and output files by updating the files that are outputs of the runs that were just executed. This means that files that were uploaded by the user will no longer be shown as uploaded (they have been overwritten by the experiment) and files that weren't packed exist from now on. This is not very reliable because a run might have created a file that is not designated as its output anyway, or might have failed and thus not created the output (or a bad output). """ runs = set(runs) input_files = dct.setdefault('input_files', {}) for name, fi in iteritems(config.inputs_outputs): if any(r in runs for r in fi.write_runs): input_files[name] = True _port_re = re.compile('^(?:([0-9]+):)?([0-9]+)(?:/([a-z]+))?$') def parse_ports(specifications): ports = [] for port in specifications: m = _port_re.match(port) if m is None: logger.critical("Invalid port specification: '%s'", port) sys.exit(1) host, experiment, proto = m.groups() if not host: host = experiment if not proto: proto = 'tcp' ports.append((int(host), int(experiment), proto)) return ports ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/unpackers/common/packages.py0000664000175000017500000001452014505567114023026 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Utility functions dealing with package managers. """ from __future__ import division, print_function, unicode_literals import distro import logging import subprocess from reprounzip.unpackers.common.misc import UsageError from reprounzip.utils import itervalues logger = logging.getLogger('reprounzip') THIS_DISTRIBUTION = distro.id() PKG_NOT_INSTALLED = "(not installed)" class CantFindInstaller(UsageError): def __init__(self, msg="Can't select a package installer"): UsageError.__init__(self, msg) class AptInstaller(object): """Installer for deb-based systems (Debian, Ubuntu). """ def __init__(self, binary): self.bin = binary def install(self, packages, assume_yes=False): # Installs options = [] if assume_yes: options.append('-y') required_pkgs = set(pkg.name for pkg in packages) r = subprocess.call([self.bin, 'install'] + options + list(required_pkgs)) # Checks on packages pkgs_status = self.get_packages_info(packages) for pkg, status in itervalues(pkgs_status): if status is not None: required_pkgs.discard(pkg.name) if required_pkgs: logger.error("Error: some packages could not be installed:%s", ''.join("\n %s" % pkg for pkg in required_pkgs)) return r, pkgs_status @staticmethod def get_packages_info(packages): if not packages: return {} p = subprocess.Popen(['dpkg-query', '--showformat=${Package;-50}\t${Version}\n', '-W'] + [pkg.name for pkg in packages], stdout=subprocess.PIPE) # name -> (pkg, installed_version) pkgs_dict = dict((pkg.name, (pkg, PKG_NOT_INSTALLED)) for pkg in packages) try: for line in p.stdout: fields = line.split() if len(fields) == 2: name = fields[0].decode('ascii') status = fields[1].decode('ascii') pkg, _ = pkgs_dict[name] pkgs_dict[name] = pkg, status finally: p.wait() return pkgs_dict def update_script(self): return '%s update' % self.bin def install_script(self, packages): return '%s install -y %s' % (self.bin, ' '.join(pkg.name for pkg in packages)) class YumInstaller(object): """Installer for systems using RPM and Yum (Fedora, CentOS, Red-Hat). """ @classmethod def install(cls, packages, assume_yes=False): options = [] if assume_yes: options.append('-y') required_pkgs = set(pkg.name for pkg in packages) r = subprocess.call(['yum', 'install'] + options + list(required_pkgs)) # Checks on packages pkgs_status = cls.get_packages_info(packages) for pkg, status in itervalues(pkgs_status): if status is not None: required_pkgs.discard(pkg.name) if required_pkgs: logger.error("Error: some packages could not be installed:%s", ''.join("\n %s" % pkg for pkg in required_pkgs)) return r, pkgs_status @staticmethod def get_packages_info(packages): if not packages: return {} p = subprocess.Popen(['rpm', '-q'] + [pkg.name for pkg in packages] + ['--qf', '+%{NAME} %{VERSION}-%{RELEASE}\\n'], stdout=subprocess.PIPE) # name -> {pkg, installed_version} pkgs_dict = dict((pkg.name, (pkg, PKG_NOT_INSTALLED)) for pkg in packages) try: for line in p.stdout: if line[0] == b'+': fields = line[1:].split() if len(fields) == 2: name = fields[0].decode('ascii') status = fields[1].decode('ascii') pkg, _ = pkgs_dict[name] pkgs_dict[name] = pkg, status finally: p.wait() return pkgs_dict @staticmethod def update_script(): return '' @staticmethod def install_script(packages): return 'yum install -y %s' % ' '.join(pkg.name for pkg in packages) def select_installer(pack, runs, target_distribution=THIS_DISTRIBUTION, check_distrib_compat=True): """Selects the right package installer for a Linux distribution. """ orig_distribution = runs[0]['distribution'][0].lower() # Checks that the distributions match if not check_distrib_compat: pass elif (set([orig_distribution, target_distribution]) == set(['ubuntu', 'debian'])): # Packages are more or less the same on Debian and Ubuntu logger.warning("Installing on %s but pack was generated on %s", target_distribution.capitalize(), orig_distribution.capitalize()) elif target_distribution is None: raise CantFindInstaller("Target distribution is unknown; try using " "--distribution") elif orig_distribution != target_distribution: raise CantFindInstaller( "Installing on %s but pack was generated on %s" % ( target_distribution.capitalize(), orig_distribution.capitalize())) # Selects installation method if target_distribution == 'ubuntu': installer = AptInstaller('apt-get') elif target_distribution == 'debian': # aptitude is not installed by default, so use apt-get here too installer = AptInstaller('apt-get') elif (target_distribution in ('centos', 'centos linux', 'fedora', 'scientific linux') or target_distribution.startswith('red hat')): installer = YumInstaller() else: raise CantFindInstaller("This distribution, \"%s\", is not supported" % target_distribution.capitalize()) return installer ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/unpackers/common/x11.py0000664000175000017500000003604714505567114021671 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Utility functions dealing with X servers. """ from __future__ import division, print_function, unicode_literals import contextlib import logging import os from rpaths import Path, PosixPath import select import socket import struct import threading from reprounzip.utils import irange, iteritems logger = logging.getLogger('reprounzip') # #include # # typedef struct xauth { # unsigned short family; # unsigned short address_length; # char *address; # unsigned short number_length; # char *number; # unsigned short name_length; # char *name; # unsigned short data_length; # char *data; # } Xauth; _read_short = lambda fp: struct.unpack('>H', fp.read(2))[0] _write_short = lambda i: struct.pack('>H', i) def ascii(s): if isinstance(s, bytes): return s else: return s.encode('ascii') class Xauth(object): """A record in an Xauthority file. """ FAMILY_LOCAL = 256 FAMILY_INTERNET = 0 FAMILY_DECNET = 1 FAMILY_CHAOS = 2 FAMILY_INTERNET6 = 6 FAMILY_SERVERINTERPRETED = 5 def __init__(self, family, address, number, name, data): self.family = family self.address = address self.number = number self.name = name self.data = data @classmethod def from_file(cls, fp): family = _read_short(fp) address_length = _read_short(fp) address = fp.read(address_length) number_length = _read_short(fp) number = int(fp.read(number_length)) if number_length else None name_length = _read_short(fp) name = fp.read(name_length) data_length = _read_short(fp) data = fp.read(data_length) return cls(family, address, number, name, data) def as_bytes(self): number = ('%d' % self.number).encode('ascii') return (_write_short(self.family) + _write_short(len(self.address)) + ascii(self.address) + _write_short(len(number)) + number + _write_short(len(self.name)) + ascii(self.name) + _write_short(len(self.data)) + ascii(self.data)) class BaseX11Handler(object): """X11 handler. This selects a way to connect to the local X server and an authentication mechanism. If provides `fix_env()` to set the X environment variable for the experiment, `init_cmds` to setup X before running the experiment's main commands, and `port_forward` which describes the reverse port tunnels from the experiment to the local X server. """ class X11Handler(BaseX11Handler): """X11 handler that will connect to a server outside on the host. This connects out of the created environment using the network. It is used by Vagrant (through SSH) and Docker (TCP connection), and may have significant latency. """ DISPLAY_NUMBER = 15 SOCK2X = {socket.AF_INET: Xauth.FAMILY_INTERNET, socket.AF_INET6: Xauth.FAMILY_INTERNET6} X2SOCK = dict((v, k) for k, v in iteritems(SOCK2X)) def __init__(self, enabled, target, display=None): self.enabled = enabled if not self.enabled: return self.target = target self.xauth = PosixPath('/.reprounzip_xauthority') self.display = (int(display) if display is not None else self.DISPLAY_NUMBER) logger.debug("X11 support enabled; will create Xauthority file %s " "for experiment. Display number is %d", self.xauth, self.display) # List of addresses that match the $DISPLAY variable possible, local_display = self._locate_display() tcp_portnum = ((6000 + local_display) if local_display is not None else None) if ('XAUTHORITY' in os.environ and Path(os.environ['XAUTHORITY']).is_file()): xauthority = Path(os.environ['XAUTHORITY']) # Note: I'm assuming here that Xauthority has no XDG support else: xauthority = Path('~').expand_user() / '.Xauthority' # Read Xauthority file xauth_entries = {} if xauthority.is_file(): with xauthority.open('rb') as fp: fp.seek(0, os.SEEK_END) size = fp.tell() fp.seek(0, os.SEEK_SET) while fp.tell() < size: entry = Xauth.from_file(fp) if (entry.name == 'MIT-MAGIC-COOKIE-1' and entry.number == local_display): if entry.family == Xauth.FAMILY_LOCAL: logger.debug("Found cookie for local connection") xauth_entries[(entry.family, None)] = entry elif (entry.family == Xauth.FAMILY_INTERNET or entry.family == Xauth.FAMILY_INTERNET6): logger.debug("Found cookie for %r", (entry.family, entry.address)) xauth_entries[(entry.family, entry.address)] = entry else: logger.debug("No Xauthority file") logger.debug("Possible X endpoints: %s", possible) # Select socket and authentication cookie self.xauth_record = None self.connection_info = None for family, address in possible: # Checks that we have a cookie entry = family, (None if family is Xauth.FAMILY_LOCAL else address) if entry not in xauth_entries: continue if family == Xauth.FAMILY_LOCAL and hasattr(socket, 'AF_UNIX'): # Checks that the socket exists if not Path(address).exists(): continue self.connection_info = (socket.AF_UNIX, socket.SOCK_STREAM, address) self.xauth_record = xauth_entries[(family, None)] logger.debug("Will connect to local X display via UNIX " "socket %s", address) break else: # Checks that we have a cookie family = self.X2SOCK[family] self.connection_info = (family, socket.SOCK_STREAM, (address, tcp_portnum)) self.xauth_record = xauth_entries[(family, address)] logger.debug("Will connect to X display %s:%d via %s/TCP", address, tcp_portnum, "IPv6" if family == socket.AF_INET6 else "IPv4") break # Didn't find an Xauthority record -- assume no authentication is # needed, but still set self.connection_info if self.connection_info is None: logger.debug("Didn't find any matching Xauthority entry") for family, address in possible: # Only try UNIX sockets, we'll use 127.0.0.1 otherwise if family == Xauth.FAMILY_LOCAL: if not hasattr(socket, 'AF_UNIX'): continue self.connection_info = (socket.AF_UNIX, socket.SOCK_STREAM, address) logger.debug("Will connect to X display via UNIX socket " "%s, no authentication", address) break else: self.connection_info = (socket.AF_INET, socket.SOCK_STREAM, ('127.0.0.1', tcp_portnum)) logger.debug("Will connect to X display 127.0.0.1:%d via " "IPv4/TCP, no authentication", tcp_portnum) if self.connection_info is None: raise RuntimeError("Couldn't determine how to connect to local X " "server, DISPLAY is %s" % ( repr(os.environ['DISPLAY']) if 'DISPLAY' in os.environ else 'not set')) @classmethod def _locate_display(cls): """Reads $DISPLAY and figures out possible sockets. """ # We default to ":0", Xming for instance doesn't set $DISPLAY display = os.environ.get('DISPLAY', ':0') # It might be the full path to a UNIX socket if display.startswith('/'): return [(Xauth.FAMILY_LOCAL, display)], None local_addr, local_display = display.rsplit(':', 1) local_display = int(local_display.split('.', 1)[0]) # Let's order the socket families: IPv4 first, then v6, then others def sort_families(gai, order={socket.AF_INET: 0, socket.AF_INET6: 1}): return sorted(gai, key=lambda x: order.get(x[0], 999999)) # Network addresses of the local machine local_addresses = [] for family, socktype, proto, canonname, sockaddr in \ sort_families(socket.getaddrinfo(socket.gethostname(), 6000)): try: family = cls.SOCK2X[family] except KeyError: continue local_addresses.append((family, sockaddr[0])) logger.debug("Local addresses: %s", (local_addresses,)) # Determine possible addresses for $DISPLAY if not local_addr: possible = [(Xauth.FAMILY_LOCAL, '/tmp/.X11-unix/X%d' % local_display)] possible += local_addresses else: local_possible = False possible = [] for family, socktype, proto, canonname, sockaddr in \ sort_families(socket.getaddrinfo(local_addr, 6000)): try: family = cls.SOCK2X[family] except KeyError: continue if (family, sockaddr[0]) in local_addresses: local_possible = True possible.append((family, sockaddr[0])) if local_possible: possible = [(Xauth.FAMILY_LOCAL, '/tmp/.X11-unix/X%d' % local_display)] + possible return possible, local_display @property def port_forward(self): """Builds the port forwarding info, for `run_interactive()`. Just requests port 6015 on the remote host to be forwarded to the X socket identified by `self.connection_info`. """ if not self.enabled: return [] @contextlib.contextmanager def connect(src_addr): logger.info("Got remote X connection from %s", (src_addr,)) logger.debug("Connecting to X server: %s", (self.connection_info,)) sock = socket.socket(*self.connection_info[:2]) sock.connect(self.connection_info[2]) yield sock sock.close() logger.info("X connection from %s closed", (src_addr,)) return [(6000 + self.display, connect)] def fix_env(self, env): """Sets ``$XAUTHORITY`` and ``$DISPLAY`` in the environment. """ if not self.enabled: return env new_env = dict(env) new_env['XAUTHORITY'] = str(self.xauth) if self.target[0] == 'local': new_env['DISPLAY'] = '127.0.0.1:%d' % self.display elif self.target[0] == 'internet': new_env['DISPLAY'] = '%s:%d' % (self.target[1], self.display) return new_env @property def init_cmds(self): """Gets the commands to setup X on the server before the experiment. """ if not self.enabled or self.xauth_record is None: return [] if self.target[0] == 'local': xauth_record = Xauth(Xauth.FAMILY_LOCAL, self.target[1], self.display, self.xauth_record.name, self.xauth_record.data) elif self.target[0] == 'internet': xauth_record = Xauth(Xauth.FAMILY_INTERNET, socket.inet_aton(self.target[1]), self.display, self.xauth_record.name, self.xauth_record.data) else: raise RuntimeError("Invalid target display type") buf = xauth_record.as_bytes() xauth = ''.join(('\\x%02x' % ord(buf[i:i + 1])) for i in irange(len(buf))) return ['echo -ne "%s" > %s' % (xauth, self.xauth)] class BaseForwarder(object): """Accepts connections and forwards to the given connector object. The `connector` is a function which takes the address of remote process connecting on this ends, and gives out a socket object that is the second endpoint of the tunnel. The socket object must provide ``recv()``, ``sendall()`` and ``close()``. Abstract class, implementations will provide actual ways to accept connections. """ def __init__(self, connector): self.connector = connector def _forward(self, client, src_addr): try: with self.connector(src_addr) as local_connection: local_fd = local_connection.fileno() client_fd = client.fileno() while True: r, w, x = select.select([local_fd, client_fd], [], []) if local_fd in r: data = local_connection.recv(4096) if not data: break client.sendall(data) elif client_fd in r: data = client.recv(4096) if not data: break local_connection.sendall(data) finally: client.close() class LocalForwarder(BaseForwarder): """Listens on a random port and forwards to the given connector object. The `connector` is a function which takes the address of remote process connecting on this ends, and gives out a socket object that is the second endpoint of the tunnel. The socket object must provide ``recv()``, ``sendall()`` and ``close()``. """ def __init__(self, connector, local_port=None): BaseForwarder.__init__(self, connector) server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server.bind(('', local_port or 0)) self.local_port = server.getsockname()[1] server.listen(5) t = threading.Thread(target=self._accept, args=(server,)) t.setDaemon(True) t.start() def _accept(self, server): while True: client, src_addr = server.accept() t = threading.Thread(target=self._forward, args=(client, src_addr)) t.setDaemon(True) t.start() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/unpackers/default.py0000664000175000017500000011115414505567114021405 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Default unpackers for reprounzip. This file contains the default plugins that come with reprounzip: - ``directory`` puts all the files in a simple directory. This is simple but can be unreliable. - ``chroot`` creates a chroot environment. This is more reliable as you get a harder isolation from the host system. - ``installpkgs`` installs on your distribution the packages that were used by the experiment on the original machine. This is useful if some of them were not packed and you do not have them installed. """ from __future__ import division, print_function, unicode_literals import argparse import copy from elftools.common.exceptions import ELFError from elftools.elf.elffile import ELFFile from elftools.elf.segments import InterpSegment import logging import os import platform from rpaths import PosixPath, DefaultAbstractPath, Path import socket import subprocess import sys import tarfile from reprounzip.common import RPZPack, load_config as load_config_file, \ record_usage from reprounzip import signals from reprounzip.unpackers.common import THIS_DISTRIBUTION, PKG_NOT_INSTALLED, \ COMPAT_OK, COMPAT_NO, CantFindInstaller, target_must_exist, shell_escape, \ load_config, select_installer, busybox_url, join_root, FileUploader, \ FileDownloader, get_runs, add_environment_options, fixup_environment, \ interruptible_call, metadata_read, metadata_write, \ metadata_initial_iofiles, metadata_update_run from reprounzip.unpackers.common.x11 import X11Handler, LocalForwarder from reprounzip.utils import unicode_, irange, iteritems, itervalues, \ stdout_bytes, stderr, make_dir_writable, rmtree_fixed, copyfile, \ download_file logger = logging.getLogger('reprounzip') def get_elf_interpreter(file): try: elf = ELFFile(file) for segment in elf.iter_segments(): if isinstance(segment, InterpSegment): return segment.get_interp_name() return None except ELFError: return None def installpkgs(args): """Installs the necessary packages on the current machine. """ if not THIS_DISTRIBUTION: logger.critical("Not running on Linux") sys.exit(1) pack = args.pack[0] missing = args.missing # Loads config runs, packages, other_files = load_config(pack) try: installer = select_installer(pack, runs) except CantFindInstaller as e: logger.error("Couldn't select a package installer: %s", e) sys.exit(1) if args.summary: # Print out a list of packages with their status if missing: print("Packages not present in pack:") packages = [pkg for pkg in packages if not pkg.packfiles] else: print("All packages:") pkgs = installer.get_packages_info(packages) for pkg in packages: print(" %s (required version: %s, status: %s)" % ( pkg.name, pkg.version, pkgs[pkg.name][1])) else: if missing: # With --missing, ignore packages whose files were packed packages = [pkg for pkg in packages if not pkg.packfiles] # Installs packages record_usage(installpkgs_installing=len(packages)) r, pkgs = installer.install(packages, assume_yes=args.assume_yes) for pkg in packages: req = pkg.version real = pkgs[pkg.name][1] if real == PKG_NOT_INSTALLED: logger.warning("package %s was not installed", pkg.name) else: logger.warning("version %s of %s was installed, instead of " "%s", real, pkg.name, req) if r != 0: logger.critical("Installer exited with %d", r) sys.exit(r) def directory_create(args): """Unpacks the experiment in a folder. Only the files that are not part of a package are copied (unless they are missing from the system and were packed). In addition, input files are put in a tar.gz (so they can be put back after an upload) and the configuration file is extracted. """ if not args.pack: logger.critical("setup needs the pack filename") sys.exit(1) pack = Path(args.pack[0]) target = Path(args.target[0]) if target.exists(): logger.critical("Target directory exists") sys.exit(1) if not issubclass(DefaultAbstractPath, PosixPath): logger.critical("Not unpacking on POSIX system") sys.exit(1) signals.pre_setup(target=target, pack=pack) # Unpacks configuration file rpz_pack = RPZPack(pack) rpz_pack.extract_config(target / 'config.yml') # Loads config config = load_config_file(target / 'config.yml', True) packages = config.packages target.mkdir() root = (target / 'root').absolute() # Checks packages missing_files = False for pkg in packages: if pkg.packfiles: continue for f in pkg.files: if not Path(f.path).exists(): logger.error( "Missing file %s (from package %s that wasn't packed) " "on host, experiment will probably miss it.", f, pkg.name) missing_files = True if missing_files: record_usage(directory_missing_pkgs=True) logger.error("Some packages are missing, you should probably install " "them.\nUse 'reprounzip installpkgs -h' for help") root.mkdir() try: # Unpacks files members = rpz_pack.list_data() for m in members: # Remove 'DATA/' prefix m.name = str(rpz_pack.remove_data_prefix(m.name)) # Makes symlink targets relative if m.issym(): linkname = PosixPath(m.linkname) if linkname.is_absolute: m.linkname = join_root(root, PosixPath(m.linkname)).path logger.info("Extracting files...") rpz_pack.extract_data(root, members) rpz_pack.close() # Original input files, so upload can restore them input_files = [f.path for f in itervalues(config.inputs_outputs) if f.read_runs] if input_files: logger.info("Packing up original input files...") inputtar = tarfile.open(str(target / 'inputs.tar.gz'), 'w:gz') for ifile in input_files: filename = join_root(root, ifile) if filename.exists(): inputtar.add(str(filename), str(ifile)) inputtar.close() # Meta-data for reprounzip metadata_write(target, metadata_initial_iofiles(config), 'directory') signals.post_setup(target=target, pack=pack) except Exception: rmtree_fixed(root) raise @target_must_exist def directory_run(args): """Runs the command in the directory. """ target = Path(args.target[0]) unpacked_info = metadata_read(target, 'directory') cmdline = args.cmdline # Loads config config = load_config_file(target / 'config.yml', True) runs = config.runs selected_runs = get_runs(runs, args.run, cmdline) root = (target / 'root').absolute() # Gets library paths lib_dirs = [] logger.debug("Running: %s", "/sbin/ldconfig -v -N") p = subprocess.Popen(['/sbin/ldconfig', '-v', '-N'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) stdout, _ = p.communicate() try: for line in stdout.splitlines(): if len(line) < 2 or line[0] in (b' ', b'\t'): continue if line.endswith(b':'): lib_dirs.append(Path(line[:-1])) finally: if p.returncode != 0: raise subprocess.CalledProcessError(p.returncode, ['/sbin/ldconfig', '-v', '-N']) cmds = [] for run_number in selected_runs: run = runs[run_number] cmd = 'cd %s && ' % shell_escape( unicode_(join_root(root, Path(run['workingdir'])))) cmd += '/usr/bin/env -i ' cmd += 'LD_LIBRARY_PATH=%s ' % ':'.join( shell_escape(unicode_(join_root(root, d))) for d in lib_dirs ) environ = run['environ'] environ = fixup_environment(environ, args) if args.x11: environ = dict(environ) if 'DISPLAY' in os.environ: environ['DISPLAY'] = os.environ['DISPLAY'] if 'XAUTHORITY' in os.environ: environ['XAUTHORITY'] = os.environ['XAUTHORITY'] cmd += ' '.join('%s=%s' % (shell_escape(k), shell_escape(v)) for k, v in iteritems(environ) if k != 'PATH') cmd += ' ' # PATH # Get the original PATH components path = [PosixPath(d) for d in run['environ'].get('PATH', '').split(':')] # The same paths but in the directory dir_path = [join_root(root, d) for d in path if d.root == '/'] # Rebuild string path = ':'.join(unicode_(d) for d in dir_path + path) cmd += 'PATH=%s ' % shell_escape(path) interpreter = get_elf_interpreter( join_root(root, PosixPath(run['binary'])).open('rb'), ) if interpreter is not None: interpreter = Path(interpreter) if interpreter.exists(): cmd += '%s ' % shell_escape(str(join_root(root, interpreter))) # FIXME : Use exec -a or something if binary != argv[0] if cmdline is None: argv = run['argv'] # If the command is not a path, use the path instead if '/' not in argv[0]: argv = [run['binary']] + argv[1:] # Rewrites command-line arguments that are absolute filenames rewritten = False for i in irange(len(argv)): try: p = Path(argv[i]) except UnicodeEncodeError: continue if p.is_absolute: rp = join_root(root, p) if (rp.exists() or (len(rp.components) > 3 and rp.parent.exists())): argv[i] = str(rp) rewritten = True if rewritten: logger.warning("Rewrote command-line as: %s", ' '.join(shell_escape(a) for a in argv)) else: argv = cmdline cmd += ' '.join(shell_escape(a) for a in argv) cmds.append(cmd) cmds = ' && '.join(cmds) signals.pre_run(target=target) logger.debug("Running: %s", cmds) retcode = interruptible_call(cmds, shell=True) stderr.write("\n*** Command finished, status: %d\n" % retcode) signals.post_run(target=target, retcode=retcode) # Update input file status metadata_update_run(config, unpacked_info, selected_runs) metadata_write(target, unpacked_info, 'directory') @target_must_exist def directory_destroy(args): """Destroys the directory. """ target = Path(args.target[0]) metadata_read(target, 'directory') logger.info("Removing directory %s...", target) signals.pre_destroy(target=target) rmtree_fixed(target) signals.post_destroy(target=target) def should_restore_owner(param): """Computes whether to restore original files' owners. """ if os.getuid() != 0: if param is True: # Restoring the owner was explicitely requested logger.critical("Not running as root, cannot restore files' " "owner/group as requested") sys.exit(1) elif param is None: # Nothing was requested logger.warning("Not running as root, won't restore files' " "owner/group") ret = False else: # If False: skip warning ret = False else: if param is None: # Nothing was requested logger.info("Running as root, we will restore files' " "owner/group") ret = True elif param is True: ret = True else: # If False: skip warning ret = False record_usage(restore_owner=ret) return ret def should_mount_magic_dirs(param): """Computes whether to mount directories inside the chroot. """ if os.getuid() != 0: if param is True: # Restoring the owner was explicitely requested logger.critical("Not running as root, cannot mount /dev and " "/proc") sys.exit(1) elif param is None: # Nothing was requested logger.warning("Not running as root, won't mount /dev and /proc") ret = False else: # If False: skip warning ret = False else: if param is None: # Nothing was requested logger.info("Running as root, will mount /dev and /proc") ret = True elif param is True: ret = True else: # If False: skip warning ret = False record_usage(mount_magic_dirs=ret) return ret def chroot_create(args): """Unpacks the experiment in a folder so it can be run with chroot. All the files in the pack are unpacked; system files are copied only if they were not packed, and busybox is installed if /bin/sh wasn't packed. In addition, input files are put in a tar.gz (so they can be put back after an upload) and the configuration file is extracted. """ if not args.pack: logger.critical("setup/create needs the pack filename") sys.exit(1) pack = Path(args.pack[0]) target = Path(args.target[0]) if target.exists(): logger.critical("Target directory exists") sys.exit(1) if not issubclass(DefaultAbstractPath, PosixPath): logger.critical("Not unpacking on POSIX system") sys.exit(1) signals.pre_setup(target=target, pack=pack) # We can only restore owner/group of files if running as root restore_owner = should_restore_owner(args.restore_owner) # Unpacks configuration file rpz_pack = RPZPack(pack) rpz_pack.extract_config(target / 'config.yml') # Loads config config = load_config_file(target / 'config.yml', True) packages = config.packages target.mkdir() root = (target / 'root').absolute() root.mkdir() try: # Checks that everything was packed packages_not_packed = [pkg for pkg in packages if not pkg.packfiles] if packages_not_packed: record_usage(chroot_missing_pkgs=True) logger.warning("According to configuration, some files were left " "out because they belong to the following " "packages:%s\nWill copy files from HOST SYSTEM", ''.join('\n %s' % pkg for pkg in packages_not_packed)) missing_files = False for pkg in packages_not_packed: for f in pkg.files: path = Path(f.path) if not path.exists(): logger.error( "Missing file %s (from package %s) on host, " "experiment will probably miss it", path, pkg.name) missing_files = True continue dest = join_root(root, path) dest.parent.mkdir(parents=True) if path.is_link(): dest.symlink(path.read_link()) else: path.copy(dest) if restore_owner: stat = path.stat() dest.chown(stat.st_uid, stat.st_gid) if missing_files: record_usage(chroot_mising_files=True) # Unpacks files members = rpz_pack.list_data() for m in members: # Remove 'DATA/' prefix m.name = str(rpz_pack.remove_data_prefix(m.name)) if not restore_owner: uid = os.getuid() gid = os.getgid() for m in members: m.uid = uid m.gid = gid logger.info("Extracting files...") rpz_pack.extract_data(root, members) rpz_pack.close() resolvconf_src = Path('/etc/resolv.conf') if resolvconf_src.exists(): try: resolvconf_src.copy(root / 'etc/resolv.conf') except IOError: pass # Sets up /bin/sh and /usr/bin/env, downloading busybox if necessary sh_path = join_root(root, Path('/bin/sh')) env_path = join_root(root, Path('/usr/bin/env')) if not sh_path.lexists() or not env_path.lexists(): logger.info("Setting up busybox...") busybox_path = join_root(root, Path('/bin/busybox')) busybox_path.parent.mkdir(parents=True) with make_dir_writable(join_root(root, Path('/bin'))): download_file(busybox_url(config.runs[0]['architecture']), busybox_path, 'busybox-%s' % config.runs[0]['architecture']) busybox_path.chmod(0o755) if not sh_path.lexists(): sh_path.parent.mkdir(parents=True) sh_path.symlink('/bin/busybox') if not env_path.lexists(): env_path.parent.mkdir(parents=True) env_path.symlink('/bin/busybox') # Original input files, so upload can restore them input_files = [f.path for f in itervalues(config.inputs_outputs) if f.read_runs] if input_files: logger.info("Packing up original input files...") inputtar = tarfile.open(str(target / 'inputs.tar.gz'), 'w:gz') for ifile in input_files: filename = join_root(root, ifile) if filename.exists(): inputtar.add(str(filename), str(ifile)) inputtar.close() # Meta-data for reprounzip metadata_write(target, metadata_initial_iofiles(config), 'chroot') signals.post_setup(target=target, pack=pack) except Exception: rmtree_fixed(root) raise @target_must_exist def chroot_mount(args): """Mounts /dev and /proc inside the chroot directory. """ target = Path(args.target[0]) unpacked_info = metadata_read(target, 'chroot') # Create proc mount d = target / 'root/proc' d.mkdir(parents=True) subprocess.check_call(['mount', '-t', 'proc', 'none', str(d)]) # Bind /dev from host for m in ('/dev', '/dev/pts'): d = join_root(target / 'root', Path(m)) d.mkdir(parents=True) logger.info("Mounting %s on %s...", m, d) subprocess.check_call(['mount', '-o', 'bind', m, str(d)]) unpacked_info['mounted'] = True metadata_write(target, unpacked_info, 'chroot') logger.warning("The host's /dev and /proc have been mounted into the " "chroot. Do NOT remove the unpacked directory with " "rm -rf, it WILL WIPE the host's /dev directory.") @target_must_exist def chroot_run(args): """Runs the command in the chroot. """ target = Path(args.target[0]) unpacked_info = metadata_read(target, 'chroot') cmdline = args.cmdline # Loads config config = load_config_file(target / 'config.yml', True) runs = config.runs selected_runs = get_runs(runs, args.run, cmdline) root = target / 'root' # X11 handler x11 = X11Handler(args.x11, ('local', socket.gethostname()), args.x11_display) cmds = [] for run_number in selected_runs: run = runs[run_number] cmd = 'cd %s && ' % shell_escape(run['workingdir']) cmd += '/usr/bin/env -i ' environ = x11.fix_env(run['environ']) environ = fixup_environment(environ, args) cmd += ' '.join('%s=%s' % (shell_escape(k), shell_escape(v)) for k, v in iteritems(environ)) cmd += ' ' # FIXME : Use exec -a or something if binary != argv[0] if cmdline is None: argv = [run['binary']] + run['argv'][1:] else: argv = cmdline cmd += ' '.join(shell_escape(a) for a in argv) userspec = '%s:%s' % (run.get('uid', 1000), run.get('gid', 1000)) cmd = 'chroot --userspec=%s %s /bin/sh -c %s' % ( userspec, shell_escape(unicode_(root)), shell_escape(cmd)) cmds.append(cmd) cmds = ['chroot %s /bin/sh -c %s' % (shell_escape(unicode_(root)), shell_escape(c)) for c in x11.init_cmds] + cmds cmds = ' && '.join(cmds) # Starts forwarding forwarders = [] for portnum, connector in x11.port_forward: fwd = LocalForwarder(connector, portnum) forwarders.append(fwd) signals.pre_run(target=target) logger.debug("Running: %s", cmds) retcode = interruptible_call(cmds, shell=True) stderr.write("\n*** Command finished, status: %d\n" % retcode) signals.post_run(target=target, retcode=retcode) # Update input file status metadata_update_run(config, unpacked_info, selected_runs) metadata_write(target, unpacked_info, 'chroot') def chroot_unmount(target): """Unmount magic directories, if they are mounted. """ unpacked_info = metadata_read(target, 'chroot') mounted = unpacked_info.get('mounted', False) if not mounted: return False target = target.resolve() for m in ('/dev', '/proc'): d = join_root(target / 'root', Path(m)) if d.exists(): logger.info("Unmounting %s...", d) # Unmounts recursively subprocess.check_call( 'grep %s /proc/mounts | ' 'cut -f2 -d" " | ' 'sort -r | ' 'xargs umount' % d, shell=True) unpacked_info['mounted'] = False metadata_write(target, unpacked_info, 'chroot') return True @target_must_exist def chroot_destroy_unmount(args): """Unmounts the bound magic dirs. """ target = Path(args.target[0]) if not chroot_unmount(target): logger.critical("Magic directories were not mounted") sys.exit(1) @target_must_exist def chroot_destroy_dir(args): """Destroys the directory. """ target = Path(args.target[0]) mounted = metadata_read(target, 'chroot').get('mounted', False) if mounted: logger.critical("Magic directories might still be mounted") sys.exit(1) logger.info("Removing directory %s...", target) signals.pre_destroy(target=target) rmtree_fixed(target) signals.post_destroy(target=target) @target_must_exist def chroot_destroy(args): """Destroys the directory, unmounting first if necessary. """ target = Path(args.target[0]) chroot_unmount(target) logger.info("Removing directory %s...", target) signals.pre_destroy(target=target) rmtree_fixed(target) signals.post_destroy(target=target) class LocalUploader(FileUploader): def __init__(self, target, input_files, files, type_, param_restore_owner): self.type = type_ self.param_restore_owner = param_restore_owner FileUploader.__init__(self, target, input_files, files) def prepare_upload(self, files): self.restore_owner = (self.type == 'chroot' and should_restore_owner(self.param_restore_owner)) self.root = (self.target / 'root').absolute() def extract_original_input(self, input_name, input_path, temp): tar = tarfile.open(str(self.target / 'inputs.tar.gz'), 'r:*') member = tar.getmember(str(join_root(PosixPath(''), input_path))) member = copy.copy(member) member.name = str(temp.components[-1]) tar.extract(member, str(temp.parent)) tar.close() return temp def upload_file(self, local_path, input_path): remote_path = join_root(self.root, input_path) # Copy orig_stat = remote_path.stat() with make_dir_writable(remote_path.parent): local_path.copyfile(remote_path) remote_path.chmod(orig_stat.st_mode & 0o7777) if self.restore_owner: remote_path.chown(orig_stat.st_uid, orig_stat.st_gid) @target_must_exist def upload(args): """Replaces an input file in the directory. """ target = Path(args.target[0]) files = args.file unpacked_info = metadata_read(target, args.type) input_files = unpacked_info.setdefault('input_files', {}) try: LocalUploader(target, input_files, files, args.type, args.type == 'chroot' and args.restore_owner) finally: metadata_write(target, unpacked_info, args.type) class LocalDownloader(FileDownloader): def __init__(self, target, files, type_, all_=False): self.type = type_ FileDownloader.__init__(self, target, files, all_=all_) def prepare_download(self, files): self.root = (self.target / 'root').absolute() def download_and_print(self, remote_path): remote_path = join_root(self.root, remote_path) # Output to stdout if not remote_path.exists(): logger.critical("Can't get output file (doesn't exist): %s", remote_path) return False with remote_path.open('rb') as fp: copyfile(fp, stdout_bytes) return True def download(self, remote_path, local_path): remote_path = join_root(self.root, remote_path) # Copy if not remote_path.exists(): logger.critical("Can't get output file (doesn't exist): %s", remote_path) return False remote_path.copyfile(local_path) remote_path.copymode(local_path) return True @target_must_exist def download(args): """Gets an output file from the directory. """ target = Path(args.target[0]) files = args.file metadata_read(target, args.type) LocalDownloader(target, files, args.type, all_=args.all) def test_same_pkgmngr(pack, config, **kwargs): """Compatibility test: platform is Linux and uses same package manager. """ runs, packages, other_files = config orig_distribution = runs[0]['distribution'][0].lower() if not THIS_DISTRIBUTION: return COMPAT_NO, "This machine is not running Linux" elif THIS_DISTRIBUTION == orig_distribution: return COMPAT_OK else: return COMPAT_NO, "Different distributions. Then: %s, now: %s" % ( orig_distribution, THIS_DISTRIBUTION) def test_linux_same_arch(pack, config, **kwargs): """Compatibility test: this platform is Linux and arch is compatible. """ runs, packages, other_files = config orig_architecture = runs[0]['architecture'] current_architecture = platform.machine().lower() if not sys.platform.startswith('linux'): return COMPAT_NO, "This machine is not running Linux" elif (orig_architecture == current_architecture or (orig_architecture == 'i386' and current_architecture == 'amd64')): return COMPAT_OK else: return COMPAT_NO, "Different architectures. Then: %s, now: %s" % ( orig_architecture, current_architecture) def setup_installpkgs(parser): """Installs the required packages on this system """ parser.add_argument('pack', nargs=1, help="Pack to process") parser.add_argument( '-y', '--assume-yes', action='store_true', default=False, help="Assumes yes for package manager's questions (if supported)") parser.add_argument( '--missing', action='store_true', help="Only install packages that weren't packed") parser.add_argument( '--summary', action='store_true', help="Don't install, print which packages are installed or not") parser.set_defaults(func=installpkgs) return {'test_compatibility': test_same_pkgmngr} def setup_directory(parser, **kwargs): """Unpacks the files in a directory and runs with PATH and LD_LIBRARY_PATH setup creates the directory (needs the pack filename) upload replaces input files in the directory (without arguments, lists input files) run runs the experiment download gets output files (without arguments, lists output files) destroy removes the unpacked directory Upload specifications are either: :input_id restores the original input file from the pack filename:input_id replaces the input file with the specified local file Download specifications are either: output_id: print the output file to stdout output_id:filename extracts the output file to the corresponding local path """ subparsers = parser.add_subparsers(title="actions", metavar='', help=argparse.SUPPRESS) def add_opt_general(opts): opts.add_argument('target', nargs=1, help="Experiment directory") # setup parser_setup = subparsers.add_parser('setup') parser_setup.add_argument('pack', nargs=1, help="Pack to extract") # Note: add_opt_general is called later so that 'pack' is before 'target' add_opt_general(parser_setup) parser_setup.set_defaults(func=directory_create) # upload parser_upload = subparsers.add_parser('upload') add_opt_general(parser_upload) parser_upload.add_argument('file', nargs=argparse.ZERO_OR_MORE, help=":") parser_upload.set_defaults(func=upload, type='directory') # run parser_run = subparsers.add_parser('run') add_opt_general(parser_run) parser_run.add_argument('run', default=None, nargs=argparse.OPTIONAL) parser_run.add_argument('--cmdline', nargs=argparse.REMAINDER, help="Command line to run") parser_run.add_argument('--enable-x11', action='store_true', default=False, dest='x11', help="Enable X11 support (needs an X server)") add_environment_options(parser_run) parser_run.set_defaults(func=directory_run) # download parser_download = subparsers.add_parser('download') add_opt_general(parser_download) parser_download.add_argument('file', nargs=argparse.ZERO_OR_MORE, help="[:]") parser_download.add_argument('--all', action='store_true', help="Download all output files to the " "current directory") parser_download.set_defaults(func=download, type='directory') # destroy parser_destroy = subparsers.add_parser('destroy') add_opt_general(parser_destroy) parser_destroy.set_defaults(func=directory_destroy) return {'test_compatibility': test_linux_same_arch} def chroot_setup(args): """Does both create and mount depending on --bind-magic-dirs. """ do_mount = should_mount_magic_dirs(args.bind_magic_dirs) chroot_create(args) if do_mount: chroot_mount(args) def setup_chroot(parser, **kwargs): """Unpacks the files and run with chroot setup/create creates the directory (needs the pack filename) setup/mount mounts --bind /dev and /proc inside the chroot (do NOT rm -Rf the directory after that!) upload replaces input files in the directory (without arguments, lists input files) run runs the experiment download gets output files (without arguments, lists output files) destroy/unmount unmounts /dev and /proc from the directory destroy/dir removes the unpacked directory Upload specifications are either: :input_id restores the original input file from the pack filename:input_id replaces the input file with the specified local file Download specifications are either: output_id: print the output file to stdout output_id:filename extracts the output file to the corresponding local path """ subparsers = parser.add_subparsers(title="actions", metavar='', help=argparse.SUPPRESS) def add_opt_general(opts): opts.add_argument('target', nargs=1, help="Experiment directory") # setup/create def add_opt_setup(opts): opts.add_argument('pack', nargs=1, help="Pack to extract") def add_opt_owner(opts): opts.add_argument('--preserve-owner', action='store_true', dest='restore_owner', default=None, help="Restore files' owner/group when extracting") opts.add_argument('--dont-preserve-owner', action='store_false', dest='restore_owner', default=None, help="Don't restore files' owner/group when " "extracting, use current users") parser_setup_create = subparsers.add_parser('setup/create') add_opt_setup(parser_setup_create) add_opt_general(parser_setup_create) add_opt_owner(parser_setup_create) parser_setup_create.set_defaults(func=chroot_create) # setup/mount parser_setup_mount = subparsers.add_parser('setup/mount') add_opt_general(parser_setup_mount) parser_setup_mount.set_defaults(func=chroot_mount) # setup parser_setup = subparsers.add_parser('setup') add_opt_setup(parser_setup) add_opt_general(parser_setup) add_opt_owner(parser_setup) parser_setup.add_argument( '--bind-magic-dirs', action='store_true', dest='bind_magic_dirs', default=None, help="Mount /dev and /proc inside the chroot") parser_setup.add_argument( '--dont-bind-magic-dirs', action='store_false', dest='bind_magic_dirs', default=None, help="Don't mount /dev and /proc inside the chroot") parser_setup.set_defaults(func=chroot_setup) # upload parser_upload = subparsers.add_parser('upload') add_opt_general(parser_upload) add_opt_owner(parser_upload) parser_upload.add_argument('file', nargs=argparse.ZERO_OR_MORE, help=":") parser_upload.set_defaults(func=upload, type='chroot') # run parser_run = subparsers.add_parser('run') add_opt_general(parser_run) parser_run.add_argument('run', default=None, nargs=argparse.OPTIONAL) parser_run.add_argument('--cmdline', nargs=argparse.REMAINDER, help="Command line to run") parser_run.add_argument('--enable-x11', action='store_true', default=False, dest='x11', help="Enable X11 support (needs an X server on " "the host)") parser_run.add_argument('--x11-display', dest='x11_display', help="Display number to use on the experiment " "side (change the host display with the " "DISPLAY environment variable)") add_environment_options(parser_run) parser_run.set_defaults(func=chroot_run) # download parser_download = subparsers.add_parser('download') add_opt_general(parser_download) parser_download.add_argument('file', nargs=argparse.ZERO_OR_MORE, help="[:]") parser_download.add_argument('--all', action='store_true', help="Download all output files to the " "current directory") parser_download.set_defaults(func=download, type='chroot') # destroy/unmount parser_destroy_unmount = subparsers.add_parser('destroy/unmount') add_opt_general(parser_destroy_unmount) parser_destroy_unmount.set_defaults(func=chroot_destroy_unmount) # destroy/dir parser_destroy_dir = subparsers.add_parser('destroy/dir') add_opt_general(parser_destroy_dir) parser_destroy_dir.set_defaults(func=chroot_destroy_dir) # destroy parser_destroy = subparsers.add_parser('destroy') add_opt_general(parser_destroy) parser_destroy.set_defaults(func=chroot_destroy) return {'test_compatibility': test_linux_same_arch} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/unpackers/graph.py0000664000175000017500000007375214505567114021075 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Graph plugin for reprounzip. This is not actually an unpacker, it just creates a graph from the metadata collected by the reprozip tracer (either from a pack file or the initial .rpz directory). It creates a file in GraphViz DOT format, which can be turned into an image by using the dot utility. See http://www.graphviz.org/ """ from __future__ import division, print_function, unicode_literals import argparse from distutils.version import LooseVersion import heapq import json import logging import re from rpaths import PosixPath, Path import sqlite3 import sys from reprounzip.common import FILE_READ, FILE_WRITE, FILE_WDIR, RPZPack, \ load_config from reprounzip.orderedset import OrderedSet from reprounzip.unpackers.common import COMPAT_OK, COMPAT_NO from reprounzip.utils import PY3, izip, iteritems, itervalues, stderr, \ unicode_, escape, normalize_path logger = logging.getLogger('reprounzip.graph') C_INITIAL = 0 # First process or don't know C_FORK = 1 # Might actually be any one of fork, vfork or clone C_EXEC = 2 # Replaced image with execve C_FORKEXEC = 3 # A fork then an exec, folded as one because all_forks==False FORMAT_DOT = 0 FORMAT_JSON = 1 LVL_PKG_FILE = 0 # Show individual files in packages LVL_PKG_PACKAGE = 1 # Aggregate by package LVL_PKG_IGNORE = 2 # Ignore packages, treat them like any file LVL_PKG_DROP = 3 # Drop every file that comes from a package LVL_PROC_THREAD = 0 # Show every process and thread LVL_PROC_PROCESS = 1 # Only show processes, not threads LVL_PROC_RUN = 2 # Don't show individual processes, aggregate by run LVL_OTHER_ALL = 0 # Show every file, aggregate through directory list LVL_OTHER_IO = 1 # Only show input & output files LVL_OTHER_NO = 3 # Don't show other files class Run(object): """Structure representing a whole run. """ def __init__(self, nb): self.nb = nb self.name = "run %d" % nb self.processes = [] def dot(self, fp, level_processes): assert self.processes if level_processes == LVL_PROC_RUN: fp.write(' run%d [label="%d: %s"];\n' % ( self.nb, self.nb, self.processes[0].binary or "-")) else: fp.write(' subgraph cluster_run%d {\n label="%s";\n' % ( self.nb, escape(self.name))) for process in self.processes: if level_processes == LVL_PROC_THREAD or not process.thread: process.dot(fp, level_processes, indent=2) fp.write(' }\n') def dot_endpoint(self, level_processes): return 'run%d' % self.nb def json(self, prog_map, level_processes): assert self.processes if level_processes == LVL_PROC_RUN: json_process = self.processes[0].json() for process in self.processes: prog_map[process] = json_process processes = [json_process] else: processes = [] process_idx_map = {} for process in self.processes: if level_processes == LVL_PROC_THREAD or not process.thread: process_idx_map[process] = len(processes) json_process = process.json(process_idx_map) prog_map[process] = json_process processes.append(json_process) else: p_process = process while p_process.thread: p_process = p_process.parent prog_map[process] = prog_map[p_process] return {'name': self.name, 'processes': processes} class Process(object): """Structure representing a process in the experiment. """ _id_gen = 0 def __init__(self, pid, run, parent, timestamp, thread, acted, binary, argv, created): self.id = Process._id_gen Process._id_gen += 1 self.pid = pid self.run = run self.parent = parent self.timestamp = timestamp self.thread = bool(thread) # Whether that process has done something yet. If it execve()s and # hasn't done anything since it forked, no need for it to appear self.acted = acted # Executable file self.binary = binary # Command-line if this was created by an exec self.argv = argv # How was this process created, one of the C_* constants self.created = created def dot(self, fp, level_processes, indent=1): thread_style = ',fillcolor="#666666"' if self.thread else '' fp.write(' ' * indent + 'prog%d [label="%s (%d)"%s];\n' % ( self.id, escape(unicode_(self.binary) or "-"), self.pid, thread_style)) if self.parent is not None: reason = '' if self.created == C_FORK: if self.thread: reason = "thread" else: reason = "fork" elif self.created == C_EXEC: reason = "exec" elif self.created == C_FORKEXEC: reason = "fork+exec" fp.write(' ' * indent + 'prog%d -> prog%d [label="%s"];\n' % ( self.parent.id, self.id, reason)) def dot_endpoint(self, level_processes): if level_processes == LVL_PROC_RUN: return self.run.dot_endpoint(level_processes) else: prog = self if level_processes == LVL_PROC_PROCESS: while prog.thread: prog = prog.parent return 'prog%d' % prog.id def json(self, process_map): name = "%d" % self.pid long_name = "%s (%d)" % (PosixPath(self.binary).components[-1] if self.binary else "-", self.pid) description = "%s\n%d" % (self.binary, self.pid) if self.parent is not None: if self.created == C_FORK: reason = "fork" elif self.created == C_EXEC: reason = "exec" elif self.created == C_FORKEXEC: reason = "fork+exec" else: assert False parent = [process_map[self.parent], reason] else: parent = None return {'name': name, 'parent': parent, 'reads': [], 'writes': [], 'long_name': long_name, 'description': description, 'argv': self.argv, 'is_thread': self.thread, 'start_time': self.timestamp} class Package(object): """Structure representing a system package. """ def __init__(self, name, version=None): self.id = None self.name = name self.version = version self.files = set() def dot(self, fp, level_pkgs): assert self.id is not None if not self.files: return if level_pkgs == LVL_PKG_PACKAGE: fp.write(' "pkg %s" [shape=box,label=' % escape(self.name)) if self.version: fp.write('"%s %s"];\n' % ( escape(self.name), escape(self.version))) else: fp.write('"%s"];\n' % escape(self.name)) elif level_pkgs == LVL_PKG_FILE: fp.write(' subgraph cluster_pkg%d {\n label=' % self.id) if self.version: fp.write('"%s %s";\n' % ( escape(self.name), escape(self.version))) else: fp.write('"%s";\n' % escape(self.name)) for f in sorted(unicode_(f) for f in self.files): fp.write(' "%s";\n' % escape(f)) fp.write(' }\n') def dot_endpoint(self, f, level_pkgs): if level_pkgs == LVL_PKG_PACKAGE: return '"pkg %s"' % escape(self.name) else: return '"%s"' % escape(unicode_(f)) def json_endpoint(self, f, level_pkgs): if level_pkgs == LVL_PKG_PACKAGE: return self.name else: return unicode_(f) def json(self, level_pkgs): if level_pkgs == LVL_PKG_PACKAGE: logger.critical("JSON output doesn't support --packages package") sys.exit(1) elif level_pkgs == LVL_PKG_FILE: files = sorted(unicode_(f) for f in self.files) else: assert False return {'name': self.name, 'version': self.version or None, 'files': files} def parse_levels(level_pkgs, level_processes, level_other_files): try: level_pkgs = {'file': LVL_PKG_FILE, 'files': LVL_PKG_FILE, 'package': LVL_PKG_PACKAGE, 'packages': LVL_PKG_PACKAGE, 'ignore': LVL_PKG_IGNORE, 'drop': LVL_PKG_DROP}[level_pkgs] except KeyError: logger.critical("Unknown level of detail for packages: '%s'", level_pkgs) sys.exit(1) try: level_processes = {'thread': LVL_PROC_THREAD, 'threads': LVL_PROC_THREAD, 'process': LVL_PROC_PROCESS, 'processes': LVL_PROC_PROCESS, 'run': LVL_PROC_RUN, 'runs': LVL_PROC_RUN}[level_processes] except KeyError: logger.critical("Unknown level of detail for processes: '%s'", level_processes) sys.exit(1) if level_other_files.startswith('depth:'): file_depth = int(level_other_files[6:]) level_other_files = 'all' else: file_depth = None try: level_other_files = {'all': LVL_OTHER_ALL, 'io': LVL_OTHER_IO, 'inputoutput': LVL_OTHER_IO, 'no': LVL_OTHER_NO, 'none': LVL_OTHER_NO, 'drop': LVL_OTHER_NO}[level_other_files] except KeyError: logger.critical("Unknown level of detail for other files: '%s'", level_other_files) sys.exit(1) return level_pkgs, level_processes, level_other_files, file_depth def read_events(database, all_forks, has_thread_flag): # In here, a file is any file on the filesystem. A binary is a file, that # gets executed. A process is a system-level task, identified by its pid # (pids don't get reused in the database). # What I call program is the couple (process, binary), so forking creates a # new program (with the same binary) and exec'ing creates a new program as # well (with the same process) # Because of this, fork+exec will create an intermediate program that # doesn't do anything (new process but still old binary). If that program # doesn't do anything worth showing on the graph, it will be erased, unless # all_forks is True (--all-forks). assert database.is_file() if PY3: # On PY3, connect() only accepts unicode conn = sqlite3.connect(str(database)) else: conn = sqlite3.connect(database.path) conn.row_factory = sqlite3.Row # This is a bit weird. We need to iterate on all types of events at the # same time, ordering by timestamp, so we decorate-sort-undecorate # Decoration adds timestamp (for sorting) and tags by event type, one of # 'process', 'open' or 'exec' # Reads processes from the database process_cursor = conn.cursor() if has_thread_flag: sql = ''' SELECT id, parent, timestamp, is_thread FROM processes ORDER BY id ''' else: sql = ''' SELECT id, parent, timestamp, 0 as is_thread FROM processes ORDER BY id ''' process_rows = process_cursor.execute(sql) processes = {} all_programs = [] # ... and opened files... file_cursor = conn.cursor() file_rows = file_cursor.execute( ''' SELECT name, timestamp, mode, process, is_directory FROM opened_files ORDER BY id ''') binaries = set() files = set() edges = OrderedSet() # ... as well as executed files. exec_cursor = conn.cursor() exec_rows = exec_cursor.execute( ''' SELECT name, timestamp, process, argv FROM executed_files ORDER BY id ''') # Loop on all event lists logger.info("Getting all events from database...") rows = heapq.merge(((r[2], 'process', r) for r in process_rows), ((r[1], 'open', r) for r in file_rows), ((r[1], 'exec', r) for r in exec_rows)) runs = [] run = None for ts, event_type, data in rows: if event_type == 'process': r_id, r_parent, r_timestamp, r_thread = data logger.debug("Process %d created (parent %r)", r_id, r_parent) if r_parent is not None: parent = processes[r_parent] binary = parent.binary else: run = Run(len(runs)) runs.append(run) parent = None binary = None if r_parent is not None: argv = processes[r_parent].argv else: argv = None process = Process(r_id, run, parent, r_timestamp, r_thread, False, binary, argv, C_INITIAL if r_parent is None else C_FORK) processes[r_id] = process all_programs.append(process) run.processes.append(process) elif event_type == 'open': r_name, r_timestamp, r_mode, r_process, r_directory = data r_name = normalize_path(r_name) logger.debug("File open: %s, process %d", r_name, r_process) if not (r_mode & FILE_WDIR or r_directory): process = processes[r_process] files.add(r_name) edges.add((process, r_name, r_mode, None)) elif event_type == 'exec': r_name, r_timestamp, r_process, r_argv = data r_name = normalize_path(r_name) argv = tuple(r_argv.split('\0')) if not argv[-1]: argv = argv[:-1] logger.debug("File exec: %s, process %d", r_name, r_process) process = processes[r_process] binaries.add(r_name) # Here we split this process in two "programs", unless the previous # one hasn't done anything since it was created via fork() if not all_forks and not process.acted: process.binary = r_name process.created = C_FORKEXEC process.acted = True process.argv = argv else: process = Process(process.pid, run, process, r_timestamp, False, True, # Hides exec only once r_name, argv, C_EXEC) all_programs.append(process) processes[r_process] = process run.processes.append(process) files.add(r_name) edges.add((process, r_name, None, argv)) process_cursor.close() file_cursor.close() exec_cursor.close() conn.close() return runs, files, edges def format_argv(argv): joined = ' '.join(argv) if len(joined) < 50: return joined else: return "%s ..." % argv[0] def generate(target, configfile, database, all_forks=False, graph_format='dot', level_pkgs='file', level_processes='thread', level_other_files='all', regex_filters=None, regex_includes=None, regex_replaces=None, aggregates=None): """Main function for the graph subcommand. """ try: graph_format = {'dot': FORMAT_DOT, 'DOT': FORMAT_DOT, 'json': FORMAT_JSON, 'JSON': FORMAT_JSON}[graph_format] except KeyError: logger.critical("Unknown output format %r", graph_format) sys.exit(1) level_pkgs, level_processes, level_other_files, file_depth = \ parse_levels(level_pkgs, level_processes, level_other_files) if target.exists(): logger.critical("Output file %s exists", target) sys.exit(1) # Reads package ownership from the configuration if not configfile.is_file(): logger.critical("Configuration file does not exist!\n" "Did you forget to run 'reprozip trace'?\n" "If not, you might want to use --dir to specify an " "alternate location.") sys.exit(1) config = load_config(configfile, canonical=False) inputs_outputs = config.inputs_outputs inputs_outputs_map = dict((f.path, n) for n, f in iteritems(config.inputs_outputs)) has_thread_flag = config.format_version >= LooseVersion('0.7') runs, files, edges = read_events(database, all_forks, has_thread_flag) # Label the runs if len(runs) != len(config.runs): logger.warning("Configuration file doesn't list the same number of " "runs we found in the database!") else: for config_run, run in izip(config.runs, runs): run.name = config_run['id'] # Apply regexes ignore = [lambda path, r=re.compile(p): r.search(path) is not None for p in regex_filters or []] include = [lambda path, r=re.compile(p): r.search(path) is not None for p in regex_includes or []] replace = [lambda path, r=re.compile(p): r.sub(repl, path) for p, repl in regex_replaces or []] def filefilter(path): pathuni = unicode_(path) if include and not any(f(pathuni) for f in include): logger.debug("IGN(include) %s", pathuni) return None if any(f(pathuni) for f in ignore): logger.debug("IGN %s", pathuni) return None if not (replace or aggregates): return path for fi in replace: pathuni_ = fi(pathuni) if pathuni_ != pathuni: logger.debug("SUB %s -> %s", pathuni, pathuni_) pathuni = pathuni_ for prefix in aggregates or []: if pathuni.startswith(prefix): logger.debug("AGG %s -> %s", pathuni, prefix) pathuni = prefix break return PosixPath(pathuni) files_new = set() for fi in files: fi = filefilter(fi) if fi is not None: files_new.add(fi) files = files_new edges_new = OrderedSet() for prog, fi, mode, argv in edges: fi = filefilter(fi) if fi is not None: edges_new.add((prog, fi, mode, argv)) edges = edges_new # Puts files in packages package_map = {} if level_pkgs == LVL_PKG_IGNORE: packages = [] other_files = files else: logger.info("Organizes packages...") file2package = dict((f.path, pkg) for pkg in config.packages for f in pkg.files) packages = {} other_files = [] for fi in files: pkg = file2package.get(fi) if pkg is not None: package = packages.get(pkg.name) if package is None: package = Package(pkg.name, pkg.version) packages[pkg.name] = package package.files.add(fi) package_map[fi] = package else: other_files.append(fi) packages = sorted(itervalues(packages), key=lambda pkg: pkg.name) for i, pkg in enumerate(packages): pkg.id = i # Filter other files if level_other_files == LVL_OTHER_ALL and file_depth is not None: other_files = set(PosixPath(*f.components[:file_depth + 1]) for f in other_files) edges = OrderedSet((prog, f if f in package_map else PosixPath(*f.components[:file_depth + 1]), mode, argv) for prog, f, mode, argv in edges) else: if level_other_files == LVL_OTHER_IO: other_files = set(f for f in other_files if f in inputs_outputs_map) edges = [(prog, f, mode, argv) for prog, f, mode, argv in edges if f in package_map or f in other_files] elif level_other_files == LVL_OTHER_NO: other_files = set() edges = [(prog, f, mode, argv) for prog, f, mode, argv in edges if f in package_map] args = (target, runs, packages, other_files, package_map, edges, inputs_outputs, inputs_outputs_map, level_pkgs, level_processes, level_other_files) if graph_format == FORMAT_DOT: graph_dot(*args) elif graph_format == FORMAT_JSON: graph_json(*args) else: assert False def graph_dot(target, runs, packages, other_files, package_map, edges, inputs_outputs, inputs_outputs_map, level_pkgs, level_processes, level_other_files): """Writes a GraphViz DOT file from the collected information. """ with target.open('w', encoding='utf-8', newline='\n') as fp: fp.write('digraph G {\n rankdir=LR;\n\n /* programs */\n' ' node [shape=box fontcolor=white ' 'fillcolor=black style="filled,rounded"];\n') # Programs logger.info("Writing programs...") for run in runs: run.dot(fp, level_processes) fp.write('\n' ' node [shape=ellipse fontcolor="#131C39" ' 'fillcolor="#C9D2ED"];\n') # Packages if level_pkgs not in (LVL_PKG_IGNORE, LVL_PKG_DROP): logger.info("Writing packages...") fp.write('\n /* system packages */\n') for package in sorted(packages, key=lambda pkg: pkg.name): package.dot(fp, level_pkgs) fp.write('\n /* other files */\n') # Other files logger.info("Writing other files...") for fi in sorted(other_files): if fi in inputs_outputs_map: fp.write(' "%(path)s" [fillcolor="#A3B4E0", ' 'label="%(name)s\\n%(path)s"];\n' % {'path': escape(unicode_(fi)), 'name': inputs_outputs_map[fi]}) else: fp.write(' "%s";\n' % escape(unicode_(fi))) fp.write('\n') # Edges logger.info("Connecting edges...") done_edges = set() for prog, fi, mode, argv in edges: endp_prog = prog.dot_endpoint(level_processes) if fi in package_map: if level_pkgs == LVL_PKG_DROP: continue endp_file = package_map[fi].dot_endpoint(fi, level_pkgs) e = endp_prog, endp_file, mode if e in done_edges: continue else: done_edges.add(e) else: endp_file = '"%s"' % escape(unicode_(fi)) if mode is None: fp.write(' %s -> %s [style=bold, label="%s"];\n' % ( endp_file, endp_prog, escape(format_argv(argv)))) elif mode & FILE_WRITE: fp.write(' %s -> %s [color="#000088"];\n' % ( endp_prog, endp_file)) elif mode & FILE_READ: fp.write(' %s -> %s [color="#8888CC"];\n' % ( endp_file, endp_prog)) fp.write('}\n') def graph_json(target, runs, packages, other_files, package_map, edges, inputs_outputs, inputs_outputs_map, level_pkgs, level_processes, level_other_files): """Writes a JSON file suitable for further processing. """ # Packages if level_pkgs in (LVL_PKG_IGNORE, LVL_PKG_DROP): json_packages = [] else: json_packages = [pkg.json(level_pkgs) for pkg in packages] # Other files json_other_files = [unicode_(fi) for fi in sorted(other_files)] # Programs prog_map = {} json_runs = [run.json(prog_map, level_processes) for run in runs] # Connect edges done_edges = set() for prog, fi, mode, argv in edges: endp_prog = prog_map[prog] if fi in package_map: if level_pkgs == LVL_PKG_DROP: continue endp_file = package_map[fi].json_endpoint(fi, level_pkgs) e = endp_prog['name'], endp_file, mode if e in done_edges: continue else: done_edges.add(e) else: endp_file = unicode_(fi) if mode is None: endp_prog['reads'].append(endp_file) # TODO: argv? elif mode & FILE_WRITE: endp_prog['writes'].append(endp_file) elif mode & FILE_READ: endp_prog['reads'].append(endp_file) json_other_files.sort() if PY3: fp = target.open('w', encoding='utf-8', newline='\n') else: fp = target.open('wb') try: json.dump({'packages': sorted(json_packages, key=lambda p: p['name']), 'other_files': json_other_files, 'runs': json_runs, 'inputs_outputs': [ {'name': k, 'path': unicode_(v.path), 'read_by_runs': v.read_runs, 'written_by_runs': v.write_runs} for k, v in sorted(iteritems(inputs_outputs))]}, fp, ensure_ascii=False, indent=2, sort_keys=True) finally: fp.close() def graph(args): """graph subcommand. Reads in the trace sqlite3 database and writes out a graph in GraphViz DOT format or JSON. """ def call_generate(args, config, trace): generate(Path(args.target[0]), config, trace, args.all_forks, args.format, args.packages, args.processes, args.otherfiles, args.regex_filter, args.regex_include, args.regex_replace, args.aggregate) if args.pack is not None: rpz_pack = RPZPack(args.pack) with rpz_pack.with_config() as config: with rpz_pack.with_trace() as trace: call_generate(args, config, trace) else: call_generate(args, Path(args.dir) / 'config.yml', Path(args.dir) / 'trace.sqlite3') def disabled_bug13676(args): stderr.write("Error: your version of Python, %s, is not supported\n" "Versions before 2.7.3 are affected by bug 13676 and will " "not be able to read\nthe trace " "database\n" % sys.version.split(' ', 1)[0]) sys.exit(1) def setup(parser, **kwargs): """Generates a provenance graph from the trace data """ # http://bugs.python.org/issue13676 # This prevents repro(un)zip from reading argv and envp arrays from trace if sys.version_info < (2, 7, 3): parser.add_argument('rest_of_cmdline', nargs=argparse.REMAINDER, help=argparse.SUPPRESS) parser.set_defaults(func=disabled_bug13676) return {'test_compatibility': (COMPAT_NO, "Python >2.7.3 required")} parser.add_argument('target', nargs=1, help="Destination DOT file") parser.add_argument('-F', '--all-forks', action='store_true', help="Show forked processes before they exec") parser.add_argument('--packages', default='file', help="Level of detail for packages; 'file', " "'package', 'drop' or 'ignore' (default: 'file')") parser.add_argument('--processes', default='thread', help="Level of detail for processes; 'thread', " "'process' or 'run' (default: 'thread')") parser.add_argument('--otherfiles', default='all', help="Level of detail for non-package files; 'all', " "'io' or 'no' (default: 'all')") parser.add_argument('--aggregate', action='append', help="Aggregate all files under this path") parser.add_argument('--regex-include', action='append', help="Glob patterns of files to include (checked " "before --regex-filter)") parser.add_argument('--regex-filter', action='append', help="Glob patterns of files to ignore (checked " "after --regex-include)") parser.add_argument('--regex-replace', action='append', nargs=2, help="Apply regular expression replacement to files") parser.add_argument('--dot', action='store_const', dest='format', const='dot', default='dot', help="Set the output format to DOT (this is the " "default)") parser.add_argument('--json', action='store_const', dest='format', const='json', help="Set the output format to JSON") parser.add_argument( '-d', '--dir', default='.reprozip-trace', help="where the database and configuration file are stored (default: " "./.reprozip-trace)") parser.add_argument( 'pack', nargs=argparse.OPTIONAL, help="Pack to extract (defaults to reading from --dir)") parser.set_defaults(func=graph) return {'test_compatibility': COMPAT_OK} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/unpackers/provviewer.py0000664000175000017500000003146014505567114022172 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. """Prov Viewer exporter. This exports the trace data into a format suitable for the Prov Viewer tool (https://github.com/gems-uff/prov-viewer). See schema: https://git.io/provviewer-xsd """ from __future__ import division, print_function, unicode_literals import argparse import logging from distutils.version import LooseVersion from rpaths import Path import sqlite3 import sys from reprounzip.common import FILE_WRITE, RPZPack, load_config from reprounzip.unpackers.common import COMPAT_OK, COMPAT_NO, shell_escape from reprounzip.utils import PY3, iteritems, stderr logger = logging.getLogger('reprounzip.provviewer') def xml_escape(s): """Escapes for XML. """ return (("%s" % s).replace('&', '&').replace('"', '"') .replace('<', '≶').replace('>', '>')) def generate(target, configfile, database): """Go over the trace and generate the graph file. """ # Reads package ownership from the configuration if not configfile.is_file(): logger.critical("Configuration file does not exist!\n" "Did you forget to run 'reprozip trace'?\n" "If not, you might want to use --dir to specify an " "alternate location.") sys.exit(1) config = load_config(configfile, canonical=False) has_thread_flag = config.format_version >= LooseVersion('0.7') assert database.is_file() if PY3: # On PY3, connect() only accepts unicode conn = sqlite3.connect(str(database)) else: conn = sqlite3.connect(database.path) conn.row_factory = sqlite3.Row vertices = [] edges = [] # Create user entity, that initiates the runs vertices.append({'ID': 'user', 'type': 'Agent', 'subtype': 'User', 'label': 'User'}) run = -1 # Read processes cur = conn.cursor() rows = cur.execute( ''' SELECT id, parent, timestamp, is_thread, exitcode FROM processes; ''' if has_thread_flag else ''' SELECT id, parent, timestamp, 0 as is_thread, exitcode FROM processes; ''') for r_id, r_parent, r_timestamp, r_isthread, r_exitcode in rows: if r_parent is None: # Create run entity run += 1 vertices.append({'ID': 'run%d' % run, 'type': 'Activity', 'subtype': 'Run', 'label': "Run #%d" % run, 'date': r_timestamp}) # User -> run edges.append({'ID': 'user_run%d' % run, 'type': 'UserRuns', 'label': "User runs command", 'sourceID': 'user', 'targetID': 'run%d' % run}) # Run -> process edges.append({'ID': 'run_start%d' % run, 'type': 'RunStarts', 'label': "Run #%d command", 'sourceID': 'run%d' % run, 'targetID': 'process%d' % r_id}) # Create process entity vertices.append({'ID': 'process%d' % r_id, 'type': 'Agent', 'subtype': 'Thread' if r_isthread else 'Process', 'label': 'Process #%d' % r_id, 'date': r_timestamp}) # TODO: add process end time (use master branch?) # Add process creation activity if r_parent is not None: # Process creation activity vertex = {'ID': 'fork%d' % r_id, 'type': 'Activity', 'subtype': 'Fork', 'label': "#%d creates %s #%d" % ( r_parent, "thread" if r_isthread else "process", r_id), 'date': r_timestamp} if has_thread_flag: vertex['thread'] = 'true' if r_isthread else 'false' vertices.append(vertex) # Parent -> creation edges.append({'ID': 'fork_p_%d' % r_id, 'type': 'PerformsFork', 'label': "Performs fork", 'sourceID': 'process%d' % r_parent, 'targetID': 'fork%d' % r_id}) # Creation -> child edges.append({'ID': 'fork_c_%d' % r_id, 'type': 'ForkCreates', 'label': "Fork creates", 'sourceID': 'fork%d' % r_id, 'targetID': 'process%d' % r_id}) cur.close() file2package = dict((f.path.path, pkg) for pkg in config.packages for f in pkg.files) inputs_outputs = dict((f.path.path, (bool(f.write_runs), bool(f.read_runs))) for n, f in iteritems(config.inputs_outputs)) # Read opened files cur = conn.cursor() rows = cur.execute( ''' SELECT name, is_directory FROM opened_files GROUP BY name; ''') for r_name, r_directory in rows: # Create file entity vertex = {'ID': r_name, 'type': 'Entity', 'subtype': 'Directory' if r_directory else 'File', 'label': r_name} if r_name in file2package: vertex['package'] = file2package[r_name].name if r_name in inputs_outputs: out_, in_ = inputs_outputs[r_name] if in_: vertex['input'] = True if out_: vertex['output'] = True vertices.append(vertex) cur.close() # Read file opens cur = conn.cursor() rows = cur.execute( ''' SELECT id, name, timestamp, mode, process FROM opened_files; ''') for r_id, r_name, r_timestamp, r_mode, r_process in rows: # Create file access activity vertices.append({'ID': 'access%d' % r_id, 'type': 'Activity', 'subtype': ('FileWrites' if r_mode & FILE_WRITE else 'FileReads'), 'label': ("File write: %s" if r_mode & FILE_WRITE else "File read: %s") % r_name, 'date': r_timestamp, 'mode': r_mode}) # Process -> access edges.append({'ID': 'proc_access%d' % r_id, 'type': 'PerformsFileAccess', 'label': "Process does file access", 'sourceID': 'process%d' % r_process, 'targetID': 'access%d' % r_id}) # Access -> file edges.append({'ID': 'access_file%d' % r_id, 'type': 'AccessFile', 'label': "File access touches", 'sourceID': 'access%d' % r_id, 'targetID': r_name}) cur.close() # Read executions cur = conn.cursor() rows = cur.execute( ''' SELECT id, name, timestamp, process, argv FROM executed_files; ''') for r_id, r_name, r_timestamp, r_process, r_argv in rows: argv = r_argv.split('\0') if not argv[-1]: argv = argv[:-1] cmdline = ' '.join(shell_escape(a) for a in argv) # Create execution activity vertices.append({'ID': 'exec%d' % r_id, 'type': 'Activity', 'subtype': 'ProcessExecutes', 'label': "Process #%d executes file %s" % (r_process, r_name), 'date': r_timestamp, 'cmdline': cmdline, 'process': r_process, 'file': r_name}) # Process -> execution edges.append({'ID': 'proc_exec%d' % r_id, 'type': 'ProcessExecution', 'label': "Process does exec()", 'sourceID': 'process%d' % r_process, 'targetID': 'exec%d' % r_id}) # Execution -> file edges.append({'ID': 'exec_file%d' % r_id, 'type': 'ExecutionFile', 'label': "Execute file", 'sourceID': 'exec%d' % r_id, 'targetID': r_name}) cur.close() # Write the file from the created lists with target.open('w', encoding='utf-8', newline='\n') as out: out.write('\n\n' '\n' ' \n') for vertex in vertices: if 'date' not in vertex: vertex['date'] = '-1' tags = {} for k in ('ID', 'type', 'label', 'date'): if k not in vertex: vertex.update(tags) raise ValueError("Vertex is missing tag '%s': %r" % ( k, vertex)) tags[k] = vertex.pop(k) out.write(' \n ' + '\n '.join('<{k}>{v}'.format(k=k, v=xml_escape(v)) for k, v in iteritems(tags))) if vertex: out.write('\n \n') for k, v in iteritems(vertex): out.write(' \n' ' {k}\n' ' {v}\n' ' \n' .format(k=xml_escape(k), v=xml_escape(v))) out.write(' ') out.write('\n \n') out.write(' \n' ' \n') for edge in edges: for k in ('ID', 'type', 'label', 'sourceID', 'targetID'): if k not in edge: raise ValueError("Edge is missing tag '%s': %r" % ( k, edge)) if 'value' not in edge: edge['value'] = '' out.write(' \n ' + '\n '.join('<{k}>{v}'.format(k=k, v=xml_escape(v)) for k, v in iteritems(edge)) + '\n \n') out.write(' \n' '\n') conn.close() def provgraph(args): """provgraph subcommand. Reads in the trace sqlite3 database and writes out a graph in Provenance Viewer graph format.""" def call_generate(args, config, trace): generate(Path(args.target[0]), config, trace) if args.pack is not None: rpz_pack = RPZPack(args.pack) with rpz_pack.with_config() as config: with rpz_pack.with_trace() as trace: call_generate(args, config, trace) else: call_generate(args, Path(args.dir) / 'config.yml', Path(args.dir) / 'trace.sqlite3') def disabled_bug13676(args): stderr.write("Error: your version of Python, %s, is not supported\n" "Versions before 2.7.3 are affected by bug 13676 and will " "not be able to read\nthe trace " "database\n" % sys.version.split(' ', 1)[0]) sys.exit(1) def setup(parser, **kwargs): """Generates a Prov Viewer graph from the trace data """ # http://bugs.python.org/issue13676 # This prevents repro(un)zip from reading argv and envp arrays from trace if sys.version_info < (2, 7, 3): parser.add_argument('rest_of_cmdline', nargs=argparse.REMAINDER, help=argparse.SUPPRESS) parser.set_defaults(func=disabled_bug13676) return {'test_compatibility': (COMPAT_NO, "Python >2.7.3 required")} parser.add_argument('target', nargs=1, help="Destination DOT file") parser.add_argument( '-d', '--dir', default='.reprozip-trace', help="where the database and configuration file are stored (default: " "./.reprozip-trace)") parser.add_argument( 'pack', nargs=argparse.OPTIONAL, help="Pack to extract (defaults to reading from --dir)") parser.set_defaults(func=provgraph) return {'test_compatibility': COMPAT_OK} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1696001612.0 reprounzip-1.3/reprounzip/utils.py0000664000175000017500000003315014505567114017125 0ustar00remramremram# Copyright (C) 2014 New York University # This file is part of ReproZip which is released under the Revised BSD License # See file LICENSE for full license details. # This file is shared: # reprozip/reprozip/utils.py # reprounzip/reprounzip/utils.py """Utility functions. These functions are shared between reprozip and reprounzip but are not specific to this software (more utilities). """ from __future__ import division, print_function, unicode_literals import codecs import contextlib from datetime import datetime import email.utils import itertools import locale import logging import operator import os import requests from rpaths import Path, PosixPath import stat import subprocess import sys import time logger = logging.getLogger(__name__.split('.', 1)[0]) class StreamWriter(object): def __init__(self, stream): writer = codecs.getwriter(locale.getpreferredencoding()) self._writer = writer(stream, 'replace') self.buffer = stream def writelines(self, lines): self.write(str('').join(lines)) def write(self, obj): if isinstance(obj, bytes): self.buffer.write(obj) else: self._writer.write(obj) def __getattr__(self, name, getattr=getattr): """ Inherit all other methods from the underlying stream. """ return getattr(self._writer, name) PY3 = sys.version_info[0] == 3 if PY3: izip = zip irange = range iteritems = lambda d: d.items() itervalues = lambda d: d.values() listvalues = lambda d: list(d.values()) stdout_bytes = sys.stdout.buffer if sys.stdout is not None else None stderr_bytes = sys.stderr.buffer if sys.stderr is not None else None stdin_bytes = sys.stdin.buffer if sys.stdin is not None else None stdout, stderr = sys.stdout, sys.stderr else: izip = itertools.izip irange = xrange # noqa: F821 iteritems = lambda d: d.iteritems() itervalues = lambda d: d.itervalues() listvalues = lambda d: d.values() _writer = codecs.getwriter(locale.getpreferredencoding()) stdout_bytes, stderr_bytes = sys.stdout, sys.stderr stdin_bytes = sys.stdin stdout, stderr = StreamWriter(sys.stdout), StreamWriter(sys.stderr) if PY3: int_types = int, unicode_ = str else: int_types = int, long # noqa: F821 unicode_ = unicode # noqa: F821 def flatten(n, iterable): """Flattens an iterable by repeatedly calling chain.from_iterable() on it. >>> a = [[1, 2, 3], [4, 5, 6]] >>> b = [[7, 8], [9, 10, 11, 12, 13, 14, 15, 16]] >>> l = [a, b] >>> list(flatten(0, a)) [[1, 2, 3], [4, 5, 6]] >>> list(flatten(1, a)) [1, 2, 3, 4, 5, 6] >>> list(flatten(1, l)) [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10, 11, 12, 13, 14, 15, 16]] >>> list(flatten(2, l)) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] """ for _ in irange(n): iterable = itertools.chain.from_iterable(iterable) return iterable class UniqueNames(object): """Makes names unique amongst the ones it's already seen. """ def __init__(self): self.names = set() def insert(self, name): assert name not in self.names self.names.add(name) def __call__(self, name): nb = 1 attempt = name while attempt in self.names: nb += 1 attempt = '%s_%d' % (name, nb) self.names.add(attempt) return attempt def escape(s): """Escapes backslashes and double quotes in strings. This does NOT add quotes around the string. """ return s.replace('\\', '\\\\').replace('"', '\\"') def optional_return_type(req_args, other_args): """Sort of namedtuple but with name-only fields. When deconstructing a namedtuple, you have to get all the fields: >>> o = namedtuple('T', ['a', 'b', 'c'])(1, 2, 3) >>> a, b = o ValueError: too many values to unpack You thus cannot easily add new return values. This class allows it: >>> o2 = optional_return_type(['a', 'b'], ['c'])(1, 2, 3) >>> a, b = o2 >>> c = o2.c """ if len(set(req_args) | set(other_args)) != len(req_args) + len(other_args): raise ValueError # Maps argument name to position in each list req_args_pos = dict((n, i) for i, n in enumerate(req_args)) other_args_pos = dict((n, i) for i, n in enumerate(other_args)) def cstr(cls, *args, **kwargs): if len(args) > len(req_args) + len(other_args): raise TypeError( "Too many arguments (expected at least %d and no more than " "%d)" % (len(req_args), len(req_args) + len(other_args))) args1, args2 = args[:len(req_args)], args[len(req_args):] req = dict((i, v) for i, v in enumerate(args1)) other = dict(izip(other_args, args2)) for k, v in iteritems(kwargs): if k in req_args_pos: pos = req_args_pos[k] if pos in req: raise TypeError("Multiple values for field %s" % k) req[pos] = v elif k in other_args_pos: if k in other: raise TypeError("Multiple values for field %s" % k) other[k] = v else: raise TypeError("Unknown field name %s" % k) args = [] for i, k in enumerate(req_args): if i not in req: raise TypeError("Missing value for field %s" % k) args.append(req[i]) inst = tuple.__new__(cls, args) inst.__dict__.update(other) return inst dct = {'__new__': cstr} for i, n in enumerate(req_args): dct[n] = property(operator.itemgetter(i)) return type(str('OptionalReturnType'), (tuple,), dct) def tz_offset(): offset = time.timezone if time.localtime().tm_isdst == 0 else time.altzone return -offset def isodatetime(): offset = tz_offset() sign = '+' if offset < 0: sign = '-' offset = -offset if offset % 60 == 0: offset = '%02d:%02d' % (offset // 3600, (offset // 60) % 60) else: offset = '%02d:%02d:%02d' % (offset // 3600, (offset // 60) % 60, offset % 60) # Remove microsecond now = datetime.now() now = datetime(year=now.year, month=now.month, day=now.day, hour=now.hour, minute=now.minute, second=now.second) return '%s%s%s' % (now.isoformat(), sign, offset) def hsize(nbytes): """Readable size. """ if nbytes is None: return "unknown" KB = 1 << 10 MB = 1 << 20 GB = 1 << 30 TB = 1 << 40 PB = 1 << 50 nbytes = float(nbytes) if nbytes < KB: return "{0} bytes".format(nbytes) elif nbytes < MB: return "{0:.2f} KB".format(nbytes / KB) elif nbytes < GB: return "{0:.2f} MB".format(nbytes / MB) elif nbytes < TB: return "{0:.2f} GB".format(nbytes / GB) elif nbytes < PB: return "{0:.2f} TB".format(nbytes / TB) else: return "{0:.2f} PB".format(nbytes / PB) def normalize_path(path): """Normalize a path obtained from the database. """ # For some reason, os.path.normpath() keeps multiple leading slashes # We don't want this since it has no meaning on Linux path = PosixPath(path) if path.path.startswith(path._sep + path._sep): path = PosixPath(path.path[1:]) return path def find_all_links_recursive(filename, files): path = Path('/') for c in filename.components[1:]: # At this point, path is a canonical path, and all links in it have # been resolved # We add the next path component path = path / c # That component is possibly a link if path.is_link(): # Adds the link itself files.add(path) target = path.read_link(absolute=True) # Here, target might contain a number of symlinks if target not in files: # Recurse on this new path find_all_links_recursive(target, files) # Restores the invariant; realpath might resolve several links here path = path.resolve() return path def find_all_links(filename, include_target=False): """Dereferences symlinks from a path. If include_target is True, this also returns the real path of the final target. Example: / a -> b b g -> c c -> ../a/d d e -> /f f >>> find_all_links('/a/g/e', True) ['/a', '/b/c', '/b/g', '/b/d/e', '/f'] """ files = set() filename = Path(filename) assert filename.absolute() path = find_all_links_recursive(filename, files) files = list(files) if include_target: files.append(path) return files def join_root(root, path): """Prepends `root` to the absolute path `path`. """ p_root, p_loc = path.split_root() assert p_root == b'/' return root / p_loc @contextlib.contextmanager def make_dir_writable(directory): """Context-manager that sets write permission on a directory. This assumes that the directory belongs to you. If the u+w permission wasn't set, it gets set in the context, and restored to what it was when leaving the context. u+x also gets set on all the directories leading to that path. """ uid = os.getuid() try: sb = directory.stat() except OSError: pass else: if sb.st_uid != uid or sb.st_mode & 0o700 == 0o700: yield return # These are the permissions to be restored, in reverse order restore_perms = [] try: # Add u+x to all directories up to the target path = Path('/') for c in directory.components[1:-1]: path = path / c sb = path.stat() if sb.st_uid == uid and not sb.st_mode & 0o100: logger.debug("Temporarily setting u+x on %s", path) restore_perms.append((path, sb.st_mode)) path.chmod(sb.st_mode | 0o700) # Add u+wx to the target sb = directory.stat() if sb.st_uid == uid and sb.st_mode & 0o700 != 0o700: logger.debug("Temporarily setting u+wx on %s", directory) restore_perms.append((directory, sb.st_mode)) directory.chmod(sb.st_mode | 0o700) yield finally: for path, mod in reversed(restore_perms): path.chmod(mod) def rmtree_fixed(path): """Like :func:`shutil.rmtree` but doesn't choke on annoying permissions. If a directory with -w or -x is encountered, it gets fixed and deletion continues. """ if path.is_link(): raise OSError("Cannot call rmtree on a symbolic link") uid = os.getuid() st = path.lstat() if st.st_uid == uid and st.st_mode & 0o700 != 0o700: path.chmod(st.st_mode | 0o700) for entry in path.listdir(): if stat.S_ISDIR(entry.lstat().st_mode): rmtree_fixed(entry) else: entry.remove() path.rmdir() # Compatibility with ReproZip <= 1.0.3 check_output = subprocess.check_output def copyfile(source, destination, CHUNK_SIZE=4096): """Copies from one file object to another. """ while True: chunk = source.read(CHUNK_SIZE) if chunk: destination.write(chunk) if len(chunk) != CHUNK_SIZE: break def download_file(url, dest, cachename=None, ssl_verify=None): """Downloads a file using a local cache. If the file cannot be downloaded or if it wasn't modified, the cached version will be used instead. The cache lives in ``~/.cache/reprozip/``. """ if cachename is None: if dest is None: raise ValueError("One of 'dest' or 'cachename' must be specified") cachename = dest.components[-1] headers = {} if 'XDG_CACHE_HOME' in os.environ: cache = Path(os.environ['XDG_CACHE_HOME']) else: cache = Path('~/.cache').expand_user() cache = cache / 'reprozip' / cachename if cache.exists(): mtime = email.utils.formatdate(cache.mtime(), usegmt=True) headers['If-Modified-Since'] = mtime cache.parent.mkdir(parents=True) try: response = requests.get(url, headers=headers, timeout=2 if cache.exists() else 10, stream=True, verify=ssl_verify) response.raise_for_status() if response.status_code == 304: raise requests.HTTPError( '304 File is up to date, no data returned', response=response) except requests.RequestException as e: if cache.exists(): if e.response and e.response.status_code == 304: logger.info("Download %s: cache is up to date", cachename) else: logger.warning("Download %s: error downloading %s: %s", cachename, url, e) if dest is not None: cache.copy(dest) return dest else: return cache else: raise logger.info("Download %s: downloading %s", cachename, url) try: with cache.open('wb') as f: for chunk in response.iter_content(4096): f.write(chunk) response.close() except Exception as e: # pragma: no cover try: cache.remove() except OSError: pass raise e logger.info("Downloaded %s successfully", cachename) if dest is not None: cache.copy(dest) return dest else: return cache ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1701984661.4429889 reprounzip-1.3/reprounzip.egg-info/0000775000175000017500000000000014534434625017105 5ustar00remramremram././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984661.0 reprounzip-1.3/reprounzip.egg-info/PKG-INFO0000644000175000017500000000573214534434625020207 0ustar00remramremramMetadata-Version: 2.1 Name: reprounzip Version: 1.3 Summary: Linux tool enabling reproducible experiments (unpacker) Home-page: https://www.reprozip.org/ Author: Remi Rampin, Fernando Chirigati, Dennis Shasha, Juliana Freire Author-email: reprozip@nyu.edu Maintainer: Remi Rampin Maintainer-email: remi@rampin.org License: BSD-3-Clause Project-URL: Documentation, https://docs.reprozip.org/ Project-URL: Examples, https://examples.reprozip.org/ Project-URL: Source, https://github.com/VIDA-NYU/reprozip Project-URL: Bug Tracker, https://github.com/VIDA-NYU/reprozip/issues Project-URL: Chat, https://riot.im/app/#/room/#reprozip:matrix.org Project-URL: Changelog, https://github.com/VIDA-NYU/reprozip/blob/1.x/CHANGELOG.md Keywords: reprozip,reprounzip,reproducibility,provenance,vida,nyu Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Science/Research Classifier: License :: OSI Approved :: BSD License Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Topic :: Scientific/Engineering Classifier: Topic :: System :: Archiving License-File: LICENSE.txt Requires-Dist: PyYAML Requires-Dist: rpaths>=0.8 Requires-Dist: usagestats>=0.3 Requires-Dist: requests Requires-Dist: distro Requires-Dist: pyelftools Provides-Extra: all Requires-Dist: reprounzip-vagrant>=1.0; extra == "all" Requires-Dist: reprounzip-docker>=1.0; extra == "all" Requires-Dist: reprounzip-vistrails>=1.0; extra == "all" ReproZip project ================ `ReproZip `__ is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science. It tracks operating system calls and creates a bundle that contains all the binaries, files and dependencies required to run a given command on the author's computational environment (packing step). A reviewer can then extract the experiment in his environment to reproduce the results (unpacking step). reprounzip ---------- This is the component responsible for the unpacking step on Linux distributions. Please refer to `reprozip `__, `reprounzip-vagrant `_, and `reprounzip-docker `_ for other components and plugins. A GUI is available at `reprounzip-qt `_. Additional Information ---------------------- For more detailed information, please refer to our `website `_, as well as to our `documentation `_. ReproZip is currently being developed at `NYU `_. The team includes: * `Fernando Chirigati `_ * `Juliana Freire `_ * `Remi Rampin `_ * `Dennis Shasha `_ * `Vicky Rampin `_ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984661.0 reprounzip-1.3/reprounzip.egg-info/SOURCES.txt0000644000175000017500000000143114534434625020766 0ustar00remramremramLICENSE.txt MANIFEST.in README.rst setup.cfg setup.py reprounzip/__init__.py reprounzip/common.py reprounzip/main.py reprounzip/orderedset.py reprounzip/pack_info.py reprounzip/parameters.py reprounzip/signals.py reprounzip/utils.py reprounzip.egg-info/PKG-INFO reprounzip.egg-info/SOURCES.txt reprounzip.egg-info/dependency_links.txt reprounzip.egg-info/entry_points.txt reprounzip.egg-info/namespace_packages.txt reprounzip.egg-info/requires.txt reprounzip.egg-info/top_level.txt reprounzip/plugins/__init__.py reprounzip/unpackers/__init__.py reprounzip/unpackers/default.py reprounzip/unpackers/graph.py reprounzip/unpackers/provviewer.py reprounzip/unpackers/common/__init__.py reprounzip/unpackers/common/misc.py reprounzip/unpackers/common/packages.py reprounzip/unpackers/common/x11.py././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984661.0 reprounzip-1.3/reprounzip.egg-info/dependency_links.txt0000644000175000017500000000000114534434625023151 0ustar00remramremram ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984661.0 reprounzip-1.3/reprounzip.egg-info/entry_points.txt0000644000175000017500000000065114534434625022403 0ustar00remramremram[console_scripts] reprounzip = reprounzip.main:main [reprounzip.unpackers] chroot = reprounzip.unpackers.default:setup_chroot directory = reprounzip.unpackers.default:setup_directory graph = reprounzip.unpackers.graph:setup info = reprounzip.pack_info:setup_info installpkgs = reprounzip.unpackers.default:setup_installpkgs provviewer = reprounzip.unpackers.provviewer:setup showfiles = reprounzip.pack_info:setup_showfiles ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984661.0 reprounzip-1.3/reprounzip.egg-info/namespace_packages.txt0000664000175000017500000000004014534434625023432 0ustar00remramremramreprounzip reprounzip.unpackers ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984661.0 reprounzip-1.3/reprounzip.egg-info/requires.txt0000644000175000017500000000021614534434625021502 0ustar00remramremramPyYAML rpaths>=0.8 usagestats>=0.3 requests distro pyelftools [all] reprounzip-vagrant>=1.0 reprounzip-docker>=1.0 reprounzip-vistrails>=1.0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984661.0 reprounzip-1.3/reprounzip.egg-info/top_level.txt0000644000175000017500000000001314534434625021627 0ustar00remramremramreprounzip ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1701984661.446989 reprounzip-1.3/setup.cfg0000664000175000017500000000010314534434625015011 0ustar00remramremram[bdist_wheel] universal = 1 [egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1701984157.0 reprounzip-1.3/setup.py0000664000175000017500000000521514534433635014713 0ustar00remramremramimport io import os from setuptools import setup # pip workaround os.chdir(os.path.abspath(os.path.dirname(__file__))) # Need to specify encoding for PY3, which has the worst unicode handling ever with io.open('README.rst', encoding='utf-8') as fp: description = fp.read() req = [ 'PyYAML', 'rpaths>=0.8', 'usagestats>=0.3', 'requests', 'distro', 'pyelftools'] setup(name='reprounzip', version='1.3', packages=['reprounzip', 'reprounzip.unpackers', 'reprounzip.unpackers.common', 'reprounzip.plugins'], entry_points={ 'console_scripts': [ 'reprounzip = reprounzip.main:main'], 'reprounzip.unpackers': [ 'info = reprounzip.pack_info:setup_info', 'showfiles = reprounzip.pack_info:setup_showfiles', 'graph = reprounzip.unpackers.graph:setup', 'provviewer = reprounzip.unpackers.provviewer:setup', 'installpkgs = reprounzip.unpackers.default:setup_installpkgs', 'directory = reprounzip.unpackers.default:setup_directory', 'chroot = reprounzip.unpackers.default:setup_chroot']}, namespace_packages=['reprounzip', 'reprounzip.unpackers'], install_requires=req, extras_require={ 'all': ['reprounzip-vagrant>=1.0', 'reprounzip-docker>=1.0', 'reprounzip-vistrails>=1.0']}, description="Linux tool enabling reproducible experiments (unpacker)", author="Remi Rampin, Fernando Chirigati, Dennis Shasha, Juliana Freire", author_email='reprozip@nyu.edu', maintainer="Remi Rampin", maintainer_email='remi@rampin.org', url='https://www.reprozip.org/', project_urls={ 'Documentation': 'https://docs.reprozip.org/', 'Examples': 'https://examples.reprozip.org/', 'Source': 'https://github.com/VIDA-NYU/reprozip', 'Bug Tracker': 'https://github.com/VIDA-NYU/reprozip/issues', 'Chat': 'https://riot.im/app/#/room/#reprozip:matrix.org', 'Changelog': 'https://github.com/VIDA-NYU/reprozip/blob/1.x/CHANGELOG.md', }, long_description=description, license='BSD-3-Clause', keywords=['reprozip', 'reprounzip', 'reproducibility', 'provenance', 'vida', 'nyu'], classifiers=[ 'Development Status :: 5 - Production/Stable', 'Intended Audience :: Science/Research', 'License :: OSI Approved :: BSD License', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Topic :: Scientific/Engineering', 'Topic :: System :: Archiving'])