==> reprozip-1.0.10/LICENSE.txt <==
Copyright (C) 2014-2017, New York University
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
==> reprozip-1.0.10/MANIFEST.in <==
include README.rst
include LICENSE.txt
graft native
==> reprozip-1.0.10/PKG-INFO <==
Metadata-Version: 1.1
Name: reprozip
Version: 1.0.10
Summary: Linux tool enabling reproducible experiments (packer)
Home-page: http://vida-nyu.github.io/reprozip/
Author: Remi Rampin
Author-email: remirampin@gmail.com
License: BSD-3-Clause
Description: ReproZip
========
`ReproZip <http://vida-nyu.github.io/reprozip/>`__ is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science. It tracks operating system calls and creates a package that contains all the binaries, files and dependencies required to run a given command on the author's computational environment (packing step). A reviewer can then extract the experiment in their environment to reproduce the results (unpacking step).
reprozip
--------
This is the component responsible for the packing step on Linux distributions.
Please refer to `reprounzip `_, `reprounzip-vagrant `_, and `reprounzip-docker `_ for other components and plugins.
Additional Information
----------------------
For more detailed information, please refer to our `website <http://vida-nyu.github.io/reprozip/>`_, as well as to our `documentation `_.
ReproZip is currently being developed at `NYU `_. The team includes:
* `Fernando Chirigati `_
* `Juliana Freire `_
* `Remi Rampin `_
* `Dennis Shasha `_
* `Vicky Steeves `_
Keywords: reprozip,reprounzip,reproducibility,provenance,vida,nyu
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: C
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: System :: Archiving
==> reprozip-1.0.10/README.rst <==
ReproZip
========
`ReproZip <http://vida-nyu.github.io/reprozip/>`__ is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science. It tracks operating system calls and creates a package that contains all the binaries, files and dependencies required to run a given command on the author's computational environment (packing step). A reviewer can then extract the experiment in their environment to reproduce the results (unpacking step).
reprozip
--------
This is the component responsible for the packing step on Linux distributions.
Please refer to `reprounzip `_, `reprounzip-vagrant `_, and `reprounzip-docker `_ for other components and plugins.
Additional Information
----------------------
For more detailed information, please refer to our `website <http://vida-nyu.github.io/reprozip/>`_, as well as to our `documentation `_.
ReproZip is currently being developed at `NYU `_. The team includes:
* `Fernando Chirigati `_
* `Juliana Freire `_
* `Remi Rampin `_
* `Dennis Shasha `_
* `Vicky Steeves `_
==> reprozip-1.0.10/native/config.h <==
#ifndef CONFIG_H
#define CONFIG_H
#define WORD_SIZE sizeof(int)
#if !defined(I386) && !defined(X86_64)
# if defined(__x86_64__) || defined(__x86_64)
# define X86_64
# elif defined(__i386__) || defined(__i386) || defined(_M_I86) || defined(_M_IX86)
# define I386
# else
# error Unrecognized architecture!
# endif
#endif
/* Static assertion trick */
#define STATIC_ASSERT(name, condition) \
enum { name = 1/(!!( \
condition \
)) }
STATIC_ASSERT(ASSERT_POINTER_FITS_IN_LONG_INT,
sizeof(long int) >= sizeof(void*));
#endif
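The divide-by-zero enum trick in config.h above can be exercised on its own. A minimal sketch (the `static_assert_value` helper exists only for this demonstration):

```c
/* Same trick as in config.h: 1/(!!(condition)) is an integer constant
 * expression, and when the condition is false it becomes a division by
 * zero, which C rejects in an enum initializer, so the build fails at
 * compile time. */
#define STATIC_ASSERT(name, condition) \
    enum { name = 1/(!!( \
        condition \
    )) }

/* Compiles because the condition holds; the enum constant is 1. */
STATIC_ASSERT(ASSERT_CHAR_IS_ONE_BYTE, sizeof(char) == 1);

/* A false condition, e.g. sizeof(char) == 2, would abort compilation. */
int static_assert_value(void)
{
    return ASSERT_CHAR_IS_ONE_BYTE;
}
```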
==> reprozip-1.0.10/native/database.c <==
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sqlite3.h>
#include "database.h"
#include "log.h"
#define count(x) (sizeof((x))/sizeof(*(x)))
#define check(r) if((r) != SQLITE_OK) { goto sqlerror; }
static sqlite3_uint64 gettime(void)
{
sqlite3_uint64 timestamp;
struct timespec now;
if(clock_gettime(CLOCK_MONOTONIC, &now) == -1)
{
/* LCOV_EXCL_START : clock_gettime() is unlikely to fail */
log_critical(0, "getting time failed (clock_gettime): %s",
strerror(errno));
exit(1);
/* LCOV_EXCL_END */
}
timestamp = now.tv_sec;
timestamp *= 1000000000;
timestamp += now.tv_nsec;
return timestamp;
}
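The timestamp arithmetic in gettime() above widens to 64 bits *before* multiplying, so the seconds value cannot overflow a 32-bit int; the same computation in isolation:

```c
#include <stdint.h>

/* Same arithmetic as gettime(): assign the seconds to a 64-bit value
 * first, then multiply by 1e9 and add the nanoseconds. */
static uint64_t timespec_to_ns(long sec, long nsec)
{
    uint64_t t = (uint64_t)sec;
    t *= 1000000000;
    t += (uint64_t)nsec;
    return t;
}
```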
static sqlite3 *db;
static sqlite3_stmt *stmt_last_rowid;
static sqlite3_stmt *stmt_insert_process;
static sqlite3_stmt *stmt_set_exitcode;
static sqlite3_stmt *stmt_insert_file;
static sqlite3_stmt *stmt_insert_exec;
static int run_id = -1;
int db_init(const char *filename)
{
int tables_exist;
check(sqlite3_open(filename, &db));
log_debug(0, "database file opened: %s", filename);
check(sqlite3_exec(db, "BEGIN IMMEDIATE;", NULL, NULL, NULL));
{
int ret;
const char *sql = ""
"SELECT name FROM SQLITE_MASTER "
"WHERE type='table';";
sqlite3_stmt *stmt_get_tables;
unsigned int found = 0x00;
check(sqlite3_prepare_v2(db, sql, -1, &stmt_get_tables, NULL));
while((ret = sqlite3_step(stmt_get_tables)) == SQLITE_ROW)
{
const char *colname = (const char*)sqlite3_column_text(
stmt_get_tables, 0);
if(strcmp("processes", colname) == 0)
found |= 0x01;
else if(strcmp("opened_files", colname) == 0)
found |= 0x02;
else if(strcmp("executed_files", colname) == 0)
found |= 0x04;
else
goto wrongschema;
}
if(found == 0x00)
tables_exist = 0;
else if(found == 0x07)
tables_exist = 1;
else
{
wrongschema:
log_critical(0, "database schema is wrong");
return -1;
}
sqlite3_finalize(stmt_get_tables);
if(ret != SQLITE_DONE)
goto sqlerror;
}
if(!tables_exist)
{
const char *sql[] = {
"CREATE TABLE processes("
" id INTEGER NOT NULL PRIMARY KEY,"
" run_id INTEGER NOT NULL,"
" parent INTEGER,"
" timestamp INTEGER NOT NULL,"
" is_thread BOOLEAN NOT NULL,"
" exitcode INTEGER"
" );",
"CREATE INDEX proc_parent_idx ON processes(parent);",
"CREATE TABLE opened_files("
" id INTEGER NOT NULL PRIMARY KEY,"
" run_id INTEGER NOT NULL,"
" name TEXT NOT NULL,"
" timestamp INTEGER NOT NULL,"
" mode INTEGER NOT NULL,"
" is_directory BOOLEAN NOT NULL,"
" process INTEGER NOT NULL"
" );",
"CREATE INDEX open_proc_idx ON opened_files(process);",
"CREATE TABLE executed_files("
" id INTEGER NOT NULL PRIMARY KEY,"
" name TEXT NOT NULL,"
" run_id INTEGER NOT NULL,"
" timestamp INTEGER NOT NULL,"
" process INTEGER NOT NULL,"
" argv TEXT NOT NULL,"
" envp TEXT NOT NULL,"
" workingdir TEXT NOT NULL"
" );",
"CREATE INDEX exec_proc_idx ON executed_files(process);",
};
size_t i;
for(i = 0; i < count(sql); ++i)
check(sqlite3_exec(db, sql[i], NULL, NULL, NULL));
}
/* Get the first unused run_id */
{
sqlite3_stmt *stmt_get_run_id;
const char *sql = "SELECT max(run_id) + 1 FROM processes;";
check(sqlite3_prepare_v2(db, sql, -1, &stmt_get_run_id, NULL));
if(sqlite3_step(stmt_get_run_id) != SQLITE_ROW)
{
sqlite3_finalize(stmt_get_run_id);
goto sqlerror;
}
run_id = sqlite3_column_int(stmt_get_run_id, 0);
if(sqlite3_step(stmt_get_run_id) != SQLITE_DONE)
{
sqlite3_finalize(stmt_get_run_id);
goto sqlerror;
}
sqlite3_finalize(stmt_get_run_id);
}
log_debug(0, "This is run %d", run_id);
{
const char *sql = ""
"SELECT last_insert_rowid()";
check(sqlite3_prepare_v2(db, sql, -1, &stmt_last_rowid, NULL));
}
{
const char *sql = ""
"INSERT INTO processes(run_id, parent, timestamp, is_thread)"
"VALUES(?, ?, ?, ?)";
check(sqlite3_prepare_v2(db, sql, -1, &stmt_insert_process, NULL));
}
{
const char *sql = ""
"UPDATE processes SET exitcode=?"
"WHERE id=?";
check(sqlite3_prepare_v2(db, sql, -1, &stmt_set_exitcode, NULL));
}
{
const char *sql = ""
"INSERT INTO opened_files(run_id, name, timestamp, "
" mode, is_directory, process)"
"VALUES(?, ?, ?, ?, ?, ?)";
check(sqlite3_prepare_v2(db, sql, -1, &stmt_insert_file, NULL));
}
{
const char *sql = ""
"INSERT INTO executed_files(run_id, name, timestamp, process, "
" argv, envp, workingdir)"
"VALUES(?, ?, ?, ?, ?, ?, ?)";
check(sqlite3_prepare_v2(db, sql, -1, &stmt_insert_exec, NULL));
}
return 0;
sqlerror:
log_critical(0, "sqlite3 error creating database: %s", sqlite3_errmsg(db));
return -1;
}
int db_close(int rollback)
{
if(rollback)
{
check(sqlite3_exec(db, "ROLLBACK;", NULL, NULL, NULL));
}
else
{
check(sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL));
}
log_debug(0, "database file closed%s", rollback?" (rolled back)":"");
check(sqlite3_finalize(stmt_last_rowid));
check(sqlite3_finalize(stmt_insert_process));
check(sqlite3_finalize(stmt_set_exitcode));
check(sqlite3_finalize(stmt_insert_file));
check(sqlite3_finalize(stmt_insert_exec));
check(sqlite3_close(db));
run_id = -1;
return 0;
sqlerror:
log_critical(0, "sqlite3 error on exit: %s", sqlite3_errmsg(db));
return -1;
}
#define DB_NO_PARENT ((unsigned int)-2)
int db_add_process(unsigned int *id, unsigned int parent_id,
const char *working_dir, int is_thread)
{
check(sqlite3_bind_int(stmt_insert_process, 1, run_id));
if(parent_id == DB_NO_PARENT)
{
check(sqlite3_bind_null(stmt_insert_process, 2));
}
else
{
check(sqlite3_bind_int(stmt_insert_process, 2, parent_id));
}
/* This assumes that we won't go over 2^32 seconds (~135 years) */
check(sqlite3_bind_int64(stmt_insert_process, 3, gettime()));
check(sqlite3_bind_int(stmt_insert_process, 4, is_thread?1:0));
if(sqlite3_step(stmt_insert_process) != SQLITE_DONE)
goto sqlerror;
sqlite3_reset(stmt_insert_process);
/* Get id */
if(sqlite3_step(stmt_last_rowid) != SQLITE_ROW)
goto sqlerror;
*id = sqlite3_column_int(stmt_last_rowid, 0);
if(sqlite3_step(stmt_last_rowid) != SQLITE_DONE)
goto sqlerror;
sqlite3_reset(stmt_last_rowid);
return db_add_file_open(*id, working_dir, FILE_WDIR, 1);
sqlerror:
/* LCOV_EXCL_START : Insertions shouldn't fail */
log_critical(0, "sqlite3 error inserting process: %s", sqlite3_errmsg(db));
return -1;
/* LCOV_EXCL_END */
}
int db_add_first_process(unsigned int *id, const char *working_dir)
{
return db_add_process(id, DB_NO_PARENT, working_dir, 0);
}
int db_add_exit(unsigned int id, int exitcode)
{
check(sqlite3_bind_int(stmt_set_exitcode, 1, exitcode));
check(sqlite3_bind_int(stmt_set_exitcode, 2, id));
if(sqlite3_step(stmt_set_exitcode) != SQLITE_DONE)
goto sqlerror;
sqlite3_reset(stmt_set_exitcode);
return 0;
sqlerror:
/* LCOV_EXCL_START : Insertions shouldn't fail */
log_critical(0, "sqlite3 error setting exitcode: %s", sqlite3_errmsg(db));
return -1;
/* LCOV_EXCL_END */
}
int db_add_file_open(unsigned int process, const char *name,
unsigned int mode, int is_dir)
{
check(sqlite3_bind_int(stmt_insert_file, 1, run_id));
check(sqlite3_bind_text(stmt_insert_file, 2, name, -1, SQLITE_TRANSIENT));
/* This assumes that we won't go over 2^32 seconds (~135 years) */
check(sqlite3_bind_int64(stmt_insert_file, 3, gettime()));
check(sqlite3_bind_int(stmt_insert_file, 4, mode));
check(sqlite3_bind_int(stmt_insert_file, 5, is_dir));
check(sqlite3_bind_int(stmt_insert_file, 6, process));
if(sqlite3_step(stmt_insert_file) != SQLITE_DONE)
goto sqlerror;
sqlite3_reset(stmt_insert_file);
return 0;
sqlerror:
/* LCOV_EXCL_START : Insertions shouldn't fail */
log_critical(0, "sqlite3 error inserting file: %s", sqlite3_errmsg(db));
return -1;
/* LCOV_EXCL_END */
}
static char *strarray2nulsep(const char *const *array, size_t *plen)
{
char *list;
size_t len = 0;
{
const char *const *a = array;
while(*a)
{
len += strlen(*a) + 1;
++a;
}
}
{
const char *const *a = array;
char *p;
p = list = malloc(len);
while(*a)
{
const char *s = *a;
while(*s)
*p++ = *s++;
*p++ = '\0';
++a;
}
}
*plen = len;
return list;
}
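The helper above flattens a NULL-terminated string array into one buffer where every element is followed by '\0'; this is the encoding stored in the argv and envp columns. A standalone re-implementation for illustration (the `nulsep_count` decoder is added only to show the format round-trips):

```c
#include <stdlib.h>
#include <string.h>

/* Flattens a NULL-terminated string array into a single buffer of
 * NUL-separated elements, returning the buffer and its total length. */
static char *nulsep(const char *const *array, size_t *plen)
{
    size_t len = 0;
    const char *const *a;
    char *list, *p;
    for(a = array; *a; ++a)
        len += strlen(*a) + 1;
    p = list = malloc(len ? len : 1);
    for(a = array; *a; ++a)
    {
        size_t n = strlen(*a) + 1;
        memcpy(p, *a, n); /* copies the terminating '\0' too */
        p += n;
    }
    *plen = len;
    return list;
}

/* Counts the elements back out of the flat encoding: one per NUL. */
static size_t nulsep_count(const char *list, size_t len)
{
    size_t i, count = 0;
    for(i = 0; i < len; ++i)
        if(list[i] == '\0')
            ++count;
    return count;
}
```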
int db_add_exec(unsigned int process, const char *binary,
const char *const *argv, const char *const *envp,
const char *workingdir)
{
check(sqlite3_bind_int(stmt_insert_exec, 1, run_id));
check(sqlite3_bind_text(stmt_insert_exec, 2, binary,
-1, SQLITE_TRANSIENT));
/* This assumes that we won't go over 2^32 seconds (~135 years) */
check(sqlite3_bind_int64(stmt_insert_exec, 3, gettime()));
check(sqlite3_bind_int(stmt_insert_exec, 4, process));
{
size_t len;
char *arglist = strarray2nulsep(argv, &len);
check(sqlite3_bind_text(stmt_insert_exec, 5, arglist, len,
SQLITE_TRANSIENT));
free(arglist);
}
{
size_t len;
char *envlist = strarray2nulsep(envp, &len);
check(sqlite3_bind_text(stmt_insert_exec, 6, envlist, len,
SQLITE_TRANSIENT));
free(envlist);
}
check(sqlite3_bind_text(stmt_insert_exec, 7, workingdir,
-1, SQLITE_TRANSIENT));
if(sqlite3_step(stmt_insert_exec) != SQLITE_DONE)
goto sqlerror;
sqlite3_reset(stmt_insert_exec);
return 0;
sqlerror:
/* LCOV_EXCL_START : Insertions shouldn't fail */
log_critical(0, "sqlite3 error inserting exec: %s", sqlite3_errmsg(db));
return -1;
/* LCOV_EXCL_END */
}
==> reprozip-1.0.10/native/database.h <==
#ifndef DATABASE_H
#define DATABASE_H
#define FILE_READ 0x01
#define FILE_WRITE 0x02
#define FILE_WDIR 0x04 /* File is used as a process's working dir */
#define FILE_STAT 0x08 /* File is stat()d (only metadata is read) */
#define FILE_LINK 0x10 /* The link itself is accessed, no dereference */
int db_init(const char *filename);
int db_close(int rollback);
int db_add_process(unsigned int *id, unsigned int parent_id,
const char *working_dir, int is_thread);
int db_add_exit(unsigned int id, int exitcode);
int db_add_first_process(unsigned int *id, const char *working_dir);
int db_add_file_open(unsigned int process,
const char *name, unsigned int mode,
int is_dir);
int db_add_exec(unsigned int process, const char *binary,
const char *const *argv, const char *const *envp,
const char *workingdir);
#endif
==> reprozip-1.0.10/native/log.c <==
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include "log.h"
extern int trace_verbosity;
static FILE *logfile = NULL;
int log_open_file(const char *filename)
{
assert(logfile == NULL);
logfile = fopen(filename, "ab");
if(logfile == NULL)
{
log_critical(0, "couldn't open log file: %s", strerror(errno));
return -1;
}
return 0;
}
void log_close_file(void)
{
if(logfile != NULL)
{
fclose(logfile);
logfile = NULL;
}
}
void log_real_(pid_t tid, const char *tag, int lvl, const char *format, ...)
{
va_list args;
char datestr[13]; /* HH:MM:SS.mmm */
static char *buffer = NULL;
static size_t bufsize = 4096;
int length;
if(buffer == NULL)
buffer = malloc(bufsize);
{
struct timeval tv;
gettimeofday(&tv, NULL);
strftime(datestr, 13, "%H:%M:%S", localtime(&tv.tv_sec));
sprintf(datestr+8, ".%03u", (unsigned int)(tv.tv_usec / 1000));
}
va_start(args, format);
length = vsnprintf(buffer, bufsize, format, args);
va_end(args);
if(length >= bufsize)
{
while(length >= bufsize)
bufsize *= 2;
free(buffer);
buffer = malloc(bufsize);
va_start(args, format);
length = vsnprintf(buffer, bufsize, format, args);
va_end(args);
}
if(trace_verbosity >= lvl)
{
fprintf(stderr, "[REPROZIP] %s %s: ", datestr, tag);
if(tid > 0)
fprintf(stderr, "[%d] ", tid);
fwrite(buffer, length, 1, stderr);
}
if(logfile && lvl <= 2)
{
fprintf(logfile, "[REPROZIP] %s %s: ", datestr, tag);
if(tid > 0)
fprintf(logfile, "[%d] ", tid);
fwrite(buffer, length, 1, logfile);
fflush(logfile);
}
}
==> reprozip-1.0.10/native/log.h <==
#ifndef LOG_H
#define LOG_H
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
int log_open_file(const char *filename);
void log_close_file(void);
void log_real_(pid_t tid, const char *tag, int lvl, const char *format, ...);
#ifdef __GNUC__
#define log_critical(i, s, ...) log_critical_(i, s "\n", ## __VA_ARGS__)
#define log_error(i, s, ...) log_error_(i, s "\n", ## __VA_ARGS__)
#define log_warn(i, s, ...) log_warn_(i, s "\n", ## __VA_ARGS__)
#define log_info(i, s, ...) log_info_(i, s "\n", ## __VA_ARGS__)
#define log_debug(i, s, ...) log_debug_(i, s "\n", ## __VA_ARGS__)
#define log_critical_(i, s, ...) log_real_(i, "CRITICAL", 0, s, ## __VA_ARGS__)
#define log_error_(i, s, ...) log_real_(i, "ERROR", 0, s, ## __VA_ARGS__)
#define log_warn_(i, s, ...) log_real_(i, "WARNING", 1, s, ## __VA_ARGS__)
#define log_info_(i, s, ...) log_real_(i, "INFO", 2, s, ## __VA_ARGS__)
#define log_debug_(i, s, ...) log_real_(i, "DEBUG", 3, s, ## __VA_ARGS__)
#else
#define log_critical(i, s, ...) log_critical_(i, s "\n", __VA_ARGS__)
#define log_error(i, s, ...) log_error_(i, s "\n", __VA_ARGS__)
#define log_warn(i, s, ...) log_warn_(i, s "\n", __VA_ARGS__)
#define log_info(i, s, ...) log_info_(i, s "\n", __VA_ARGS__)
#define log_debug(i, s, ...) log_debug_(i, s "\n", __VA_ARGS__)
#define log_critical_(i, s, ...) log_real_(i, "CRITICAL", 0, s, __VA_ARGS__)
#define log_error_(i, s, ...) log_real_(i, "ERROR", 0, s, __VA_ARGS__)
#define log_warn_(i, s, ...) log_real_(i, "WARNING", 1, s, __VA_ARGS__)
#define log_info_(i, s, ...) log_real_(i, "INFO", 2, s, __VA_ARGS__)
#define log_debug_(i, s, ...) log_real_(i, "DEBUG", 3, s, __VA_ARGS__)
#endif
#endif
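The log macros above lean on two C features: adjacent string literals concatenate (so `s "\n"` appends the newline at compile time), and GNU `## __VA_ARGS__` swallows the trailing comma when no variadic arguments are given. A minimal sketch of the same pattern (the `FMT_WITH_NL` macro is illustrative, not part of the codebase):

```c
#include <stdio.h>
#include <string.h>

/* s "\n" pastes the newline onto the format string at compile time;
 * ## __VA_ARGS__ (a GNU extension) allows calls with no extra args. */
#define FMT_WITH_NL(buf, size, s, ...) \
    snprintf(buf, size, s "\n", ## __VA_ARGS__)

static int demo_log_fmt(char *buf, size_t size)
{
    return FMT_WITH_NL(buf, size, "pid %d exited", 42);
}
```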
==> reprozip-1.0.10/native/ptrace_utils.c <==
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include "config.h"
#include "log.h"
#include "ptrace_utils.h"
#include "tracer.h"
static long tracee_getword(pid_t tid, const void *addr)
{
long res;
errno = 0;
res = ptrace(PTRACE_PEEKDATA, tid, addr, NULL);
if(errno)
{
/* LCOV_EXCL_START : We only do that on things that went through the
* kernel successfully, and so should be valid. The exception is
* execve(), which will dup arguments when entering the syscall */
log_error(tid, "tracee_getword() failed: %s", strerror(errno));
return 0;
/* LCOV_EXCL_END */
}
return res;
}
void *tracee_getptr(int mode, pid_t tid, const void *addr)
{
if(mode == MODE_I386)
{
/* Pointers are 32 bits */
uint32_t ptr;
tracee_read(tid, (void*)&ptr, addr, sizeof(ptr));
return (void*)(uint64_t)ptr;
}
else /* mode == MODE_X86_64 */
{
/* Pointers are 64 bits */
uint64_t ptr;
tracee_read(tid, (void*)&ptr, addr, sizeof(ptr));
return (void*)ptr;
}
}
uint64_t tracee_getlong(int mode, pid_t tid, const void *addr)
{
if(mode == MODE_I386)
{
/* Longs are 32 bits */
uint32_t val;
tracee_read(tid, (void*)&val, addr, sizeof(val));
return (uint64_t)val;
}
else /* mode == MODE_X86_64 */
{
/* Longs are 64 bits */
uint64_t val;
tracee_read(tid, (void*)&val, addr, sizeof(val));
return val;
}
}
size_t tracee_getwordsize(int mode)
{
if(mode == MODE_I386)
/* Pointers are 32 bits */
return 4;
else /* mode == MODE_X86_64 */
/* Pointers are 64 bits */
return 8;
}
size_t tracee_strlen(pid_t tid, const char *str)
{
uintptr_t ptr = (uintptr_t)str;
size_t j = ptr % WORD_SIZE;
uintptr_t i = ptr - j;
size_t size = 0;
int done = 0;
for(; !done; i += WORD_SIZE)
{
unsigned long data = tracee_getword(tid, (const void*)i);
for(; !done && j < WORD_SIZE; ++j)
{
unsigned char byte = data >> (8 * j);
if(byte == 0)
done = 1;
else
++size;
}
j = 0;
}
return size;
}
void tracee_read(pid_t tid, char *dst, const char *src, size_t size)
{
uintptr_t ptr = (uintptr_t)src;
size_t j = ptr % WORD_SIZE;
uintptr_t i = ptr - j;
uintptr_t end = ptr + size;
for(; i < end; i += WORD_SIZE)
{
unsigned long data = tracee_getword(tid, (const void*)i);
for(; j < WORD_SIZE && i + j < end; ++j)
*dst++ = data >> (8 * j);
j = 0;
}
}
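tracee_read() above works a word at a time because ptrace(PTRACE_PEEKDATA) only returns whole words; individual bytes are then peeled off by shifting, with byte j of the word at `data >> (8 * j)` on little-endian x86. The inner loop in isolation:

```c
#include <stddef.h>

#define WORD_SIZE sizeof(long)

/* Mimics the byte-extraction step of tracee_read(): splits one word
 * (as returned by PTRACE_PEEKDATA) into up to WORD_SIZE bytes,
 * little-endian byte order. */
static void word_to_bytes(unsigned long data, char *dst, size_t n)
{
    size_t j;
    for(j = 0; j < n && j < WORD_SIZE; ++j)
        dst[j] = (char)(data >> (8 * j));
}
```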
char *tracee_strdup(pid_t tid, const char *str)
{
size_t length = tracee_strlen(tid, str);
char *res = malloc(length + 1);
tracee_read(tid, res, str, length);
res[length] = '\0';
return res;
}
char **tracee_strarraydup(int mode, pid_t tid, const char *const *argv)
{
/* FIXME : This is probably broken on x32 */
char **array;
/* Reads number of pointers in pointer array */
size_t nb_args = 0;
{
const char *const *a = argv;
/* xargv = *a */
const char *xargv = tracee_getptr(mode, tid, a);
while(xargv != NULL)
{
++nb_args;
++a;
xargv = tracee_getptr(mode, tid, a);
}
}
/* Allocs pointer array */
array = malloc((nb_args + 1) * sizeof(char*));
/* Dups array elements */
{
size_t i = 0;
/* xargv = argv[0] */
const char *xargv = tracee_getptr(mode, tid, argv);
while(xargv != NULL)
{
array[i] = tracee_strdup(tid, xargv);
++i;
/* xargv = argv[i] */
xargv = tracee_getptr(mode, tid, argv + i);
}
array[i] = NULL;
}
return array;
}
void free_strarray(char **array)
{
char **ptr = array;
while(*ptr)
{
free(*ptr);
++ptr;
}
free(array);
}
==> reprozip-1.0.10/native/ptrace_utils.h <==
#ifndef PTRACE_UTILS_H
#define PTRACE_UTILS_H
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
void *tracee_getptr(int mode, pid_t tid, const void *addr);
uint64_t tracee_getlong(int mode, pid_t tid, const void *addr);
size_t tracee_getwordsize(int mode);
size_t tracee_strlen(pid_t tid, const char *str);
void tracee_read(pid_t tid, char *dst, const char *src, size_t size);
char *tracee_strdup(pid_t tid, const char *str);
char **tracee_strarraydup(int mode, pid_t tid, const char *const *argv);
void free_strarray(char **array);
#endif
==> reprozip-1.0.10/native/pytracer.c <==
#include <Python.h>
#include "database.h"
#include "tracer.h"
PyObject *Err_Base;
/**
* Makes a C string from a Python unicode or bytes object.
*
* If successful, the result is a string that the caller must free().
* Else, returns NULL.
*/
static char *get_string(PyObject *obj)
{
if(PyUnicode_Check(obj))
{
const char *str;
PyObject *pyutf8 = PyUnicode_AsUTF8String(obj);
if(pyutf8 == NULL)
return NULL;
#if PY_MAJOR_VERSION >= 3
str = PyBytes_AsString(pyutf8);
#else
str = PyString_AsString(pyutf8);
#endif
if(str == NULL)
return NULL;
{
char *ret = strdup(str);
Py_DECREF(pyutf8);
return ret;
}
}
else if(
#if PY_MAJOR_VERSION >= 3
PyBytes_Check(obj)
#else
PyString_Check(obj)
#endif
)
{
const char *str;
#if PY_MAJOR_VERSION >= 3
str = PyBytes_AsString(obj);
#else
str = PyString_AsString(obj);
#endif
if(str == NULL)
return NULL;
return strdup(str);
}
else
return NULL;
}
static PyObject *pytracer_execute(PyObject *self, PyObject *args)
{
PyObject *ret = NULL;
int exit_status;
/* Reads arguments */
char *binary = NULL, *databasepath = NULL;
char **argv = NULL;
size_t argv_len;
int verbosity;
PyObject *py_binary, *py_argv, *py_databasepath;
if(!PyArg_ParseTuple(args, "OO!Oi",
&py_binary,
&PyList_Type, &py_argv,
&py_databasepath,
&verbosity))
return NULL;
if(verbosity < 0)
{
PyErr_SetString(Err_Base, "verbosity should be >= 0");
return NULL;
}
trace_verbosity = verbosity;
binary = get_string(py_binary);
if(binary == NULL)
goto done;
databasepath = get_string(py_databasepath);
if(databasepath == NULL)
goto done;
/* Converts argv from Python list to char[][] */
{
size_t i;
int bad = 0;
argv_len = PyList_Size(py_argv);
argv = malloc((argv_len + 1) * sizeof(char*));
for(i = 0; i < argv_len; ++i)
{
PyObject *arg = PyList_GetItem(py_argv, i);
char *str = get_string(arg);
if(str == NULL)
{
bad = 1;
break;
}
argv[i] = str;
}
if(bad)
{
size_t j;
for(j = 0; j < i; ++j)
free(argv[j]);
free(argv);
argv = NULL;
goto done;
}
argv[argv_len] = NULL;
}
if(fork_and_trace(binary, argv_len, argv, databasepath, &exit_status) == 0)
{
ret = PyLong_FromLong(exit_status);
}
else
{
PyErr_SetString(Err_Base, "Error occurred");
ret = NULL;
}
done:
free(binary);
free(databasepath);
/* Deallocs argv */
if(argv)
{
size_t i;
for(i = 0; i < argv_len; ++i)
free(argv[i]);
free(argv);
}
return ret;
}
static PyMethodDef methods[] = {
{"execute", pytracer_execute, METH_VARARGS,
"execute(binary, argv, databasepath, verbosity)\n"
"\n"
"Runs the specified binary with the argument list argv under trace and "
"writes\nthe captured events to SQLite3 database databasepath."},
{ NULL, NULL, 0, NULL }
};
#if PY_MAJOR_VERSION >= 3
static struct PyModuleDef moduledef = {
PyModuleDef_HEAD_INIT,
"reprozip._pytracer", /* m_name */
"C interface to tracer", /* m_doc */
-1, /* m_size */
methods, /* m_methods */
NULL, /* m_reload */
NULL, /* m_traverse */
NULL, /* m_clear */
NULL, /* m_free */
};
#endif
#if PY_MAJOR_VERSION >= 3
PyMODINIT_FUNC PyInit__pytracer(void)
#else
PyMODINIT_FUNC init_pytracer(void)
#endif
{
PyObject *mod;
#if PY_MAJOR_VERSION >= 3
mod = PyModule_Create(&moduledef);
#else
mod = Py_InitModule("reprozip._pytracer", methods);
#endif
if(mod == NULL)
{
#if PY_MAJOR_VERSION >= 3
return NULL;
#else
return;
#endif
}
Err_Base = PyErr_NewException("_pytracer.Error", NULL, NULL);
Py_INCREF(Err_Base);
PyModule_AddObject(mod, "Error", Err_Base);
#if PY_MAJOR_VERSION >= 3
return mod;
#endif
}
==> reprozip-1.0.10/native/syscalls.c <==
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include "config.h"
#include "database.h"
#include "log.h"
#include "ptrace_utils.h"
#include "syscalls.h"
#include "tracer.h"
#include "utils.h"
#ifndef __X32_SYSCALL_BIT
#define __X32_SYSCALL_BIT 0x40000000
#endif
#ifndef SYS_CONNECT
#define SYS_CONNECT 3
#endif
#ifndef SYS_ACCEPT
#define SYS_ACCEPT 5
#endif
#define SYSCALL_I386 0
#define SYSCALL_X86_64 1
#define SYSCALL_X86_64_x32 2
#define verbosity trace_verbosity
struct syscall_table_entry {
const char *name;
int (*proc_entry)(const char*, struct Process *, unsigned int);
int (*proc_exit)(const char*, struct Process *, unsigned int);
unsigned int udata;
};
struct syscall_table {
size_t length;
struct syscall_table_entry *entries;
};
struct syscall_table *syscall_tables = NULL;
static char *abs_path_arg(const struct Process *process, size_t arg)
{
char *pathname = tracee_strdup(process->tid, process->params[arg].p);
if(pathname[0] != '/')
{
char *oldpath = pathname;
pathname = abspath(process->threadgroup->wd, oldpath);
free(oldpath);
}
return pathname;
}
static const char *print_sockaddr(void *address, socklen_t addrlen)
{
static char buffer[512];
const short family = ((struct sockaddr*)address)->sa_family;
if(family == AF_INET && addrlen >= sizeof(struct sockaddr_in))
{
struct sockaddr_in *address_ = address;
snprintf(buffer, 512, "%s:%d",
inet_ntoa(address_->sin_addr),
ntohs(address_->sin_port));
}
else if(family == AF_INET6
&& addrlen >= sizeof(struct sockaddr_in6))
{
struct sockaddr_in6 *address_ = address;
char buf[50];
inet_ntop(AF_INET6, &address_->sin6_addr, buf, sizeof(buf));
snprintf(buffer, 512, "[%s]:%d", buf, ntohs(address_->sin6_port));
}
else
snprintf(buffer, 512, "<unknown family %d>", family);
return buffer;
}
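The AF_INET branch of print_sockaddr() above relies on sin_addr already being in network byte order (inet_ntoa handles that), while the port must be converted with ntohs() before printing. The same formatting in isolation:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the AF_INET case of print_sockaddr(): "address:port", with
 * only the port byte-swapped to host order. */
static void format_inet(char *out, size_t outlen,
                        const struct sockaddr_in *addr)
{
    snprintf(out, outlen, "%s:%d",
             inet_ntoa(addr->sin_addr), ntohs(addr->sin_port));
}
```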
/* ********************
* Other syscalls that might be of interest but that we don't handle yet
*/
static int syscall_unhandled_path1(const char *name, struct Process *process,
unsigned int udata)
{
if(verbosity >= 1 && process->in_syscall && process->retvalue.i >= 0
&& name != NULL)
{
char *pathname = abs_path_arg(process, 0);
log_info(process->tid, "process used unhandled system call %s(\"%s\")",
name, pathname);
free(pathname);
}
return 0;
}
static int syscall_unhandled_other(const char *name, struct Process *process,
unsigned int udata)
{
if(verbosity >= 1 && process->in_syscall && process->retvalue.i >= 0
&& name != NULL)
log_info(process->tid, "process used unhandled system call %s", name);
return 0;
}
/* ********************
* open(), creat(), access()
*/
#define SYSCALL_OPENING_OPEN 1
#define SYSCALL_OPENING_ACCESS 2
#define SYSCALL_OPENING_CREAT 3
static int syscall_fileopening_in(const char *name, struct Process *process,
unsigned int udata)
{
unsigned int mode = flags2mode(process->params[1].u);
if( (mode & FILE_READ) && (mode & FILE_WRITE) )
{
char *pathname = abs_path_arg(process, 0);
if(access(pathname, F_OK) != 0 && errno == ENOENT)
{
log_debug(process->tid, "Doing RW open, file exists: no");
process->flags &= ~PROCFLAG_OPEN_EXIST;
}
else
{
log_debug(process->tid, "Doing RW open, file exists: yes");
process->flags |= PROCFLAG_OPEN_EXIST;
}
free(pathname);
}
return 0;
}
static int syscall_fileopening_out(const char *name, struct Process *process,
unsigned int syscall)
{
unsigned int mode;
char *pathname = abs_path_arg(process, 0);
if(syscall == SYSCALL_OPENING_ACCESS)
mode = FILE_STAT;
else if(syscall == SYSCALL_OPENING_CREAT)
mode = flags2mode(process->params[1].u |
O_CREAT | O_WRONLY | O_TRUNC);
else /* syscall == SYSCALL_OPENING_OPEN */
{
mode = flags2mode(process->params[1].u);
if( (process->retvalue.i >= 0) /* Open succeeded */
&& (mode & FILE_READ) && (mode & FILE_WRITE) ) /* In readwrite mode */
{
/* But the file doesn't exist */
if(!(process->flags & PROCFLAG_OPEN_EXIST))
/* Consider this a simple write */
mode &= ~FILE_READ;
}
}
if(verbosity >= 3)
{
/* Converts mode to string s_mode */
char mode_buf[42] = "";
const char *s_mode;
if(mode & FILE_READ)
strcat(mode_buf, "|FILE_READ");
if(mode & FILE_WRITE)
strcat(mode_buf, "|FILE_WRITE");
if(mode & FILE_WDIR)
strcat(mode_buf, "|FILE_WDIR");
if(mode & FILE_STAT)
strcat(mode_buf, "|FILE_STAT");
s_mode = mode_buf[0]?mode_buf + 1:"0";
if(syscall == SYSCALL_OPENING_OPEN)
log_debug(process->tid,
"open(\"%s\", mode=%s) = %d (%s)",
pathname,
s_mode,
(int)process->retvalue.i,
(process->retvalue.i >= 0)?"success":"failure");
else /* creat or access */
log_debug(process->tid,
"%s(\"%s\") (mode=%s) = %d (%s)",
(syscall == SYSCALL_OPENING_OPEN)?"open":
(syscall == SYSCALL_OPENING_CREAT)?"creat":"access",
pathname,
s_mode,
(int)process->retvalue.i,
(process->retvalue.i >= 0)?"success":"failure");
}
if(process->retvalue.i >= 0)
{
if(db_add_file_open(process->identifier,
pathname,
mode,
path_is_dir(pathname)) != 0)
return -1;
}
free(pathname);
return 0;
}
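flags2mode(), used throughout the open()/creat() handlers above, is implemented in utils.c, which is not part of this excerpt. A hypothetical sketch of such a mapping from open(2) flags to the FILE_* bits (the function name and exact semantics are assumed, not taken from utils.c):

```c
#include <fcntl.h>

#define FILE_READ  0x01
#define FILE_WRITE 0x02

/* Hypothetical flags2mode()-style mapping: derive read/write intent
 * from the O_ACCMODE access mode, and treat creation/truncation as a
 * write. */
static unsigned int open_flags_to_mode(int flags)
{
    unsigned int mode = 0;
    int acc = flags & O_ACCMODE;
    if(acc == O_RDONLY || acc == O_RDWR)
        mode |= FILE_READ;
    if(acc == O_WRONLY || acc == O_RDWR)
        mode |= FILE_WRITE;
    if(flags & (O_CREAT | O_TRUNC))
        mode |= FILE_WRITE;
    return mode;
}
```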
/* ********************
* rename(), link(), symlink()
*/
static int syscall_filecreating(const char *name, struct Process *process,
unsigned int is_symlink)
{
if(process->retvalue.i >= 0)
{
char *written_path = abs_path_arg(process, 1);
int is_dir = path_is_dir(written_path);
/* symlink doesn't actually read the source */
if(!is_symlink)
{
char *read_path = abs_path_arg(process, 0);
if(db_add_file_open(process->identifier,
read_path,
FILE_READ | FILE_LINK,
is_dir) != 0)
return -1;
free(read_path);
}
if(db_add_file_open(process->identifier,
written_path,
FILE_WRITE | FILE_LINK,
is_dir) != 0)
return -1;
free(written_path);
}
return 0;
}
static int syscall_filecreating_at(const char *name, struct Process *process,
unsigned int is_symlink)
{
if(process->retvalue.i >= 0)
{
if( (process->params[0].i == AT_FDCWD)
&& (process->params[2].i == AT_FDCWD) )
{
char *written_path = abs_path_arg(process, 3);
int is_dir = path_is_dir(written_path);
/* symlink doesn't actually read the source */
if(!is_symlink)
{
char *read_path = abs_path_arg(process, 1);
if(db_add_file_open(process->identifier,
read_path,
FILE_READ | FILE_LINK,
is_dir) != 0)
return -1;
free(read_path);
}
if(db_add_file_open(process->identifier,
written_path,
FILE_WRITE | FILE_LINK,
is_dir) != 0)
return -1;
free(written_path);
}
else
return syscall_unhandled_other(name, process, 0);
}
return 0;
}
/* ********************
* stat(), lstat()
*/
static int syscall_filestat(const char *name, struct Process *process,
unsigned int no_deref)
{
if(process->retvalue.i >= 0)
{
char *pathname = abs_path_arg(process, 0);
if(db_add_file_open(process->identifier,
pathname,
FILE_STAT | (no_deref?FILE_LINK:0),
path_is_dir(pathname)) != 0)
return -1;
free(pathname);
}
return 0;
}
/* ********************
* readlink()
*/
static int syscall_readlink(const char *name, struct Process *process,
unsigned int udata)
{
if(process->retvalue.i >= 0)
{
char *pathname = abs_path_arg(process, 0);
if(db_add_file_open(process->identifier,
pathname,
FILE_STAT | FILE_LINK,
0) != 0)
return -1;
free(pathname);
}
return 0;
}
/* ********************
* mkdir()
*/
static int syscall_mkdir(const char *name, struct Process *process,
unsigned int udata)
{
if(process->retvalue.i >= 0)
{
char *pathname = abs_path_arg(process, 0);
log_debug(process->tid, "mkdir(\"%s\")", pathname);
if(db_add_file_open(process->identifier,
pathname,
FILE_WRITE,
1) != 0)
return -1;
free(pathname);
}
return 0;
}
/* ********************
* chdir()
*/
static int syscall_chdir(const char *name, struct Process *process,
unsigned int udata)
{
if(process->retvalue.i >= 0)
{
char *pathname = abs_path_arg(process, 0);
free(process->threadgroup->wd);
process->threadgroup->wd = pathname;
if(db_add_file_open(process->identifier,
pathname,
FILE_WDIR,
1) != 0)
return -1;
}
return 0;
}
/* ********************
* execve()
*
* See also special handling in syscall_handle() and PTRACE_EVENT_EXEC case
* in trace().
*/
#define SHEBANG_MAX_LEN 128 /* = Linux's BINPRM_BUF_SIZE */
static int record_shebangs(struct Process *process, const char *exec_target)
{
const char *wd = process->threadgroup->wd;
char buffer[SHEBANG_MAX_LEN];
char target_buffer[SHEBANG_MAX_LEN];
int step;
for(step = 0; step < 4; ++step)
{
FILE *execd = fopen(exec_target, "rb");
size_t ret = 0;
if(execd != NULL)
{
ret = fread(buffer, 1, SHEBANG_MAX_LEN - 1, execd);
fclose(execd);
}
        if(ret == 0)
        {
            log_error(process->tid, "couldn't read executed file %s",
                      exec_target);
            return 0;
        }
        buffer[ret] = '\0';
        /* Need at least two bytes before looking for "#!" */
        if(ret < 2 || buffer[0] != '#' || buffer[1] != '!')
            return 0;
        else
        {
            char *start = buffer + 2;
while(*start == '\t' || *start == ' ')
++start;
if(*start == '\n' || *start == '\0')
{
log_info(process->tid, "empty shebang in %s", exec_target);
return 0;
}
{
char *end = start;
while(*end != '\t' && *end != ' ' &&
*end != '\n' && *end != '\0')
++end;
*end = '\0';
}
log_info(process->tid, "read shebang: %s -> %s", exec_target, start);
if(*start != '/')
{
char *pathname = abspath(wd, start);
if(db_add_file_open(process->identifier,
pathname,
FILE_READ,
0) != 0)
return -1;
free(pathname);
}
else
if(db_add_file_open(process->identifier,
start,
FILE_READ,
0) != 0)
return -1;
exec_target = strcpy(target_buffer, start);
}
}
log_error(process->tid, "reached maximum shebang depth");
return 0;
}
static int syscall_execve_in(const char *name, struct Process *process,
unsigned int udata)
{
/* int execve(const char *filename,
* char *const argv[],
* char *const envp[]); */
struct ExecveInfo *execi = malloc(sizeof(struct ExecveInfo));
execi->binary = abs_path_arg(process, 0);
execi->argv = tracee_strarraydup(process->mode, process->tid,
process->params[1].p);
execi->envp = tracee_strarraydup(process->mode, process->tid,
process->params[2].p);
if(verbosity >= 3)
{
log_debug(process->tid, "execve called:\n binary=%s\n argv:",
execi->binary);
{
/* Note: this conversion is correct and shouldn't need a
* cast */
const char *const *v = (const char* const*)execi->argv;
while(*v)
{
log_debug(process->tid, " %s", *v);
++v;
}
}
{
size_t nb = 0;
while(execi->envp[nb] != NULL)
++nb;
log_debug(process->tid, " envp: (%u entries)", (unsigned int)nb);
}
}
process->execve_info = execi;
return 0;
}
int syscall_execve_event(struct Process *process)
{
struct Process *exec_process = process;
struct ExecveInfo *execi = exec_process->execve_info;
if(execi == NULL)
{
/* On Linux, execve changes tid to the thread leader's tid, no
* matter which thread made the call. This means that the process
* that just returned from execve might not be the one which
* called.
* So we start by finding the one which called execve.
* No possible confusion here since all other threads will have been
* terminated by the kernel. */
        size_t i;
        exec_process = NULL;
        for(i = 0; i < processes_size; ++i)
{
if(processes[i]->status == PROCSTAT_ATTACHED
&& processes[i]->threadgroup == process->threadgroup
&& processes[i]->in_syscall
&& processes[i]->execve_info != NULL)
{
exec_process = processes[i];
break;
}
}
if(exec_process == NULL)
{
/* LCOV_EXCL_START : internal error */
log_critical(process->tid,
"execve() completed but call wasn't recorded");
return -1;
/* LCOV_EXCL_END */
}
execi = exec_process->execve_info;
/* The process that called execve() disappears without any trace */
if(db_add_exit(exec_process->identifier, 0) != 0)
return -1;
if(verbosity >= 3)
log_debug(exec_process->tid,
"original exec'ing thread removed, tgid: %d",
process->tid);
exec_process->execve_info = NULL;
trace_free_process(exec_process);
}
else
exec_process->execve_info = NULL;
process->flags = PROCFLAG_EXECD;
/* Note: execi->argv needs a cast to suppress a bogus warning
* While conversion from char** to const char** is invalid, conversion from
* char** to const char*const* is, in fact, safe.
* G++ accepts it, GCC issues a warning. */
if(db_add_exec(process->identifier, execi->binary,
(const char *const*)execi->argv,
(const char *const*)execi->envp,
process->threadgroup->wd) != 0)
return -1;
/* Note that here, the database records that the thread leader called
* execve, instead of thread exec_process->tid. */
if(verbosity >= 2)
log_info(process->tid, "successfully exec'd %s",
execi->binary);
/* Follow shebangs */
if(record_shebangs(process, execi->binary) != 0)
return -1;
if(trace_add_files_from_proc(process->identifier, process->tid,
execi->binary) != 0)
return -1;
free_execve_info(execi);
return 0;
}
static int syscall_execve_out(const char *name, struct Process *process,
unsigned int execve_syscall)
{
log_debug(process->tid, "execve() failed");
if(process->execve_info != NULL)
{
free_execve_info(process->execve_info);
process->execve_info = NULL;
}
return 0;
}
/* ********************
* fork(), clone(), ...
*/
static int syscall_fork_in(const char *name, struct Process *process,
unsigned int udata)
{
process->flags |= PROCFLAG_FORKING;
return 0;
}
static int syscall_fork_out(const char *name, struct Process *process,
unsigned int udata)
{
process->flags &= ~PROCFLAG_FORKING;
return 0;
}
int syscall_fork_event(struct Process *process, unsigned int event)
{
#ifndef CLONE_THREAD
#define CLONE_THREAD 0x00010000
#endif
int is_thread = 0;
struct Process *new_process;
unsigned long new_tid;
ptrace(PTRACE_GETEVENTMSG, process->tid, NULL, &new_tid);
if( (process->flags & PROCFLAG_FORKING) == 0)
{
/* LCOV_EXCL_START : internal error */
        log_critical(process->tid,
                     "process created new process %lu but we didn't see "
                     "syscall entry", new_tid);
return -1;
/* LCOV_EXCL_END */
}
else if(event == PTRACE_EVENT_CLONE)
is_thread = process->params[0].u & CLONE_THREAD;
process->flags &= ~PROCFLAG_FORKING;
if(verbosity >= 2)
log_info(new_tid, "process created by %d via %s\n"
" (thread: %s) (working directory: %s)",
process->tid,
(event == PTRACE_EVENT_FORK)?"fork()":
(event == PTRACE_EVENT_VFORK)?"vfork()":
"clone()",
is_thread?"yes":"no",
process->threadgroup->wd);
/* At this point, the process might have been seen by waitpid in trace() or
* not */
new_process = trace_find_process(new_tid);
if(new_process != NULL)
{
/* Process has been seen before and options were set */
if(new_process->status != PROCSTAT_UNKNOWN)
{
            /* LCOV_EXCL_START : internal error */
log_critical(new_tid,
"just created process that is already running "
"(status=%d)", new_process->status);
return -1;
/* LCOV_EXCL_END */
}
new_process->status = PROCSTAT_ATTACHED;
ptrace(PTRACE_SYSCALL, new_process->tid, NULL, NULL);
if(verbosity >= 2)
{
unsigned int nproc, unknown;
trace_count_processes(&nproc, &unknown);
log_info(0, "%d processes (inc. %d unattached)",
nproc, unknown);
}
}
else
{
/* Process hasn't been seen before (event happened first) */
new_process = trace_get_empty_process();
new_process->status = PROCSTAT_ALLOCATED;
new_process->flags = 0;
/* New process gets a SIGSTOP, but we resume on attach */
new_process->tid = new_tid;
new_process->in_syscall = 0;
}
if(is_thread)
{
new_process->threadgroup = process->threadgroup;
process->threadgroup->refs++;
if(verbosity >= 3)
log_debug(process->threadgroup->tgid, "threadgroup refs=%d",
process->threadgroup->refs);
}
else
new_process->threadgroup = trace_new_threadgroup(
new_process->tid,
strdup(process->threadgroup->wd));
/* Parent will also get a SIGTRAP with PTRACE_EVENT_FORK */
if(db_add_process(&new_process->identifier,
process->identifier,
process->threadgroup->wd, is_thread) != 0)
return -1;
return 0;
}
/* ********************
* Network connections
*/
static int handle_accept(struct Process *process,
void *arg1, void *arg2)
{
socklen_t addrlen;
tracee_read(process->tid, (void*)&addrlen, arg2, sizeof(addrlen));
if(addrlen >= sizeof(short))
{
void *address = malloc(addrlen);
tracee_read(process->tid, address, arg1, addrlen);
log_info(process->tid, "process accepted a connection from %s",
print_sockaddr(address, addrlen));
free(address);
}
return 0;
}
static int handle_connect(struct Process *process,
void *arg1, socklen_t addrlen)
{
if(addrlen >= sizeof(short))
{
void *address = malloc(addrlen);
tracee_read(process->tid, address, arg1, addrlen);
log_info(process->tid, "process connected to %s",
print_sockaddr(address, addrlen));
free(address);
}
return 0;
}
static int syscall_socketcall(const char *name, struct Process *process,
unsigned int udata)
{
if(process->retvalue.i >= 0)
{
        /* Argument 1 is an array of longs, which are either numbers or
         * pointers */
uint64_t args = process->params[1].u;
/* Size of each element in the array */
const size_t wordsize = tracee_getwordsize(process->mode);
/* Note that void* pointer arithmetic is illegal, hence the uint */
if(process->params[0].u == SYS_ACCEPT)
return handle_accept(process,
tracee_getptr(process->mode, process->tid,
(void*)(args + 1*wordsize)),
tracee_getptr(process->mode, process->tid,
(void*)(args + 2*wordsize)));
else if(process->params[0].u == SYS_CONNECT)
return handle_connect(process,
tracee_getptr(process->mode, process->tid,
(void*)(args + 1*wordsize)),
tracee_getlong(process->mode, process->tid,
(void*)(args + 2*wordsize)));
}
return 0;
}
static int syscall_accept(const char *name, struct Process *process,
unsigned int udata)
{
if(process->retvalue.i >= 0)
return handle_accept(process,
process->params[1].p, process->params[2].p);
else
return 0;
}
static int syscall_connect(const char *name, struct Process *process,
unsigned int udata)
{
if(process->retvalue.i >= 0)
return handle_connect(process,
process->params[1].p, process->params[2].u);
else
return 0;
}
/* ********************
* *at variants, handled if dirfd is AT_FDCWD
*/
static int syscall_xxx_at(const char *name, struct Process *process,
unsigned int real_syscall)
{
/* Argument 0 is a file descriptor, we assume that the rest of them match
* the non-at variant of the syscall */
if(process->params[0].i == AT_FDCWD)
{
struct syscall_table_entry *entry = NULL;
struct syscall_table *tbl;
size_t syscall_type;
if(process->mode == MODE_I386)
syscall_type = SYSCALL_I386;
else if(process->current_syscall & __X32_SYSCALL_BIT)
syscall_type = SYSCALL_X86_64_x32;
else
syscall_type = SYSCALL_X86_64;
tbl = &syscall_tables[syscall_type];
if(real_syscall < tbl->length)
entry = &tbl->entries[real_syscall];
if(entry == NULL || entry->name == NULL || entry->proc_exit == NULL)
{
log_critical(process->tid, "INVALID SYSCALL in *at dispatch: %d",
real_syscall);
return 0;
}
else
{
int ret;
/* Shifts arguments */
size_t i;
register_type arg0 = process->params[0];
for(i = 0; i < PROCESS_ARGS - 1; ++i)
process->params[i] = process->params[i + 1];
ret = entry->proc_exit(name, process, entry->udata);
for(i = PROCESS_ARGS; i > 1; --i)
process->params[i - 1] = process->params[i - 2];
process->params[0] = arg0;
return ret;
}
}
else
{
char *pathname = tracee_strdup(process->tid, process->params[1].p);
log_info(process->tid,
"process used unhandled system call %s(%d, \"%s\")",
name, process->params[0].i, pathname);
free(pathname);
return 0;
}
}
/* ********************
* Building the syscall table
*/
struct unprocessed_table_entry {
unsigned int n;
const char *name;
int (*proc_entry)(const char*, struct Process *, unsigned int);
int (*proc_exit)(const char*, struct Process *, unsigned int);
unsigned int udata;
};
struct syscall_table *process_table(struct syscall_table *table,
const struct unprocessed_table_entry *orig)
{
size_t i, length = 0;
const struct unprocessed_table_entry *pos;
/* Measure required table */
pos = orig;
while(pos->proc_entry || pos->proc_exit)
{
if(pos->n + 1 > length)
length = pos->n + 1;
++pos;
}
/* Allocate table */
table->length = length;
table->entries = malloc(sizeof(struct syscall_table_entry) * length);
/* Initialize to NULL */
for(i = 0; i < length; ++i)
{
table->entries[i].name = NULL;
table->entries[i].proc_entry = NULL;
table->entries[i].proc_exit = NULL;
}
/* Copy from unordered list */
{
pos = orig;
while(pos->proc_entry || pos->proc_exit)
{
table->entries[pos->n].name = pos->name;
table->entries[pos->n].proc_entry = pos->proc_entry;
table->entries[pos->n].proc_exit = pos->proc_exit;
table->entries[pos->n].udata = pos->udata;
++pos;
}
}
return table;
}
void syscall_build_table(void)
{
    if(syscall_tables != NULL)
        return;
#if defined(I386)
syscall_tables = malloc(1 * sizeof(struct syscall_table));
#elif defined(X86_64)
syscall_tables = malloc(3 * sizeof(struct syscall_table));
#else
# error Unrecognized architecture!
#endif
/* i386 */
{
struct unprocessed_table_entry list[] = {
{ 5, "open", syscall_fileopening_in, syscall_fileopening_out,
SYSCALL_OPENING_OPEN},
{ 8, "creat", NULL, syscall_fileopening_out, SYSCALL_OPENING_CREAT},
{ 33, "access", NULL, syscall_fileopening_out, SYSCALL_OPENING_ACCESS},
{106, "stat", NULL, syscall_filestat, 0},
{107, "lstat", NULL, syscall_filestat, 1},
{195, "stat64", NULL, syscall_filestat, 0},
{ 18, "oldstat", NULL, syscall_filestat, 0},
{196, "lstat64", NULL, syscall_filestat, 1},
{ 84, "oldlstat", NULL, syscall_filestat, 1},
{ 85, "readlink", NULL, syscall_readlink, 0},
{ 39, "mkdir", NULL, syscall_mkdir, 0},
{ 12, "chdir", NULL, syscall_chdir, 0},
{ 11, "execve", syscall_execve_in, syscall_execve_out, 11},
{ 2, "fork", syscall_fork_in, syscall_fork_out, 0},
{190, "vfork", syscall_fork_in, syscall_fork_out, 0},
{120, "clone", syscall_fork_in, syscall_fork_out, 0},
{102, "socketcall", NULL, syscall_socketcall, 0},
/* File-creating syscalls: created path is second argument */
{ 38, "rename", NULL, syscall_filecreating, 0},
{ 9, "link", NULL, syscall_filecreating, 0},
{ 83, "symlink", NULL, syscall_filecreating, 1},
/* File-creating syscalls, at variants: unhandled if first or third
* argument is not AT_FDCWD, second is read, fourth is created */
{302, "renameat", NULL, syscall_filecreating_at, 0},
{303, "linkat", NULL, syscall_filecreating_at, 0},
{304, "symlinkat", NULL, syscall_filecreating_at, 1},
/* Half-implemented: *at() variants, when dirfd is AT_FDCWD */
{296, "mkdirat", NULL, syscall_xxx_at, 39},
{295, "openat", NULL, syscall_xxx_at, 5},
{307, "faccessat", NULL, syscall_xxx_at, 33},
{305, "readlinkat", NULL, syscall_xxx_at, 85},
{300, "fstatat64", NULL, syscall_xxx_at, 195},
/* Unhandled with path as first argument */
{ 40, "rmdir", NULL, syscall_unhandled_path1, 0},
{ 92, "truncate", NULL, syscall_unhandled_path1, 0},
{193, "truncate64", NULL, syscall_unhandled_path1, 0},
{ 10, "unlink", NULL, syscall_unhandled_path1, 0},
{ 15, "chmod", NULL, syscall_unhandled_path1, 0},
{182, "chown", NULL, syscall_unhandled_path1, 0},
{212, "chown32", NULL, syscall_unhandled_path1, 0},
{ 16, "lchown", NULL, syscall_unhandled_path1, 0},
{198, "lchown32", NULL, syscall_unhandled_path1, 0},
{ 30, "utime", NULL, syscall_unhandled_path1, 0},
{271, "utimes", NULL, syscall_unhandled_path1, 0},
{277, "mq_open", NULL, syscall_unhandled_path1, 0},
{278, "mq_unlink", NULL, syscall_unhandled_path1, 0},
/* Unhandled which use open descriptors */
{301, "unlinkat", NULL, syscall_unhandled_other, 0},
{306, "fchmodat", NULL, syscall_unhandled_other, 0},
{298, "fchownat", NULL, syscall_unhandled_other, 0},
/* Other unhandled */
{ 26, "ptrace", NULL, syscall_unhandled_other, 0},
{341, "name_to_handle_at", NULL, syscall_unhandled_other, 0},
/* Sentinel */
{0, NULL, NULL, NULL, 0}
};
process_table(&syscall_tables[SYSCALL_I386], list);
}
#ifdef X86_64
/* x64 */
{
struct unprocessed_table_entry list[] = {
{ 2, "open", syscall_fileopening_in, syscall_fileopening_out,
SYSCALL_OPENING_OPEN},
{ 85, "creat", NULL, syscall_fileopening_out, SYSCALL_OPENING_CREAT},
{ 21, "access", NULL, syscall_fileopening_out, SYSCALL_OPENING_ACCESS},
{ 4, "stat", NULL, syscall_filestat, 0},
{ 6, "lstat", NULL, syscall_filestat, 1},
{ 89, "readlink", NULL, syscall_readlink, 0},
{ 83, "mkdir", NULL, syscall_mkdir, 0},
{ 80, "chdir", NULL, syscall_chdir, 0},
{ 59, "execve", syscall_execve_in, syscall_execve_out, 59},
{ 57, "fork", syscall_fork_in, syscall_fork_out, 0},
{ 58, "vfork", syscall_fork_in, syscall_fork_out, 0},
{ 56, "clone", syscall_fork_in, syscall_fork_out, 0},
{ 43, "accept", NULL, syscall_accept, 0},
{288, "accept4", NULL, syscall_accept, 0},
{ 42, "connect", NULL, syscall_connect, 0},
/* File-creating syscalls: created path is second argument */
{ 82, "rename", NULL, syscall_filecreating, 0},
{ 86, "link", NULL, syscall_filecreating, 0},
{ 88, "symlink", NULL, syscall_filecreating, 1},
/* File-creating syscalls, at variants: unhandled if first or third
* argument is not AT_FDCWD, second is read, fourth is created */
{264, "renameat", NULL, syscall_filecreating_at, 0},
{265, "linkat", NULL, syscall_filecreating_at, 0},
{266, "symlinkat", NULL, syscall_filecreating_at, 1},
/* Half-implemented: *at() variants, when dirfd is AT_FDCWD */
{258, "mkdirat", NULL, syscall_xxx_at, 83},
{257, "openat", NULL, syscall_xxx_at, 2},
{269, "faccessat", NULL, syscall_xxx_at, 21},
{267, "readlinkat", NULL, syscall_xxx_at, 89},
{262, "newfstatat", NULL, syscall_xxx_at, 4},
/* Unhandled with path as first argument */
{ 84, "rmdir", NULL, syscall_unhandled_path1, 0},
{ 76, "truncate", NULL, syscall_unhandled_path1, 0},
{ 87, "unlink", NULL, syscall_unhandled_path1, 0},
{ 90, "chmod", NULL, syscall_unhandled_path1, 0},
{ 92, "chown", NULL, syscall_unhandled_path1, 0},
{ 94, "lchown", NULL, syscall_unhandled_path1, 0},
{132, "utime", NULL, syscall_unhandled_path1, 0},
{235, "utimes", NULL, syscall_unhandled_path1, 0},
{240, "mq_open", NULL, syscall_unhandled_path1, 0},
{241, "mq_unlink", NULL, syscall_unhandled_path1, 0},
/* Unhandled which use open descriptors */
{263, "unlinkat", NULL, syscall_unhandled_other, 0},
{268, "fchmodat", NULL, syscall_unhandled_other, 0},
{260, "fchownat", NULL, syscall_unhandled_other, 0},
/* Other unhandled */
{101, "ptrace", NULL, syscall_unhandled_other, 0},
{303, "name_to_handle_at", NULL, syscall_unhandled_other, 0},
/* Sentinel */
{0, NULL, NULL, NULL, 0}
};
process_table(&syscall_tables[SYSCALL_X86_64], list);
}
/* x32 */
{
struct unprocessed_table_entry list[] = {
{ 2, "open", syscall_fileopening_in, syscall_fileopening_out,
SYSCALL_OPENING_OPEN},
{ 85, "creat", NULL, syscall_fileopening_out, SYSCALL_OPENING_CREAT},
{ 21, "access", NULL, syscall_fileopening_out, SYSCALL_OPENING_ACCESS},
{ 4, "stat", NULL, syscall_filestat, 0},
{ 6, "lstat", NULL, syscall_filestat, 1},
{ 89, "readlink", NULL, syscall_readlink, 0},
{ 83, "mkdir", NULL, syscall_mkdir, 0},
{ 80, "chdir", NULL, syscall_chdir, 0},
{520, "execve", syscall_execve_in, syscall_execve_out,
__X32_SYSCALL_BIT + 520},
{ 57, "fork", syscall_fork_in, syscall_fork_out, 0},
{ 58, "vfork", syscall_fork_in, syscall_fork_out, 0},
{ 56, "clone", syscall_fork_in, syscall_fork_out, 0},
{ 43, "accept", NULL, syscall_accept, 0},
{288, "accept4", NULL, syscall_accept, 0},
{ 42, "connect", NULL, syscall_connect, 0},
/* File-creating syscalls: created path is second argument */
{ 82, "rename", NULL, syscall_filecreating, 0},
{ 86, "link", NULL, syscall_filecreating, 0},
{ 88, "symlink", NULL, syscall_filecreating, 1},
/* File-creating syscalls, at variants: unhandled if first or third
* argument is not AT_FDCWD, second is read, fourth is created */
{264, "renameat", NULL, syscall_filecreating_at, 0},
{265, "linkat", NULL, syscall_filecreating_at, 0},
{266, "symlinkat", NULL, syscall_filecreating_at, 1},
/* Half-implemented: *at() variants, when dirfd is AT_FDCWD */
{258, "mkdirat", NULL, syscall_xxx_at, 83},
{257, "openat", NULL, syscall_xxx_at, 2},
{269, "faccessat", NULL, syscall_xxx_at, 21},
{267, "readlinkat", NULL, syscall_xxx_at, 89},
{262, "newfstatat", NULL, syscall_xxx_at, 4},
/* Unhandled with path as first argument */
{ 84, "rmdir", NULL, syscall_unhandled_path1, 0},
{ 76, "truncate", NULL, syscall_unhandled_path1, 0},
{ 87, "unlink", NULL, syscall_unhandled_path1, 0},
{ 90, "chmod", NULL, syscall_unhandled_path1, 0},
{ 92, "chown", NULL, syscall_unhandled_path1, 0},
{ 94, "lchown", NULL, syscall_unhandled_path1, 0},
{132, "utime", NULL, syscall_unhandled_path1, 0},
{235, "utimes", NULL, syscall_unhandled_path1, 0},
{240, "mq_open", NULL, syscall_unhandled_path1, 0},
{241, "mq_unlink", NULL, syscall_unhandled_path1, 0},
/* Unhandled which use open descriptors */
{263, "unlinkat", NULL, syscall_unhandled_other, 0},
{268, "fchmodat", NULL, syscall_unhandled_other, 0},
{260, "fchownat", NULL, syscall_unhandled_other, 0},
/* Other unhandled */
{521, "ptrace", NULL, syscall_unhandled_other, 0},
{303, "name_to_handle_at", NULL, syscall_unhandled_other, 0},
/* Sentinel */
{0, NULL, NULL, NULL, 0}
};
process_table(&syscall_tables[SYSCALL_X86_64_x32], list);
}
#endif
}
/* ********************
* Handle a syscall via the table
*/
int syscall_handle(struct Process *process)
{
pid_t tid = process->tid;
const int syscall = process->current_syscall & ~__X32_SYSCALL_BIT;
size_t syscall_type;
const char *inout = process->in_syscall?"out":"in";
if(process->mode == MODE_I386)
{
syscall_type = SYSCALL_I386;
if(verbosity >= 4)
log_debug(process->tid, "syscall %d (i386) (%s)", syscall, inout);
}
else if(process->current_syscall & __X32_SYSCALL_BIT)
{
/* LCOV_EXCL_START : x32 is not supported right now */
syscall_type = SYSCALL_X86_64_x32;
if(verbosity >= 4)
log_debug(process->tid, "syscall %d (x32) (%s)", syscall, inout);
/* LCOV_EXCL_END */
}
else
{
syscall_type = SYSCALL_X86_64;
if(verbosity >= 4)
log_debug(process->tid, "syscall %d (x64) (%s)", syscall, inout);
}
if(process->flags & PROCFLAG_EXECD)
{
if(verbosity >= 4)
log_debug(process->tid,
"ignoring, EXEC'D is set -- just post-exec syscall-"
"return stop");
process->flags &= ~PROCFLAG_EXECD;
if(process->execve_info != NULL)
{
free_execve_info(process->execve_info);
process->execve_info = NULL;
}
process->in_syscall = 1; /* set to 0 before function returns */
}
else
{
struct syscall_table_entry *entry = NULL;
struct syscall_table *tbl = &syscall_tables[syscall_type];
if(syscall < 0 || syscall >= 2000)
log_error(process->tid, "INVALID SYSCALL %d", syscall);
if(entry == NULL && syscall >= 0 && (size_t)syscall < tbl->length)
entry = &tbl->entries[syscall];
if(entry != NULL)
{
int ret = 0;
if(entry->name && verbosity >= 3)
log_debug(process->tid, "%s()", entry->name);
if(!process->in_syscall && entry->proc_entry)
ret = entry->proc_entry(entry->name, process, entry->udata);
else if(process->in_syscall && entry->proc_exit)
ret = entry->proc_exit(entry->name, process, entry->udata);
if(ret != 0)
return -1;
}
}
/* Run to next syscall */
if(process->in_syscall)
{
process->in_syscall = 0;
if(process->execve_info != NULL)
{
log_error(process->tid, "out of syscall with execve_info != NULL");
return -1;
}
process->current_syscall = -1;
}
else
process->in_syscall = 1;
ptrace(PTRACE_SYSCALL, tid, NULL, NULL);
return 0;
}
reprozip-1.0.10/native/syscalls.h 0000644 0000000 0000000 00000000411 13017600314 016664 0 ustar root root 0000000 0000000 #ifndef SYSCALL_H
#define SYSCALL_H
#include "tracer.h"
void syscall_build_table(void);
int syscall_handle(struct Process *process);
int syscall_execve_event(struct Process *process);
int syscall_fork_event(struct Process *process, unsigned int event);
#endif
reprozip-1.0.10/native/tracer.c 0000644 0000000 0000000 00000054463 13127722141 016327 0 ustar root root 0000000 0000000 #include <elf.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>
#include "config.h"
#include "database.h"
#include "log.h"
#include "ptrace_utils.h"
#include "syscalls.h"
#include "tracer.h"
#include "utils.h"
#ifndef NT_PRSTATUS
#define NT_PRSTATUS 1
#endif
struct i386_regs {
int32_t ebx;
int32_t ecx;
int32_t edx;
int32_t esi;
int32_t edi;
int32_t ebp;
int32_t eax;
int32_t xds;
int32_t xes;
int32_t xfs;
int32_t xgs;
int32_t orig_eax;
int32_t eip;
int32_t xcs;
int32_t eflags;
int32_t esp;
int32_t xss;
};
struct x86_64_regs {
int64_t r15;
int64_t r14;
int64_t r13;
int64_t r12;
int64_t rbp;
int64_t rbx;
int64_t r11;
int64_t r10;
int64_t r9;
int64_t r8;
int64_t rax;
int64_t rcx;
int64_t rdx;
int64_t rsi;
int64_t rdi;
int64_t orig_rax;
int64_t rip;
int64_t cs;
int64_t eflags;
int64_t rsp;
int64_t ss;
int64_t fs_base;
int64_t gs_base;
int64_t ds;
int64_t es;
int64_t fs;
int64_t gs;
};
static void get_i386_reg(register_type *reg, uint32_t value)
{
reg->i = (int32_t)value;
reg->u = value;
reg->p = (void*)(uint64_t)value;
}
static void get_x86_64_reg(register_type *reg, uint64_t value)
{
reg->i = (int64_t)value;
reg->u = value;
reg->p = (void*)value;
}
int trace_verbosity = 0;
#define verbosity trace_verbosity
void free_execve_info(struct ExecveInfo *execi)
{
free_strarray(execi->argv);
free_strarray(execi->envp);
free(execi->binary);
free(execi);
}
struct Process **processes = NULL;
size_t processes_size;
struct Process *trace_find_process(pid_t tid)
{
size_t i;
for(i = 0; i < processes_size; ++i)
{
if(processes[i]->status != PROCSTAT_FREE && processes[i]->tid == tid)
return processes[i];
}
return NULL;
}
struct Process *trace_get_empty_process(void)
{
size_t i;
for(i = 0; i < processes_size; ++i)
{
if(processes[i]->status == PROCSTAT_FREE)
return processes[i];
}
/* Count unknown processes */
if(verbosity >= 3)
{
size_t unknown = 0;
for(i = 0; i < processes_size; ++i)
if(processes[i]->status == PROCSTAT_UNKNOWN)
++unknown;
log_debug(0, "there are %u/%u UNKNOWN processes",
(unsigned int)unknown, (unsigned int)processes_size);
}
/* Allocate more! */
if(verbosity >= 3)
log_debug(0, "process table full (%d), reallocating",
(int)processes_size);
{
struct Process *pool;
size_t prev_size = processes_size;
processes_size *= 2;
pool = malloc((processes_size - prev_size) * sizeof(*pool));
processes = realloc(processes, processes_size * sizeof(*processes));
for(; i < processes_size; ++i)
{
processes[i] = pool++;
processes[i]->status = PROCSTAT_FREE;
processes[i]->threadgroup = NULL;
processes[i]->execve_info = NULL;
}
return processes[prev_size];
}
}
struct ThreadGroup *trace_new_threadgroup(pid_t tgid, char *wd)
{
struct ThreadGroup *threadgroup = malloc(sizeof(struct ThreadGroup));
threadgroup->tgid = tgid;
threadgroup->wd = wd;
threadgroup->refs = 1;
if(verbosity >= 3)
log_debug(tgid, "threadgroup (= process) created");
return threadgroup;
}
void trace_free_process(struct Process *process)
{
process->status = PROCSTAT_FREE;
if(process->threadgroup != NULL)
{
process->threadgroup->refs--;
if(verbosity >= 3)
log_debug(process->tid,
"process died, threadgroup tgid=%d refs=%d",
process->threadgroup->tgid, process->threadgroup->refs);
if(process->threadgroup->refs == 0)
{
if(verbosity >= 3)
log_debug(process->threadgroup->tgid,
"deallocating threadgroup");
if(process->threadgroup->wd != NULL)
free(process->threadgroup->wd);
free(process->threadgroup);
}
process->threadgroup = NULL;
}
else if(verbosity >= 3)
log_debug(process->tid, "threadgroup==NULL");
if(process->execve_info != NULL)
{
free_execve_info(process->execve_info);
process->execve_info = NULL;
}
}
void trace_count_processes(unsigned int *p_nproc, unsigned int *p_unknown)
{
unsigned int nproc = 0, unknown = 0;
size_t i;
for(i = 0; i < processes_size; ++i)
{
switch(processes[i]->status)
{
case PROCSTAT_FREE:
break;
case PROCSTAT_UNKNOWN:
/* Exists but no corresponding syscall has returned yet */
            ++unknown;
            /* fall through: UNKNOWN processes are counted in nproc too */
        case PROCSTAT_ALLOCATED:
/* Not yet attached but it will show up eventually */
case PROCSTAT_ATTACHED:
/* Running */
++nproc;
break;
}
}
if(p_nproc != NULL)
*p_nproc = nproc;
if(p_unknown != NULL)
*p_unknown = unknown;
}
int trace_add_files_from_proc(unsigned int process, pid_t tid,
const char *binary)
{
FILE *fp;
char dummy;
char *line = NULL;
size_t length = 0;
char previous_path[4096] = "";
const char *const fmt = "/proc/%d/maps";
int len = snprintf(&dummy, 1, fmt, tid);
char *procfile = malloc(len + 1);
snprintf(procfile, len + 1, fmt, tid);
/* Loops on lines
* Format:
* 08134000-0813a000 rw-p 000eb000 fe:00 868355 /bin/bash
* 0813a000-0813f000 rw-p 00000000 00:00 0
* b7721000-b7740000 r-xp 00000000 fe:00 901950 /lib/ld-2.18.so
* bfe44000-bfe65000 rw-p 00000000 00:00 0 [stack]
*/
#ifdef DEBUG_PROC_PARSER
log_info(tid, "parsing %s", procfile);
#endif
fp = fopen(procfile, "r");
free(procfile);
while((line = read_line(line, &length, fp)) != NULL)
{
unsigned long int addr_start, addr_end;
char perms[5];
unsigned long int offset;
unsigned int dev_major, dev_minor;
unsigned long int inode;
char pathname[4096];
sscanf(line,
"%lx-%lx %4s %lx %x:%x %lu %s",
&addr_start, &addr_end,
perms,
&offset,
&dev_major, &dev_minor,
&inode,
pathname);
#ifdef DEBUG_PROC_PARSER
log_info(tid,
"proc line:\n"
" addr_start: %lx\n"
" addr_end: %lx\n"
" perms: %s\n"
" offset: %lx\n"
" dev_major: %x\n"
" dev_minor: %x\n"
" inode: %lu\n"
" pathname: %s",
addr_start, addr_end,
perms,
offset,
dev_major, dev_minor,
inode,
pathname);
#endif
if(inode > 0)
{
if(strncmp(pathname, binary, 4096) != 0
&& strncmp(previous_path, pathname, 4096) != 0)
{
#ifdef DEBUG_PROC_PARSER
log_info(tid, " adding to database");
#endif
if(db_add_file_open(process, pathname,
FILE_READ, path_is_dir(pathname)) != 0)
return -1;
strncpy(previous_path, pathname, 4096);
}
}
}
fclose(fp);
return 0;
}
static void trace_set_options(pid_t tid)
{
ptrace(PTRACE_SETOPTIONS, tid, 0,
PTRACE_O_TRACESYSGOOD | /* Adds 0x80 bit to SIGTRAP signals
* if paused because of syscall */
#ifdef PTRACE_O_EXITKILL
PTRACE_O_EXITKILL |
#endif
PTRACE_O_TRACECLONE |
PTRACE_O_TRACEFORK |
PTRACE_O_TRACEVFORK |
PTRACE_O_TRACEEXEC);
}
static int trace(pid_t first_proc, int *first_exit_code)
{
for(;;)
{
int status;
pid_t tid;
struct Process *process;
/* Wait for a process */
tid = waitpid(-1, &status, __WALL);
if(tid == -1)
{
            /* LCOV_EXCL_START : internal error: waitpid() won't fail unless
             * we mistakenly call it while there is no child to wait for */
log_critical(0, "waitpid failed: %s", strerror(errno));
return -1;
/* LCOV_EXCL_END */
}
if(WIFEXITED(status) || WIFSIGNALED(status))
{
unsigned int nprocs, unknown;
int exitcode;
if(WIFSIGNALED(status))
/* exit codes are 8 bits */
exitcode = 0x0100 | WTERMSIG(status);
else
exitcode = WEXITSTATUS(status);
if(tid == first_proc && first_exit_code != NULL)
*first_exit_code = exitcode;
process = trace_find_process(tid);
if(process != NULL)
{
if(db_add_exit(process->identifier, exitcode) != 0)
return -1;
trace_free_process(process);
}
trace_count_processes(&nprocs, &unknown);
if(verbosity >= 2)
log_info(tid, "process exited (%s %d), %d processes remain",
(exitcode & 0x0100)?"signal":"code", exitcode & 0xFF,
(unsigned int)nprocs);
if(nprocs <= 0)
break;
if(unknown >= nprocs)
{
/* LCOV_EXCL_START : This can't happen because UNKNOWN
* processes are the forked processes whose creator has not
* returned yet. Therefore, if there is an UNKNOWN process, its
* creator has to exist as well (and it is not UNKNOWN). */
log_critical(0, "only UNKNOWN processes remaining (%d)",
(unsigned int)nprocs);
return -1;
/* LCOV_EXCL_END */
}
continue;
}
process = trace_find_process(tid);
if(process == NULL)
{
if(verbosity >= 3)
log_debug(tid, "process appeared");
process = trace_get_empty_process();
process->status = PROCSTAT_UNKNOWN;
process->flags = 0;
process->tid = tid;
process->threadgroup = NULL;
process->in_syscall = 0;
trace_set_options(tid);
/* Don't resume, it will be set to ATTACHED and resumed when fork()
* returns */
continue;
}
else if(process->status == PROCSTAT_ALLOCATED)
{
process->status = PROCSTAT_ATTACHED;
if(verbosity >= 3)
log_debug(tid, "process attached");
trace_set_options(tid);
ptrace(PTRACE_SYSCALL, tid, NULL, NULL);
if(verbosity >= 2)
{
unsigned int nproc, unknown;
trace_count_processes(&nproc, &unknown);
log_info(0, "%d processes (inc. %d unattached)",
nproc, unknown);
}
continue;
}
if(WIFSTOPPED(status) && WSTOPSIG(status) & 0x80)
{
size_t len = 0;
#ifdef I386
struct i386_regs regs;
#else /* def X86_64 */
struct x86_64_regs regs;
#endif
/* Try to use GETREGSET first, since iov_len allows us to know if
* 32bit or 64bit mode was used */
#ifdef PTRACE_GETREGSET
#ifndef NT_PRSTATUS
#define NT_PRSTATUS 1
#endif
{
struct iovec iov;
iov.iov_base = &regs;
iov.iov_len = sizeof(regs);
if(ptrace(PTRACE_GETREGSET, tid, NT_PRSTATUS, &iov) == 0)
len = iov.iov_len;
}
if(len == 0)
#endif
/* GETREGSET undefined or call failed, fallback on GETREGS */
{
/* LCOV_EXCL_START : GETREGSET was added by Linux 2.6.34 in
* May 2010 (2225a122) */
ptrace(PTRACE_GETREGS, tid, NULL, &regs);
/* LCOV_EXCL_END */
}
#if defined(I386)
if(!process->in_syscall)
process->current_syscall = regs.orig_eax;
if(process->in_syscall)
get_i386_reg(&process->retvalue, regs.eax);
else
{
get_i386_reg(&process->params[0], regs.ebx);
get_i386_reg(&process->params[1], regs.ecx);
get_i386_reg(&process->params[2], regs.edx);
get_i386_reg(&process->params[3], regs.esi);
get_i386_reg(&process->params[4], regs.edi);
get_i386_reg(&process->params[5], regs.ebp);
}
process->mode = MODE_I386;
#elif defined(X86_64)
/* On x86_64, process might be 32 or 64 bits */
/* If len is known (not 0) and not that of x86_64 registers,
* or if len is not known (0) and CS is 0x23 (not as reliable) */
if( (len != 0 && len != sizeof(regs))
|| (len == 0 && regs.cs == 0x23) )
{
/* 32 bit mode */
struct i386_regs *x86regs = (struct i386_regs*)&regs;
if(!process->in_syscall)
process->current_syscall = x86regs->orig_eax;
if(process->in_syscall)
get_i386_reg(&process->retvalue, x86regs->eax);
else
{
get_i386_reg(&process->params[0], x86regs->ebx);
get_i386_reg(&process->params[1], x86regs->ecx);
get_i386_reg(&process->params[2], x86regs->edx);
get_i386_reg(&process->params[3], x86regs->esi);
get_i386_reg(&process->params[4], x86regs->edi);
get_i386_reg(&process->params[5], x86regs->ebp);
}
process->mode = MODE_I386;
}
else
{
/* 64 bit mode */
if(!process->in_syscall)
process->current_syscall = regs.orig_rax;
if(process->in_syscall)
get_x86_64_reg(&process->retvalue, regs.rax);
else
{
get_x86_64_reg(&process->params[0], regs.rdi);
get_x86_64_reg(&process->params[1], regs.rsi);
get_x86_64_reg(&process->params[2], regs.rdx);
get_x86_64_reg(&process->params[3], regs.r10);
get_x86_64_reg(&process->params[4], regs.r8);
get_x86_64_reg(&process->params[5], regs.r9);
}
/* Might still be either native x64 or Linux's x32 layer */
process->mode = MODE_X86_64;
}
#endif
if(syscall_handle(process) != 0)
return -1;
}
/* Handle signals */
else if(WIFSTOPPED(status))
{
int signum = WSTOPSIG(status) & 0x7F;
/* Synthetic signal for ptrace event: resume */
if(signum == SIGTRAP && status & 0xFF0000)
{
int event = status >> 16;
if(event == PTRACE_EVENT_EXEC)
{
log_debug(tid,
"got EVENT_EXEC, an execve() was successful and "
"will return soon");
if(syscall_execve_event(process) != 0)
return -1;
}
else if( (event == PTRACE_EVENT_FORK)
|| (event == PTRACE_EVENT_VFORK)
|| (event == PTRACE_EVENT_CLONE))
{
if(syscall_fork_event(process, event) != 0)
return -1;
}
ptrace(PTRACE_SYSCALL, tid, NULL, NULL);
}
else if(signum == SIGTRAP)
{
/* LCOV_EXCL_START : Processes shouldn't be getting SIGTRAPs */
log_error(0,
"NOT delivering SIGTRAP to %d\n"
" waitstatus=0x%X", tid, status);
ptrace(PTRACE_SYSCALL, tid, NULL, NULL);
/* LCOV_EXCL_END */
}
/* Other signal, let the process handle it */
else
{
siginfo_t si;
if(verbosity >= 2)
log_info(tid, "caught signal %d", signum);
if(ptrace(PTRACE_GETSIGINFO, tid, 0, (long)&si) >= 0)
ptrace(PTRACE_SYSCALL, tid, NULL, signum);
else
{
/* LCOV_EXCL_START : Not sure what this is for... doesn't
* seem to happen in practice */
log_error(tid, " NOT delivering: %s", strerror(errno));
if(signum != SIGSTOP)
ptrace(PTRACE_SYSCALL, tid, NULL, NULL);
/* LCOV_EXCL_END */
}
}
}
}
return 0;
}
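trace() packs a child's exit status into one integer: a normal exit keeps the 8-bit WEXITSTATUS, while death by signal sets bit 8 (0x0100) over the signal number, which is how the "(%s %d)" log line above tells the two apart. The same encoding, sketched as hypothetical Python helpers:

```python
def encode_exit(exited_normally, code_or_signal):
    """Mirror the C tracer's exit encoding."""
    if exited_normally:
        return code_or_signal & 0xFF       # plain 8-bit exit code
    return 0x0100 | code_or_signal         # signal death: flag bit 8

def describe_exit(exitcode):
    """Decode back to ("signal"|"code", number), as in the log message."""
    kind = "signal" if exitcode & 0x0100 else "code"
    return kind, exitcode & 0xFF
```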
static void (*python_sigchld_handler)(int) = NULL;
static void (*python_sigint_handler)(int) = NULL;
static void restore_signals(void)
{
if(python_sigchld_handler != NULL)
{
signal(SIGCHLD, python_sigchld_handler);
python_sigchld_handler = NULL;
}
if(python_sigint_handler != NULL)
{
signal(SIGINT, python_sigint_handler);
python_sigint_handler = NULL;
}
}
static void cleanup(void)
{
size_t i;
{
size_t nb = 0;
for(i = 0; i < processes_size; ++i)
if(processes[i]->status != PROCSTAT_FREE)
++nb;
/* size_t size is implementation dependent; %u for size_t can trigger
* a warning */
log_error(0, "cleaning up, %u processes to kill...", (unsigned int)nb);
}
for(i = 0; i < processes_size; ++i)
{
if(processes[i]->status != PROCSTAT_FREE)
{
kill(processes[i]->tid, SIGKILL);
trace_free_process(processes[i]);
}
}
}
static time_t last_int = 0;
static void sigint_handler(int signo)
{
time_t now = time(NULL);
(void)signo;
if(now - last_int < 2)
{
if(verbosity >= 1)
log_error(0, "cleaning up on SIGINT");
cleanup();
restore_signals();
exit(1);
}
else if(verbosity >= 1)
log_error(0, "Got SIGINT, press twice to abort...");
last_int = now;
}
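sigint_handler() only aborts when a second SIGINT arrives within 2 seconds of the previous one; the first press merely warns. The timing logic in isolation (hypothetical Python sketch):

```python
class DoubleInterrupt(object):
    """Abort only if a second interrupt arrives within `window` seconds."""
    def __init__(self, window=2):
        self.window = window
        self.last = None

    def press(self, now):
        # Second press within the window: caller should clean up and abort
        if self.last is not None and now - self.last < self.window:
            return True
        # First (or late) press: just warn and remember the time
        self.last = now
        return False
```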
static void trace_init(void)
{
/* Store Python's handlers for restore_signals() */
python_sigchld_handler = signal(SIGCHLD, SIG_DFL);
python_sigint_handler = signal(SIGINT, sigint_handler);
if(processes == NULL)
{
size_t i;
struct Process *pool;
processes_size = 16;
processes = malloc(processes_size * sizeof(*processes));
pool = malloc(processes_size * sizeof(*pool));
for(i = 0; i < processes_size; ++i)
{
processes[i] = pool++;
processes[i]->status = PROCSTAT_FREE;
processes[i]->threadgroup = NULL;
processes[i]->execve_info = NULL;
}
}
syscall_build_table();
}
int fork_and_trace(const char *binary, int argc, char **argv,
const char *database_path, int *exit_status)
{
pid_t child;
trace_init();
child = fork();
if(child != 0 && verbosity >= 2)
log_info(0, "child created, pid=%d", child);
if(child == 0)
{
char **args = malloc((argc + 1) * sizeof(char*));
memcpy(args, argv, argc * sizeof(char*));
args[argc] = NULL;
/* Trace this process */
if(ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
{
log_critical(
0,
"couldn't use ptrace: %s\n"
"This could be caused by a security policy or isolation "
"mechanism (such as\n Docker), see http://bit.ly/2bZd8Fa",
strerror(errno));
exit(1);
}
/* Stop this once so tracer can set options */
kill(getpid(), SIGSTOP);
/* Execute the target */
execvp(binary, args);
log_critical(0, "couldn't execute the target command (execvp "
"returned): %s", strerror(errno));
exit(1);
}
/* Open log file */
{
char logfilename[4096];
const char *home = getenv("HOME");
if(!home || !home[0])
{
log_critical(0, "couldn't open log file: $HOME not set");
restore_signals();
return 1;
}
snprintf(logfilename, sizeof(logfilename), "%s/.reprozip/log", home);
if(log_open_file(logfilename) != 0)
{
restore_signals();
return 1;
}
}
if(db_init(database_path) != 0)
{
kill(child, SIGKILL);
log_close_file();
restore_signals();
return 1;
}
/* Creates entry for first process */
{
struct Process *process = trace_get_empty_process();
process->status = PROCSTAT_ALLOCATED; /* Not yet attached... */
process->flags = 0;
/* We sent a SIGSTOP, but we resume on attach */
process->tid = child;
process->threadgroup = trace_new_threadgroup(child, get_wd());
process->in_syscall = 0;
if(verbosity >= 2)
log_info(0, "process %d created by initial fork()", child);
if( (db_add_first_process(&process->identifier,
process->threadgroup->wd) != 0)
|| (db_add_file_open(process->identifier, process->threadgroup->wd,
FILE_WDIR, 1) != 0) )
{
/* LCOV_EXCL_START : Database insertion shouldn't fail */
db_close(1);
cleanup();
log_close_file();
restore_signals();
return 1;
/* LCOV_EXCL_END */
}
}
if(trace(child, exit_status) != 0)
{
cleanup();
db_close(1);
log_close_file();
restore_signals();
return 1;
}
if(db_close(0) != 0)
{
log_close_file();
restore_signals();
return 1;
}
log_close_file();
restore_signals();
return 0;
}
reprozip-1.0.10/native/tracer.h
#ifndef TRACER_H
#define TRACER_H
#include "config.h"
int fork_and_trace(const char *binary, int argc, char **argv,
const char *database_path, int *exit_status);
extern int trace_verbosity;
/* This is NOT a union because sign-extension rules depend on actual register
* sizes. */
typedef struct S_register_type {
signed long int i;
unsigned long int u;
void *p;
} register_type;
#define PROCESS_ARGS 6
struct ExecveInfo {
char *binary;
char **argv;
char **envp;
};
void free_execve_info(struct ExecveInfo *execi);
struct ThreadGroup {
pid_t tgid;
char *wd;
unsigned int refs;
};
struct Process {
unsigned int identifier;
unsigned int mode;
struct ThreadGroup *threadgroup;
pid_t tid;
int status;
unsigned int flags;
int in_syscall;
int current_syscall;
register_type retvalue;
register_type params[PROCESS_ARGS];
struct ExecveInfo *execve_info;
};
#define PROCSTAT_FREE 0 /* unallocated entry in table */
#define PROCSTAT_ALLOCATED 1 /* fork() done but not yet attached */
#define PROCSTAT_ATTACHED 2 /* running process */
#define PROCSTAT_UNKNOWN 3 /* attached but no corresponding fork() call
* has finished yet */
#define MODE_I386 1
#define MODE_X86_64 2 /* In x86_64 mode, syscalls might be native x64
* or x32 */
#define PROCFLAG_EXECD 1 /* Process is coming out of execve */
#define PROCFLAG_FORKING 2 /* Process is spawning another with
* fork/vfork/clone */
#define PROCFLAG_OPEN_EXIST 4 /* Process is opening a file that exists */
/* FIXME : This is only exposed because of execve() workaround */
extern struct Process **processes;
extern size_t processes_size;
struct Process *trace_find_process(pid_t tid);
struct Process *trace_get_empty_process(void);
struct ThreadGroup *trace_new_threadgroup(pid_t tgid, char *wd);
void trace_free_process(struct Process *process);
void trace_count_processes(unsigned int *p_nproc, unsigned int *p_unknown);
int trace_add_files_from_proc(unsigned int process, pid_t tid,
const char *binary);
#endif
reprozip-1.0.10/native/utils.c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include "config.h"
#include "database.h"
#include "log.h"
extern int trace_verbosity;
unsigned int flags2mode(int flags)
{
unsigned int mode = 0;
if(!O_RDONLY)
{
if(flags & O_WRONLY)
mode |= FILE_WRITE;
else if(flags & O_RDWR)
mode |= FILE_READ | FILE_WRITE;
else
mode |= FILE_READ;
}
else if(!O_WRONLY)
{
if(flags & O_RDONLY)
mode |= FILE_READ;
else if(flags & O_RDWR)
mode |= FILE_READ | FILE_WRITE;
else
mode |= FILE_WRITE;
}
else
{
if( (flags & (O_RDONLY | O_WRONLY)) == (O_RDONLY | O_WRONLY) )
log_error(0, "encountered bogus open() flags O_RDONLY|O_WRONLY");
/* Carry on anyway */
if(flags & O_RDONLY)
mode |= FILE_READ;
if(flags & O_WRONLY)
mode |= FILE_WRITE;
if(flags & O_RDWR)
mode |= FILE_READ | FILE_WRITE;
if( (mode & FILE_READ) && (mode & FILE_WRITE) && (flags & O_TRUNC) )
/* If O_TRUNC is set, consider this a write */
mode &= ~FILE_READ;
}
return mode;
}
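flags2mode() above has to cope with platforms where the open() access-mode constants differ, testing O_RDONLY and O_WRONLY for zero because a zero-valued flag cannot be detected with a bitwise AND. Under the usual Linux values (O_RDONLY=0, O_WRONLY=1, O_RDWR=2) the mapping reduces to this sketch (constants assumed, FILE_* values as defined in the shared Python module):

```python
FILE_READ, FILE_WRITE = 0x01, 0x02
O_RDONLY, O_WRONLY, O_RDWR = 0, 1, 2  # usual Linux values

def flags2mode(flags):
    acc = flags & 0x3  # access mode lives in the low two bits
    if acc == O_WRONLY:
        return FILE_WRITE
    elif acc == O_RDWR:
        return FILE_READ | FILE_WRITE
    return FILE_READ  # O_RDONLY
```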
char *abspath(const char *wd, const char *path)
{
size_t len_wd = strlen(wd);
if(wd[len_wd-1] == '/')
{
/* LCOV_EXCL_START : We usually get canonical path names, so we don't
* run into this one */
char *result = malloc(len_wd + strlen(path) + 1);
memcpy(result, wd, len_wd);
strcpy(result + len_wd, path);
return result;
/* LCOV_EXCL_END */
}
else
{
char *result = malloc(len_wd + 1 + strlen(path) + 1);
memcpy(result, wd, len_wd);
result[len_wd] = '/';
strcpy(result + len_wd + 1, path);
return result;
}
}
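abspath() simply concatenates the working directory and a relative path, inserting a '/' only when the directory doesn't already end with one; it does not normalize '..' or '.' components. Equivalent Python sketch:

```python
def abspath(wd, path):
    # Join without normalizing, mirroring the C helper
    if wd.endswith('/'):
        return wd + path
    return wd + '/' + path
```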
char *get_wd(void)
{
/* PATH_MAX has issues, don't use it */
size_t size = 1024;
char *path;
for(;;)
{
path = malloc(size);
if(getcwd(path, size) != NULL)
return path;
else
{
if(errno != ERANGE)
{
/* LCOV_EXCL_START : getcwd() really shouldn't fail */
free(path);
log_error(0, "getcwd failed: %s", strerror(errno));
return strdup("/UNKNOWN");
/* LCOV_EXCL_END */
}
free(path);
size <<= 1;
}
}
}
char *read_line(char *buffer, size_t *size, FILE *fp)
{
size_t pos = 0;
if(buffer == NULL)
{
*size = 4096;
buffer = malloc(*size);
}
for(;;)
{
char c;
{
int t = getc(fp);
if(t == EOF)
{
free(buffer);
return NULL;
}
c = t;
}
if(c == '\n')
{
buffer[pos] = '\0';
return buffer;
}
else
{
if(pos + 1 >= *size)
{
*size <<= 2;
buffer = realloc(buffer, *size);
}
buffer[pos++] = c;
}
}
}
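read_line() accumulates characters into a geometrically growing buffer, strips the trailing newline, and, notably, discards a final unterminated line at EOF. The same behavior as a Python generator (sketch, not part of reprozip):

```python
def read_lines(fp):
    """Yield lines without their trailing '\\n', stopping at EOF.

    Like the C read_line(), a final line with no newline is dropped.
    """
    buf = []
    while True:
        c = fp.read(1)
        if c == '':
            return            # EOF: discard any partial line, as in C
        if c == '\n':
            yield ''.join(buf)
            buf = []
        else:
            buf.append(c)
```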
int path_is_dir(const char *pathname)
{
struct stat buf;
if(lstat(pathname, &buf) != 0)
{
if(trace_verbosity >= 1)
{
/* LCOV_EXCL_START : shouldn't happen because a tracer process just
* accessed it */
log_error(0, "error stat()ing %s: %s", pathname, strerror(errno));
/* LCOV_EXCL_END */
}
return 0;
}
return S_ISDIR(buf.st_mode)?1:0;
}
reprozip-1.0.10/native/utils.h
#ifndef UTILS_H
#define UTILS_H
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
unsigned int flags2mode(int flags);
char *abspath(const char *wd, const char *path);
char *get_wd(void);
char *read_line(char *buffer, size_t *size, FILE *fp);
int path_is_dir(const char *pathname);
#endif
reprozip-1.0.10/reprozip/__init__.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
__version__ = '1.0.10'
reprozip-1.0.10/reprozip/common.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
# This file is shared:
# reprozip/reprozip/common.py
# reprounzip/reprounzip/common.py
"""Common functions between reprozip and reprounzip.
This module contains functions that are specific to the reprozip software and
its data formats, but that are shared between the reprozip and reprounzip
packages. Because the packages can be installed separately, these functions are
in a separate module which is duplicated between the packages.
As long as these are small in number, they are not worth putting in a separate
package that reprozip and reprounzip would both depend on.
"""
from __future__ import division, print_function, unicode_literals
import atexit
import contextlib
import copy
from datetime import datetime
from distutils.version import LooseVersion
import functools
import logging
import logging.handlers
import os
from rpaths import PosixPath, Path
import sys
import tarfile
import usagestats
import yaml
from .utils import iteritems, itervalues, unicode_, stderr, UniqueNames, \
escape, CommonEqualityMixin, optional_return_type, hsize, join_root, \
copyfile
FILE_READ = 0x01
FILE_WRITE = 0x02
FILE_WDIR = 0x04
FILE_STAT = 0x08
FILE_LINK = 0x10
class File(CommonEqualityMixin):
"""A file, used at some point during the experiment.
"""
comment = None
def __init__(self, path, size=None):
self.path = path
self.size = size
def __eq__(self, other):
return (isinstance(other, File) and
self.path == other.path)
def __hash__(self):
return hash(self.path)
class Package(CommonEqualityMixin):
"""A distribution package, containing a set of files.
"""
def __init__(self, name, version, files=None, packfiles=True, size=None):
self.name = name
self.version = version
self.files = list(files) if files is not None else []
self.packfiles = packfiles
self.size = size
def add_file(self, file_):
self.files.append(file_)
def __unicode__(self):
return '%s (%s)' % (self.name, self.version)
__str__ = __unicode__
# Pack format history:
# 1: used by reprozip 0.2 through 0.7. Single tar.gz file, metadata under
# METADATA/, data under DATA/
# 2: pack is usually not compressed, metadata under METADATA/, data in another
# DATA.tar.gz (files inside it still have the DATA/ prefix for ease-of-use
# in unpackers)
#
# Pack metadata history:
# 0.2: used by reprozip 0.2
# 0.2.1:
# config: comments directories as such in config
# trace database: adds executed_files.workingdir, adds processes.exitcode
# data: packs dynamic linkers
# 0.3:
# config: don't list missing (unpacked) files in config
# trace database: adds opened_files.is_directory
# 0.3.1: no change
# 0.3.2: no change
# 0.4:
# config: adds input_files, output_files, lists parent directories
# 0.4.1: no change
# 0.5: no change
# 0.6: no change
# 0.7:
# moves input_files and output_files from run to global scope
# adds processes.is_thread column to trace database
# 0.8: adds 'id' field to run
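RPZPack (below) identifies a pack by reading METADATA/version, which starts with the ASCII marker `REPROZIP VERSION ` followed by the pack format number. That check in isolation, as a hypothetical standalone helper:

```python
def parse_pack_version(data):
    """Parse METADATA/version contents, e.g. b'REPROZIP VERSION 2\\n'."""
    if not data.startswith(b'REPROZIP VERSION '):
        raise ValueError("File doesn't appear to be a RPZ pack")
    try:
        version = int(data[17:].rstrip())
    except ValueError:
        version = None
    if version not in (1, 2):
        raise ValueError("Unknown format version %r" % version)
    return version
```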
class RPZPack(object):
"""Encapsulates operations on the RPZ pack format.
"""
def __init__(self, pack):
self.pack = Path(pack)
self.tar = tarfile.open(str(self.pack), 'r:*')
f = self.tar.extractfile('METADATA/version')
version = f.read()
f.close()
if version.startswith(b'REPROZIP VERSION '):
try:
version = int(version[17:].rstrip())
except ValueError:
version = None
if version in (1, 2):
self.version = version
self.data_prefix = PosixPath(b'DATA')
else:
raise ValueError(
"Unknown format version %r (maybe you should upgrade "
"reprounzip? I only know versions 1 and 2)" % version)
else:
raise ValueError("File doesn't appear to be a RPZ pack")
if self.version == 1:
self.data = self.tar
elif version == 2:
self.data = tarfile.open(
fileobj=self.tar.extractfile('DATA.tar.gz'),
mode='r:*')
else:
assert False
def remove_data_prefix(self, path):
if not isinstance(path, PosixPath):
path = PosixPath(path)
components = path.components[1:]
if not components:
return path.__class__('')
return path.__class__(*components)
def open_config(self):
"""Gets the configuration file.
"""
return self.tar.extractfile('METADATA/config.yml')
def extract_config(self, target):
"""Extracts the config to the specified path.
It is up to the caller to remove that file once done.
"""
member = copy.copy(self.tar.getmember('METADATA/config.yml'))
member.name = str(target.components[-1])
self.tar.extract(member,
path=str(Path.cwd() / target.parent))
target.chmod(0o644)
assert target.is_file()
@contextlib.contextmanager
def with_config(self):
"""Context manager that extracts the config to a temporary file.
"""
fd, tmp = Path.tempfile(prefix='reprounzip_')
os.close(fd)
self.extract_config(tmp)
yield tmp
tmp.remove()
def extract_trace(self, target):
"""Extracts the trace database to the specified path.
It is up to the caller to remove that file once done.
"""
target = Path(target)
if self.version == 1:
member = self.tar.getmember('METADATA/trace.sqlite3')
elif self.version == 2:
try:
member = self.tar.getmember('METADATA/trace.sqlite3.gz')
except KeyError:
member = self.tar.getmember('METADATA/trace.sqlite3')
else:
assert False
member = copy.copy(member)
member.name = str(target.components[-1])
self.tar.extract(member,
path=str(Path.cwd() / target.parent))
target.chmod(0o644)
assert target.is_file()
@contextlib.contextmanager
def with_trace(self):
"""Context manager that extracts the trace database to a temporary file.
"""
fd, tmp = Path.tempfile(prefix='reprounzip_')
os.close(fd)
self.extract_trace(tmp)
yield tmp
tmp.remove()
def list_data(self):
"""Returns tarfile.TarInfo objects for all the data paths.
"""
return [copy.copy(m)
for m in self.data.getmembers()
if m.name.startswith('DATA/')]
def data_filenames(self):
"""Returns a set of filenames for all the data paths.
Those paths begin with a slash / and the 'DATA' prefix has been
removed.
"""
return set(PosixPath(m.name[4:])
for m in self.data.getmembers()
if m.name.startswith('DATA/'))
def get_data(self, path):
"""Returns a tarfile.TarInfo object for the data path.
Raises KeyError if no such path exists.
"""
path = PosixPath(path)
path = join_root(PosixPath(b'DATA'), path)
return copy.copy(self.data.getmember(path))
def extract_data(self, root, members):
"""Extracts the given members from the data tarball.
The members must come from get_data().
"""
self.data.extractall(str(root), members)
def copy_data_tar(self, target):
"""Copies the file in which the data lies to the specified destination.
"""
if self.version == 1:
self.pack.copyfile(target)
elif self.version == 2:
with target.open('wb') as fp:
data = self.tar.extractfile('DATA.tar.gz')
copyfile(data, fp)
data.close()
def close(self):
if self.data is not self.tar:
self.data.close()
self.tar.close()
self.data = self.tar = None
class InvalidConfig(ValueError):
"""Configuration file is invalid.
"""
def read_files(files, File=File):
if files is None:
return []
return [File(PosixPath(f)) for f in files]
def read_packages(packages, File=File, Package=Package):
if packages is None:
return []
new_pkgs = []
for pkg in packages:
pkg['files'] = read_files(pkg['files'], File)
new_pkgs.append(Package(**pkg))
return new_pkgs
Config = optional_return_type(['runs', 'packages', 'other_files'],
['inputs_outputs', 'additional_patterns',
'format_version'])
@functools.total_ordering
class InputOutputFile(object):
def __init__(self, path, read_runs, write_runs):
self.path = path
self.read_runs = read_runs
self.write_runs = write_runs
def __eq__(self, other):
return ((self.path, self.read_runs, self.write_runs) ==
(other.path, other.read_runs, other.write_runs))
def __lt__(self, other):
return self.path < other.path
def __repr__(self):
return "<InputOutputFile %r (read_runs=%r, write_runs=%r)>" % (
self.path, self.read_runs, self.write_runs)
def load_iofiles(config, runs):
"""Loads the inputs_outputs part of the configuration.
This tests for duplicates, merges the lists of executions, and optionally
loads from the runs for reprozip < 0.7 compatibility.
"""
files_list = config.get('inputs_outputs') or []
# reprozip < 0.7 compatibility: read input_files and output_files from runs
if 'inputs_outputs' not in config:
for i, run in enumerate(runs):
for rkey, wkey in (('input_files', 'read_by_runs'),
('output_files', 'written_by_runs')):
for k, p in iteritems(run.pop(rkey, {})):
files_list.append({'name': k,
'path': p,
wkey: [i]})
files = {} # name:str: InputOutputFile
paths = {} # path:PosixPath: name:str
required_keys = set(['name', 'path'])
optional_keys = set(['read_by_runs', 'written_by_runs'])
uniquenames = UniqueNames()
for i, f in enumerate(files_list):
keys = set(f)
if (not keys.issubset(required_keys | optional_keys) or
not keys.issuperset(required_keys)):
raise InvalidConfig("File #%d has invalid keys" % i)
name = f['name']
path = PosixPath(f['path'])
readers = sorted(f.get('read_by_runs', []))
writers = sorted(f.get('written_by_runs', []))
if name in files:
if files[name].path != path:
old_name, name = name, uniquenames(name)
logging.warning("File name appears multiple times: %s\n"
"Using name %s instead",
old_name, name)
else:
uniquenames.insert(name)
if path in paths:
if paths[path] == name:
logging.warning("File appears multiple times: %s", name)
else:
logging.warning("Two files have the same path (but different "
"names): %s, %s\nUsing name %s",
name, paths[path], paths[path])
name = paths[path]
files[name].read_runs.update(readers)
files[name].write_runs.update(writers)
else:
paths[path] = name
files[name] = InputOutputFile(path, readers, writers)
return files
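For packs made by reprozip &lt; 0.7, load_iofiles() synthesizes the global inputs_outputs list from each run's input_files/output_files mappings, popping them off the runs as it goes. That conversion in isolation (sketch with a hypothetical helper name):

```python
def runs_to_iofiles(runs):
    """Flatten per-run input/output maps (reprozip < 0.7) to the 0.7+ list."""
    files_list = []
    for i, run in enumerate(runs):
        for rkey, wkey in (('input_files', 'read_by_runs'),
                           ('output_files', 'written_by_runs')):
            # pop: the legacy keys are consumed, like in load_iofiles()
            for name, path in sorted(run.pop(rkey, {}).items()):
                files_list.append({'name': name, 'path': path, wkey: [i]})
    return files_list
```

Note the slightly confusing pairing: entries from input_files get a read_by_runs key, and entries from output_files get written_by_runs.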
def load_config(filename, canonical, File=File, Package=Package):
"""Loads a YAML configuration file.
`File` and `Package` parameters can be used to override the classes that
will be used to hold files and distribution packages; useful during the
packing step.
`canonical` indicates whether a canonical configuration file is expected
(in which case the ``additional_patterns`` section is not accepted). Note
that this changes the number of returned values of this function.
"""
with filename.open(encoding='utf-8') as fp:
config = yaml.safe_load(fp)
keys_ = set(config)
if 'version' not in keys_:
raise InvalidConfig("Missing version")
ver = LooseVersion(config['version'])
# Accepts versions from 0.2 to 0.8 inclusive
if not LooseVersion('0.2') <= ver < LooseVersion('0.9'):
pkgname = (__package__ or __name__).split('.', 1)[0]
raise InvalidConfig("Loading configuration file in unknown format %s; "
"this probably means that you should upgrade "
"%s" % (ver, pkgname))
unknown_keys = keys_ - set(['pack_id', 'version', 'runs',
'inputs_outputs',
'packages', 'other_files',
'additional_patterns',
# Deprecated
'input_files', 'output_files'])
if unknown_keys:
logging.warning("Unrecognized sections in configuration: %s",
', '.join(unknown_keys))
runs = config.get('runs') or []
packages = read_packages(config.get('packages'), File, Package)
other_files = read_files(config.get('other_files'), File)
inputs_outputs = load_iofiles(config, runs)
# reprozip < 0.7 compatibility: set inputs/outputs on runs (for plugins)
for i, run in enumerate(runs):
run['input_files'] = dict((n, f.path)
for n, f in iteritems(inputs_outputs)
if i in f.read_runs)
run['output_files'] = dict((n, f.path)
for n, f in iteritems(inputs_outputs)
if i in f.write_runs)
# reprozip < 0.8 compatibility: assign IDs to runs
for i, run in enumerate(runs):
if run.get('id') is None:
run['id'] = "run%d" % i
record_usage_package(runs, packages, other_files,
inputs_outputs,
pack_id=config.get('pack_id'))
kwargs = {'format_version': ver,
'inputs_outputs': inputs_outputs}
if canonical:
if 'additional_patterns' in config:
raise InvalidConfig("Canonical configuration file shouldn't have "
"additional_patterns key anymore")
else:
kwargs['additional_patterns'] = config.get('additional_patterns') or []
return Config(runs, packages, other_files,
**kwargs)
def write_file(fp, fi, indent=0):
fp.write("%s - \"%s\"%s\n" % (
" " * indent,
escape(unicode_(fi.path)),
' # %s' % fi.comment if fi.comment is not None else ''))
def write_package(fp, pkg, indent=0):
indent_str = " " * indent
fp.write("%s - name: \"%s\"\n" % (indent_str, escape(pkg.name)))
fp.write("%s version: \"%s\"\n" % (indent_str, escape(pkg.version)))
if pkg.size is not None:
fp.write("%s size: %d\n" % (indent_str, pkg.size))
fp.write("%s packfiles: %s\n" % (indent_str, 'true' if pkg.packfiles
else 'false'))
fp.write("%s files:\n"
"%s # Total files used: %s\n" % (
indent_str, indent_str,
hsize(sum(fi.size
for fi in pkg.files
if fi.size is not None))))
if pkg.size is not None:
fp.write("%s # Installed package size: %s\n" % (
indent_str, hsize(pkg.size)))
for fi in sorted(pkg.files, key=lambda fi_: fi_.path):
write_file(fp, fi, indent + 1)
def save_config(filename, runs, packages, other_files, reprozip_version,
inputs_outputs=None,
canonical=False, pack_id=None):
"""Saves the configuration to a YAML file.
`canonical` indicates whether this is a canonical configuration file
(no ``additional_patterns`` section).
"""
dump = lambda x: yaml.safe_dump(x, encoding='utf-8', allow_unicode=True)
with filename.open('w', encoding='utf-8', newline='\n') as fp:
# Writes preamble
fp.write("""\
# ReproZip configuration file
# This file was generated by reprozip {version} at {date}
{what}
# Run info{pack_id}
version: "{format!s}"
""".format(pack_id=(('\npack_id: "%s"' % pack_id) if pack_id is not None
else ''),
version=escape(reprozip_version),
format='0.8',
date=datetime.now().isoformat(),
what=("# It was generated by the packer and you shouldn't need to "
"edit it" if canonical
else "# You might want to edit this file before running the "
"packer\n# See 'reprozip pack -h' for help")))
fp.write("runs:\n")
for i, run in enumerate(runs):
# Remove reprozip < 0.7 compatibility fields
run = dict((k, v) for k, v in iteritems(run)
if k not in ('input_files', 'output_files'))
fp.write("# Run %d\n" % i)
fp.write(dump([run]).decode('utf-8'))
fp.write("\n")
fp.write("""\
# Input and output files
# Inputs are files that are only read by a run; reprounzip can replace these
# files on demand to run the experiment with custom data.
# Outputs are files that are generated by a run; reprounzip can extract these
# files from the experiment on demand, for the user to examine.
# The name field is the identifier the user will use to access these files.
inputs_outputs:""")
for n, f in iteritems(inputs_outputs):
fp.write("""\
- name: {name}
path: {path}
written_by_runs: {writers}
read_by_runs: {readers}""".format(name=n, path=unicode_(f.path),
readers=repr(f.read_runs),
writers=repr(f.write_runs)))
fp.write("""\
# Files to pack
# All the files below were used by the program; they will be included in the
# generated package
# These files come from packages; we can thus choose not to include them, as it
# will simply be possible to install that package on the destination system
# They are included anyway by default
packages:
""")
# Writes files
for pkg in sorted(packages, key=lambda p: p.name):
write_package(fp, pkg)
fp.write("""\
# These files do not appear to come with an installed package -- you probably
# want them packed
other_files:
""")
for f in sorted(other_files, key=lambda fi: fi.path):
write_file(fp, f)
if not canonical:
fp.write("""\
# If you want to include additional files in the pack, you can list additional
# patterns of files that will be included
additional_patterns:
# Example:
# - /etc/apache2/** # Everything under apache2/
# - /var/log/apache2/*.log # Log files directly under apache2/
# - /var/lib/lxc/*/rootfs/home/**/*.py # All Python files of all users in
# # that container
""")
class LoggingDateFormatter(logging.Formatter):
"""Formatter that puts milliseconds in the timestamp.
"""
converter = datetime.fromtimestamp
def formatTime(self, record, datefmt=None):
ct = self.converter(record.created)
t = ct.strftime("%H:%M:%S")
s = "%s.%03d" % (t, record.msecs)
return s
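LoggingDateFormatter above produces timestamps like 14:03:07.123, appending the record's millisecond field to a strftime time. The same formatting as a standalone sketch (hypothetical helper mirroring formatTime):

```python
from datetime import datetime

def format_time_with_ms(created, msecs):
    # HH:MM:SS from the timestamp, plus the zero-padded milliseconds
    t = datetime.fromtimestamp(created).strftime("%H:%M:%S")
    return "%s.%03d" % (t, msecs)
```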
def setup_logging(tag, verbosity):
"""Sets up the logging module.
"""
levels = [logging.CRITICAL, logging.WARNING, logging.INFO, logging.DEBUG]
console_level = levels[min(verbosity, 3)]
file_level = logging.INFO
min_level = min(console_level, file_level)
# Create formatter, with same format as C extension
fmt = "[%s] %%(asctime)s %%(levelname)s: %%(message)s" % tag
formatter = LoggingDateFormatter(fmt)
# Console logger
handler = logging.StreamHandler()
handler.setLevel(console_level)
handler.setFormatter(formatter)
# Set up logger
logger = logging.root
logger.setLevel(min_level)
logger.addHandler(handler)
# File logger
dotrpz = Path('~/.reprozip').expand_user()
try:
if not dotrpz.is_dir():
dotrpz.mkdir()
filehandler = logging.handlers.RotatingFileHandler(str(dotrpz / 'log'),
mode='a',
delay=False,
maxBytes=400000,
backupCount=5)
except (IOError, OSError):
logging.warning("Couldn't create log file %s", dotrpz / 'log')
else:
filehandler.setFormatter(formatter)
filehandler.setLevel(file_level)
logger.addHandler(filehandler)
filehandler.emit(logging.root.makeRecord(
__name__.split('.', 1)[0],
logging.INFO,
"(log start)", 0,
"Log opened %s %s",
(datetime.now().strftime("%Y-%m-%d"), sys.argv),
None))
_usage_report = None
def setup_usage_report(name, version):
"""Sets up the usagestats module.
"""
global _usage_report
certificate_file = get_reprozip_ca_certificate()
_usage_report = usagestats.Stats(
'~/.reprozip/usage_stats',
usagestats.Prompt(enable='%s usage_report --enable' % name,
disable='%s usage_report --disable' % name),
os.environ.get('REPROZIP_USAGE_URL',
'https://stats.reprozip.org/'),
version='%s %s' % (name, version),
unique_user_id=True,
env_var='REPROZIP_USAGE_STATS',
ssl_verify=certificate_file.path)
try:
os.getcwd().encode('ascii')
except (UnicodeEncodeError, UnicodeDecodeError):
record_usage(cwd_ascii=False)
else:
record_usage(cwd_ascii=True)
def enable_usage_report(enable):
"""Enables or disables usage reporting.
"""
if enable:
_usage_report.enable_reporting()
stderr.write("Thank you, usage reports will be sent automatically "
"from now on.\n")
else:
_usage_report.disable_reporting()
stderr.write("Usage reports will not be collected nor sent.\n")
def record_usage(**kwargs):
"""Records some info in the current usage report.
"""
if _usage_report is not None:
_usage_report.note(kwargs)
def record_usage_package(runs, packages, other_files,
inputs_outputs,
pack_id=None):
"""Records the info on some pack file into the current usage report.
"""
if _usage_report is None:
return
for run in runs:
record_usage(argv0=run['argv'][0])
record_usage(pack_id=pack_id or '',
nb_packages=len(packages),
nb_package_files=sum(len(pkg.files)
for pkg in packages),
packed_packages=sum(1 for pkg in packages
if pkg.packfiles),
nb_other_files=len(other_files),
nb_input_outputs_files=len(inputs_outputs),
nb_input_files=sum(1 for f in itervalues(inputs_outputs)
if f.read_runs),
nb_output_files=sum(1 for f in itervalues(inputs_outputs)
if f.write_runs))
def submit_usage_report(**kwargs):
"""Submits the current usage report to the usagestats server.
"""
_usage_report.submit(kwargs,
usagestats.OPERATING_SYSTEM,
usagestats.SESSION_TIME,
usagestats.PYTHON_VERSION)
def get_reprozip_ca_certificate():
"""Gets the ReproZip CA certificate filename.
"""
fd, certificate_file = Path.tempfile(prefix='rpz_stats_ca_', suffix='.pem')
with certificate_file.open('wb') as fp:
fp.write(usage_report_ca)
os.close(fd)
atexit.register(os.remove, certificate_file.path)
return certificate_file
usage_report_ca = b'''\
-----BEGIN CERTIFICATE-----
MIIDzzCCAregAwIBAgIJAMmlcDnTidBEMA0GCSqGSIb3DQEBCwUAMH4xCzAJBgNV
BAYTAlVTMREwDwYDVQQIDAhOZXcgWW9yazERMA8GA1UEBwwITmV3IFlvcmsxDDAK
BgNVBAoMA05ZVTERMA8GA1UEAwwIUmVwcm9aaXAxKDAmBgkqhkiG9w0BCQEWGXJl
cHJvemlwLWRldkB2Z2MucG9seS5lZHUwHhcNMTQxMTA3MDUxOTA5WhcNMjQxMTA0
MDUxOTA5WjB+MQswCQYDVQQGEwJVUzERMA8GA1UECAwITmV3IFlvcmsxETAPBgNV
BAcMCE5ldyBZb3JrMQwwCgYDVQQKDANOWVUxETAPBgNVBAMMCFJlcHJvWmlwMSgw
JgYJKoZIhvcNAQkBFhlyZXByb3ppcC1kZXZAdmdjLnBvbHkuZWR1MIIBIjANBgkq
hkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA1fuTW2snrVji51vGVl9hXAAZbNJ+dxG+
/LOOxZrF2f1RRNy8YWpeCfGbsZqiIEjorBv8lvdd9P+tD3M5sh9L0zQPU9dFvDb+
OOrV0jx59hbK3QcCQju3YFuAtD1lu8TBIPgGEab0eJhLVIX+XU5cYXrfoBmwCpN/
1wXWkUhN91ZVMA0ylATAxTpnoNuMKzfTxT8pyOWajiTskYkKmVBAxgYJQe1YDFA8
fglBNkQuHqP8jgYAniEBCAPZRMMq8WpOtyFx+L9LX9/WcHtAQyDPPb9M81KKgPQq
urtCqtuDKxuqcX9zg4/O8l4nZ50pwaJjbH4kMW/wnLzTPvzZCPtJYQIDAQABo1Aw
TjAdBgNVHQ4EFgQUJjhDDOup4P0cdrAVq1F9ap3yTj8wHwYDVR0jBBgwFoAUJjhD
DOup4P0cdrAVq1F9ap3yTj8wDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQsFAAOC
AQEAeKpTiy2WYPqevHseTCJDIL44zghDJ9w5JmECOhFgPXR9Hl5Nh9S1j4qHBs4G
cn8d1p2+8tgcJpNAysjuSl4/MM6hQNecW0QVqvJDQGPn33bruMB4DYRT5du1Zpz1
YIKRjGU7Of3CycOCbaT50VZHhEd5GS2Lvg41ngxtsE8JKnvPuim92dnCutD0beV+
4TEvoleIi/K4AZWIaekIyqazd0c7eQjgSclNGgePcdbaxIo0u6tmdTYk3RNzo99t
DCfXxuMMg3wo5pbqG+MvTdECaLwt14zWU259z8JX0BoeVG32kHlt2eUpm5PCfxqc
dYuwZmAXksp0T0cWo0DnjJKRGQ==
-----END CERTIFICATE-----
'''
reprozip-1.0.10/reprozip/filters.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
from __future__ import division, print_function, unicode_literals
import logging
from reprozip.tracer.trace import TracedFile
from reprozip.utils import irange, iteritems
def builtin(input_files, **kwargs):
"""Default heuristics for input files.
"""
for i in irange(len(input_files)):
lst = []
for path in input_files[i]:
if path.unicodename[0] == '.' or path.ext in ('.pyc', '.so'):
logging.info("Removing input %s", path)
else:
lst.append(path)
input_files[i] = lst
def python(files, input_files, **kwargs):
add = []
for path, fi in iteritems(files):
if path.ext == '.pyc':
pyfile = path.parent / path.stem + '.py'
if pyfile.is_file():
if pyfile not in files:
logging.info("Adding %s", pyfile)
add.append(TracedFile(pyfile))
for fi in add:
files[fi.path] = fi
for i in irange(len(input_files)):
lst = []
for path in input_files[i]:
if path.ext in ('.py', '.pyc'):
logging.info("Removing input %s", path)
else:
lst.append(path)
input_files[i] = lst
reprozip-1.0.10/reprozip/main.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
"""Entry point for the reprozip utility.
This contains :func:`~reprozip.main.main`, which is the entry point declared to
setuptools. It is also callable directly.
It dispatches to other routines, or handles the testrun command.
"""
from __future__ import division, print_function, unicode_literals
if __name__ == '__main__': # noqa
from reprozip.main import main
main()
import argparse
import locale
import logging
import os
from rpaths import Path
import sqlite3
import sys
from reprozip import __version__ as reprozip_version
from reprozip import _pytracer
from reprozip.common import setup_logging, \
setup_usage_report, enable_usage_report, \
submit_usage_report, record_usage
import reprozip.pack
import reprozip.tracer.trace
import reprozip.traceutils
from reprozip.utils import PY3, unicode_, stderr
safe_shell_chars = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
"-+=/:.,%_")
def shell_escape(s):
r"""Given bl"a, returns "bl\"a".
"""
if isinstance(s, bytes):
s = s.decode('utf-8')
if not s or any(c not in safe_shell_chars for c in s):
return '"%s"' % (s.replace('\\', '\\\\')
.replace('"', '\\"')
.replace('`', '\\`')
.replace('$', '\\$'))
else:
return s
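A standalone sketch mirroring `shell_escape` above (same safe-character set and escaping rules), usable to check the behavior in isolation:

```python
# Standalone sketch of shell_escape(): a string made only of "safe"
# characters passes through unchanged; anything else is double-quoted,
# with \, ", ` and $ backslash-escaped.
safe_shell_chars = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                       "abcdefghijklmnopqrstuvwxyz"
                       "0123456789"
                       "-+=/:.,%_")

def shell_escape(s):
    if isinstance(s, bytes):
        s = s.decode('utf-8')
    if not s or any(c not in safe_shell_chars for c in s):
        return '"%s"' % (s.replace('\\', '\\\\')
                          .replace('"', '\\"')
                          .replace('`', '\\`')
                          .replace('$', '\\$'))
    return s

print(shell_escape('simple-arg'))  # simple-arg
print(shell_escape('bl"a'))
```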
def print_db(database):
"""Prints out database content.
"""
if PY3:
# On PY3, connect() only accepts unicode
conn = sqlite3.connect(str(database))
else:
conn = sqlite3.connect(database.path)
conn.row_factory = sqlite3.Row
conn.text_factory = lambda x: unicode_(x, 'utf-8', 'replace')
cur = conn.cursor()
rows = cur.execute(
'''
SELECT id, parent, timestamp, exitcode
FROM processes;
''')
print("\nProcesses:")
header = "+------+--------+-------+------------------+"
print(header)
print("| id | parent | exit | timestamp |")
print(header)
for r_id, r_parent, r_timestamp, r_exit in rows:
f_id = "{0: 5d} ".format(r_id)
if r_parent is not None:
f_parent = "{0: 7d} ".format(r_parent)
else:
f_parent = " "
if r_exit & 0x0100:
f_exit = " sig{0: <2d} ".format(r_exit)
else:
f_exit = " {0: <2d} ".format(r_exit)
f_timestamp = "{0: 17d} ".format(r_timestamp)
print('|'.join(('', f_id, f_parent, f_exit, f_timestamp, '')))
print(header)
cur.close()
cur = conn.cursor()
rows = cur.execute(
'''
SELECT id, name, timestamp, process, argv
FROM executed_files;
''')
print("\nExecuted files:")
header = ("+--------+------------------+---------+------------------------"
"---------------+")
print(header)
print("| id | timestamp | process | name and argv "
" |")
print(header)
for r_id, r_name, r_timestamp, r_process, r_argv in rows:
f_id = "{0: 7d} ".format(r_id)
f_timestamp = "{0: 17d} ".format(r_timestamp)
f_proc = "{0: 8d} ".format(r_process)
argv = r_argv.split('\0')
if not argv[-1]:
argv = argv[:-1]
cmdline = ' '.join(shell_escape(a) for a in argv)
if argv[0] not in (r_name, os.path.basename(r_name)):
cmdline = "(%s) %s" % (shell_escape(r_name), cmdline)
f_cmdline = " {0: <37s} ".format(cmdline)
print('|'.join(('', f_id, f_timestamp, f_proc, f_cmdline, '')))
print(header)
cur.close()
cur = conn.cursor()
rows = cur.execute(
'''
SELECT id, name, timestamp, mode, process
FROM opened_files;
''')
print("\nFiles:")
header = ("+--------+------------------+---------+------+-----------------"
"---------------+")
print(header)
print("| id | timestamp | process | mode | name "
" |")
print(header)
for r_id, r_name, r_timestamp, r_mode, r_process in rows:
f_id = "{0: 7d} ".format(r_id)
f_timestamp = "{0: 17d} ".format(r_timestamp)
f_proc = "{0: 8d} ".format(r_process)
f_mode = "{0: 5d} ".format(r_mode)
f_name = " {0: <30s} ".format(r_name)
print('|'.join(('', f_id, f_timestamp, f_proc, f_mode, f_name, '')))
print(header)
cur.close()
conn.close()
def testrun(args):
"""testrun subcommand.
Runs the command with the tracer using a temporary sqlite3 database, then
reads it and dumps it out.
Not really useful, except for debugging.
"""
fd, database = Path.tempfile(prefix='reprozip_', suffix='.sqlite3')
os.close(fd)
try:
if args.arg0 is not None:
argv = [args.arg0] + args.cmdline[1:]
else:
argv = args.cmdline
logging.debug("Starting tracer, binary=%r, argv=%r",
args.cmdline[0], argv)
c = _pytracer.execute(args.cmdline[0], argv, database.path,
args.verbosity)
print("\n\n-----------------------------------------------------------"
"--------------------")
print_db(database)
if c != 0:
if c & 0x0100:
print("\nWarning: program appears to have been terminated by "
"signal %d" % (c & 0xFF))
else:
print("\nWarning: program exited with non-zero code %d" % c)
finally:
database.remove()
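The exit-status convention used above (0x0100 set means the traced program died on a signal, low byte carries the signal number; otherwise the value is the plain exit code) can be sketched as a small helper; `describe_exit` is an illustrative name, not part of reprozip:

```python
# Decode the tracer's combined exit status: bit 0x0100 flags
# signal-termination, with the signal number in the low byte.
def describe_exit(c):
    if c & 0x0100:
        return "killed by signal %d" % (c & 0xFF)
    return "exited with code %d" % c

print(describe_exit(0))            # exited with code 0
print(describe_exit(0x0100 | 9))   # killed by signal 9
```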
def trace(args):
"""trace subcommand.
Simply calls reprozip.tracer.trace() with the arguments from argparse.
"""
if args.arg0 is not None:
argv = [args.arg0] + args.cmdline[1:]
else:
argv = args.cmdline
if args.append and args.overwrite:
logging.critical("You can't use both --continue and --overwrite")
sys.exit(2)
elif args.append:
append = True
elif args.overwrite:
append = False
else:
append = None
reprozip.tracer.trace.trace(args.cmdline[0],
argv,
Path(args.dir),
append,
args.verbosity)
reprozip.tracer.trace.write_configuration(Path(args.dir),
args.identify_packages,
args.find_inputs_outputs,
overwrite=False)
def reset(args):
"""reset subcommand.
Just regenerates the configuration (config.yml) from the trace
(trace.sqlite3).
"""
reprozip.tracer.trace.write_configuration(Path(args.dir),
args.identify_packages,
args.find_inputs_outputs,
overwrite=True)
def pack(args):
"""pack subcommand.
Reads in the configuration file and writes out a tarball.
"""
target = Path(args.target)
if not target.unicodename.lower().endswith('.rpz'):
target = Path(target.path + b'.rpz')
logging.warning("Changing output filename to %s", target.unicodename)
reprozip.pack.pack(target, Path(args.dir), args.identify_packages)
def combine(args):
"""combine subcommand.
Reads in multiple trace databases and combines them into one.
The runs from the original traces are appended ('run_id' field gets
translated to avoid conflicts).
"""
traces = []
for tracepath in args.traces:
if tracepath == '-':
tracepath = Path(args.dir) / 'trace.sqlite3'
else:
tracepath = Path(tracepath)
if tracepath.is_dir():
tracepath = tracepath / 'trace.sqlite3'
traces.append(tracepath)
reprozip.traceutils.combine_traces(traces, Path(args.dir))
reprozip.tracer.trace.write_configuration(Path(args.dir),
args.identify_packages,
args.find_inputs_outputs,
overwrite=True)
def usage_report(args):
if bool(args.enable) == bool(args.disable):
logging.critical("What do you want to do?")
sys.exit(2)
enable_usage_report(args.enable)
sys.exit(0)
def main():
"""Entry point when called on the command-line.
"""
# Locale
locale.setlocale(locale.LC_ALL, '')
# http://bugs.python.org/issue13676
# This prevents reprozip from reading argv and envp arrays from trace
if sys.version_info < (2, 7, 3):
stderr.write("Error: your version of Python, %s, is not supported\n"
"Versions before 2.7.3 are affected by bug 13676 and "
"will not work with ReproZip\n" %
sys.version.split(' ', 1)[0])
sys.exit(1)
# Parses command-line
# General options
def add_options(opt):
opt.add_argument('--version', action='version',
version="reprozip version %s" % reprozip_version)
opt.add_argument('-d', '--dir', default='.reprozip-trace',
help="where to store database and configuration file "
"(default: ./.reprozip-trace)")
opt.add_argument(
'--dont-identify-packages', action='store_false', default=True,
dest='identify_packages',
help="do not try to identify which package each file comes from")
opt.add_argument(
'--dont-find-inputs-outputs', action='store_false',
default=True, dest='find_inputs_outputs',
help="do not try to identify input and output files")
parser = argparse.ArgumentParser(
description="reprozip is the ReproZip component responsible for "
"tracing and packing the execution of an experiment",
epilog="Please report issues to reprozip-users@vgc.poly.edu")
add_options(parser)
parser.add_argument('-v', '--verbose', action='count', default=1,
dest='verbosity',
help="increases verbosity level")
subparsers = parser.add_subparsers(title="commands", metavar='',
dest='selected_command')
# usage_report subcommand
parser_stats = subparsers.add_parser(
'usage_report',
help="Enables or disables anonymous usage reports")
add_options(parser_stats)
parser_stats.add_argument('--enable', action='store_true')
parser_stats.add_argument('--disable', action='store_true')
parser_stats.set_defaults(func=usage_report)
# trace command
parser_trace = subparsers.add_parser(
'trace',
help="Runs the program and writes out database and configuration file")
add_options(parser_trace)
parser_trace.add_argument(
'-a',
dest='arg0',
help="argument 0 to program, if different from program path")
parser_trace.add_argument(
'-c', '--continue', action='store_true', dest='append',
help="add to the previous trace, don't replace it")
parser_trace.add_argument(
'-w', '--overwrite', action='store_true', dest='overwrite',
help="overwrite the previous trace, don't add to it")
parser_trace.add_argument('cmdline', nargs=argparse.REMAINDER,
help="command-line to run under trace")
parser_trace.set_defaults(func=trace)
# testrun command
parser_testrun = subparsers.add_parser(
'testrun',
help="Runs the program and writes out the database contents")
add_options(parser_testrun)
parser_testrun.add_argument(
'-a',
dest='arg0',
help="argument 0 to program, if different from program path")
parser_testrun.add_argument('cmdline', nargs=argparse.REMAINDER)
parser_testrun.set_defaults(func=testrun)
# reset command
parser_reset = subparsers.add_parser(
'reset',
help="Resets the configuration file")
add_options(parser_reset)
parser_reset.set_defaults(func=reset)
# pack command
parser_pack = subparsers.add_parser(
'pack',
help="Packs the experiment according to the current configuration")
add_options(parser_pack)
parser_pack.add_argument('target', nargs=argparse.OPTIONAL,
default='experiment.rpz',
help="Destination file")
parser_pack.set_defaults(func=pack)
# combine command
parser_combine = subparsers.add_parser(
'combine',
help="Combine multiple traces into one (possibly as subsequent runs)")
add_options(parser_combine)
parser_combine.add_argument('traces', nargs=argparse.ONE_OR_MORE)
parser_combine.set_defaults(func=combine)
args = parser.parse_args()
setup_logging('REPROZIP', args.verbosity)
if getattr(args, 'func', None) is None:
parser.print_help(sys.stderr)
sys.exit(2)
setup_usage_report('reprozip', reprozip_version)
if 'cmdline' in args and not args.cmdline:
parser.error("missing command-line")
record_usage(command=args.selected_command)
try:
args.func(args)
except Exception as e:
submit_usage_report(result=type(e).__name__)
raise
else:
submit_usage_report(result='success')
sys.exit(0)
reprozip-1.0.10/reprozip/pack.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
"""Packing logic for reprozip.
This module contains the :func:`~reprozip.pack.pack` function and associated
utilities that are used to build the .rpz pack file from the trace SQLite file
and config YAML.
"""
from __future__ import division, print_function, unicode_literals
import itertools
import logging
import os
from rpaths import Path
import string
import sys
import tarfile
import uuid
from reprozip import __version__ as reprozip_version
from reprozip.common import File, load_config, save_config, \
record_usage_package
from reprozip.tracer.linux_pkgs import identify_packages
from reprozip.traceutils import combine_files
from reprozip.utils import iteritems
def expand_patterns(patterns):
files = set()
dirs = set()
# Finds all matching paths
for pattern in patterns:
if logging.root.isEnabledFor(logging.DEBUG):
logging.debug("Expanding pattern %r into %d paths",
pattern,
len(list(Path('/').recursedir(pattern))))
for path in Path('/').recursedir(pattern):
if path.is_dir():
dirs.add(path)
else:
files.add(path)
# Don't include directories whose files are included
non_empty_dirs = set([Path('/')])
for p in files | dirs:
path = Path('/')
for c in p.components[1:-1]:
path = path / c
non_empty_dirs.add(path)
# Builds the final list
return [File(p) for p in itertools.chain(dirs - non_empty_dirs, files)]
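The directory-pruning step above can be sketched with the standard library (`pathlib.PurePosixPath` standing in for `rpaths.Path`): a matched directory is dropped when another matched path lies beneath it, since packing that path recreates the directory anyway. `prune_dirs` is an illustrative name, not part of reprozip:

```python
from itertools import chain
from pathlib import PurePosixPath

# Drop directories that are ancestors of other matched paths; only
# otherwise-empty directories survive in the final list.
def prune_dirs(files, dirs):
    files = {PurePosixPath(p) for p in files}
    dirs = {PurePosixPath(p) for p in dirs}
    non_empty = {PurePosixPath('/')}
    for p in files | dirs:
        non_empty.update(p.parents)  # mark every ancestor as non-empty
    return sorted(str(p) for p in chain(dirs - non_empty, files))

print(prune_dirs(['/etc/apache2/apache2.conf'],
                 ['/etc/apache2', '/var/log/empty']))
# ['/etc/apache2/apache2.conf', '/var/log/empty']
```

`/etc/apache2` is pruned because a file under it is already packed; the empty `/var/log/empty` is kept so it gets recreated on unpack.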
def canonicalize_config(packages, other_files, additional_patterns,
sort_packages):
"""Expands ``additional_patterns`` from the configuration file.
"""
if additional_patterns:
add_files = expand_patterns(additional_patterns)
logging.info("Found %d files from expanding additional_patterns...",
len(add_files))
if add_files:
if sort_packages:
add_files, add_packages = identify_packages(add_files)
else:
add_packages = []
other_files, packages = combine_files(add_files, add_packages,
other_files, packages)
return packages, other_files
def data_path(filename, prefix=Path('DATA')):
"""Computes the filename to store in the archive.
Turns an absolute path containing '..' into a filename without '..', and
prefixes with DATA/.
Example:
>>> data_path(PosixPath('/var/lib/../../../../tmp/test'))
PosixPath(b'DATA/tmp/test')
>>> data_path(PosixPath('/var/lib/../www/index.html'))
PosixPath(b'DATA/var/www/index.html')
"""
return prefix / filename.split_root()[1]
class PackBuilder(object):
"""Higher layer on tarfile that adds intermediate directories.
"""
def __init__(self, filename):
self.tar = tarfile.open(str(filename), 'w:gz')
self.seen = set()
def add_data(self, filename):
if filename in self.seen:
return
path = Path('/')
for c in filename.components[1:]:
path = path / c
if path in self.seen:
continue
logging.debug("%s -> %s", path, data_path(path))
self.tar.add(str(path), str(data_path(path)), recursive=False)
self.seen.add(path)
def close(self):
self.tar.close()
self.seen = None
def pack(target, directory, sort_packages):
"""Main function for the pack subcommand.
"""
if target.exists():
# Don't overwrite packs...
logging.critical("Target file exists!")
sys.exit(1)
# Reads configuration
configfile = directory / 'config.yml'
if not configfile.is_file():
logging.critical("Configuration file does not exist!\n"
"Did you forget to run 'reprozip trace'?\n"
"If not, you might want to use --dir to specify an "
"alternate location.")
sys.exit(1)
runs, packages, other_files = config = load_config(
configfile,
canonical=False)
additional_patterns = config.additional_patterns
inputs_outputs = config.inputs_outputs
# Validate run ids
run_chars = ('0123456789_-@() .:%'
'abcdefghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
for i, run in enumerate(runs):
if (any(c not in run_chars for c in run['id']) or
all(c in string.digits for c in run['id'])):
logging.critical("Illegal run id: %r (run number %d)",
run['id'], i)
sys.exit(1)
# Canonicalize config (re-sort, expand 'additional_files' patterns)
packages, other_files = canonicalize_config(
packages, other_files, additional_patterns, sort_packages)
logging.info("Creating pack %s...", target)
tar = tarfile.open(str(target), 'w:')
fd, tmp = Path.tempfile()
os.close(fd)
try:
datatar = PackBuilder(tmp)
# Add the files from the packages
for pkg in packages:
if pkg.packfiles:
logging.info("Adding files from package %s...", pkg.name)
files = []
for f in pkg.files:
if not Path(f.path).exists():
logging.warning("Missing file %s from package %s",
f.path, pkg.name)
else:
datatar.add_data(f.path)
files.append(f)
pkg.files = files
else:
logging.info("NOT adding files from package %s", pkg.name)
# Add the rest of the files
logging.info("Adding other files...")
files = set()
for f in other_files:
if not Path(f.path).exists():
logging.warning("Missing file %s", f.path)
else:
datatar.add_data(f.path)
files.add(f)
other_files = files
datatar.close()
tar.add(str(tmp), 'DATA.tar.gz')
finally:
tmp.remove()
logging.info("Adding metadata...")
# Stores pack version
fd, manifest = Path.tempfile(prefix='reprozip_', suffix='.txt')
os.close(fd)
try:
with manifest.open('wb') as fp:
fp.write(b'REPROZIP VERSION 2\n')
tar.add(str(manifest), 'METADATA/version')
finally:
manifest.remove()
# Stores the original trace
trace = directory / 'trace.sqlite3'
if not trace.is_file():
logging.critical("trace.sqlite3 is gone! Aborting")
sys.exit(1)
tar.add(str(trace), 'METADATA/trace.sqlite3')
# Checks that input files are packed
for name, f in iteritems(inputs_outputs):
if f.read_runs and not Path(f.path).exists():
logging.warning("File is designated as input (name %s) but is not "
"to be packed: %s", name, f.path)
# Generates a unique identifier for the pack (for usage reports purposes)
pack_id = str(uuid.uuid4())
# Stores canonical config
fd, can_configfile = Path.tempfile(suffix='.yml', prefix='rpz_config_')
os.close(fd)
try:
save_config(can_configfile, runs, packages, other_files,
reprozip_version,
inputs_outputs, canonical=True,
pack_id=pack_id)
tar.add(str(can_configfile), 'METADATA/config.yml')
finally:
can_configfile.remove()
tar.close()
# Record some info to the usage report
record_usage_package(runs, packages, other_files,
inputs_outputs,
pack_id)
reprozip-1.0.10/reprozip/tracer/
reprozip-1.0.10/reprozip/tracer/__init__.py
reprozip-1.0.10/reprozip/tracer/linux_pkgs.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
"""Package identification routines.
This module contains the :func:`~reprozip.tracer.linux_pkgs.identify_packages`
function that sorts a list of files between their distribution packages,
depending on what Linux distribution we are running on.
Currently supported package managers:
- dpkg (Debian, Ubuntu)
"""
from __future__ import division, print_function, unicode_literals
import logging
import platform
from rpaths import Path
import subprocess
import time
from reprozip.common import Package
from reprozip.utils import iteritems, listvalues
magic_dirs = ('/dev', '/proc', '/sys')
system_dirs = ('/bin', '/etc', '/lib', '/sbin', '/usr', '/var')
class PkgManager(object):
"""Base class for package identifiers.
Subclasses should provide either `search_for_files` or `search_for_file`
which actually identifies the package for a file.
"""
def __init__(self):
# Files that were not part of a package
self.unknown_files = set()
# All the packages identified, with their `files` attribute set
self.packages = {}
def filter_files(self, files):
seen_files = set()
for f in files:
if f.path not in seen_files:
if not self._filter(f):
yield f
seen_files.add(f.path)
def search_for_files(self, files):
nb_pkg_files = 0
for f in self.filter_files(files):
pkgnames = self._get_packages_for_file(f.path)
# Stores the file
if not pkgnames:
self.unknown_files.add(f)
else:
pkgs = []
for pkgname in pkgnames:
if pkgname in self.packages:
pkgs.append(self.packages[pkgname])
else:
pkg = self._create_package(pkgname)
if pkg is not None:
self.packages[pkgname] = pkg
pkgs.append(self.packages[pkgname])
if len(pkgs) == 1:
pkgs[0].add_file(f)
nb_pkg_files += 1
else:
self.unknown_files.add(f)
# Filter out packages with no files
self.packages = {pkgname: pkg
for pkgname, pkg in iteritems(self.packages)
if pkg.files}
logging.info("%d packages with %d files, and %d other files",
len(self.packages),
nb_pkg_files,
len(self.unknown_files))
def _filter(self, f):
# Special files
if any(f.path.lies_under(c) for c in magic_dirs):
return True
# If it's not in a system directory, no need to look for it
if (f.path.lies_under('/usr/local') or
not any(f.path.lies_under(c) for c in system_dirs)):
self.unknown_files.add(f)
return True
return False
def _get_packages_for_file(self, filename):
raise NotImplementedError
def _create_package(self, pkgname):
raise NotImplementedError
class DpkgManager(PkgManager):
"""Package identifier for deb-based systems (Debian, Ubuntu).
"""
def search_for_files(self, files):
# Make a set of all the requested files
requested = dict((f.path, f) for f in self.filter_files(files))
found = {} # {path: pkgname}
# Process /var/lib/dpkg/info/*.list
for listfile in Path('/var/lib/dpkg/info').listdir():
if not listfile.unicodename.endswith('.list'):
continue
pkgname = listfile.unicodename[:-5]
# Removes :arch
pkgname = pkgname.split(':', 1)[0]
with listfile.open('rb') as fp:
# Read paths from the file
l = fp.readline()
while l:
if l[-1:] == b'\n':
l = l[:-1]
path = Path(l)
# If it's one of the requested paths
if path in requested:
# If we had assigned it to a package already, undo
if path in found:
found[path] = None
# Else assign to the package
else:
found[path] = pkgname
l = fp.readline()
# Remaining files are not from packages
self.unknown_files.update(
f for f in files
if f.path in requested and found.get(f.path) is None)
nb_pkg_files = 0
for path, pkgname in iteritems(found):
if pkgname is None:
continue
if pkgname in self.packages:
package = self.packages[pkgname]
else:
package = self._create_package(pkgname)
self.packages[pkgname] = package
package.add_file(requested.pop(path))
nb_pkg_files += 1
logging.info("%d packages with %d files, and %d other files",
len(self.packages),
nb_pkg_files,
len(self.unknown_files))
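The `.list`-scanning logic above reduces to a small path-to-package assignment; here is a simplified sketch using plain dicts instead of dpkg's on-disk `.list` files (`assign_paths` and the package names are made up for the example):

```python
# Map requested paths to the package that owns them; a path claimed by
# two packages is marked ambiguous (None), matching the undo step above.
def assign_paths(requested, list_files):
    """requested: set of paths; list_files: {pkgname: [paths]}."""
    found = {}
    for pkgname, paths in list_files.items():
        for path in paths:
            if path in requested:
                if path in found:
                    found[path] = None  # claimed twice: ambiguous
                else:
                    found[path] = pkgname
    return found

print(assign_paths({'/bin/ls', '/usr/bin/python3'},
                   {'coreutils': ['/bin/ls'],
                    'python3-minimal': ['/usr/bin/python3']}))
```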
def _get_packages_for_file(self, filename):
# This method is no longer used for dpkg: instead of querying each file
# using `dpkg -S`, we read all the list files once ourselves since it
# is faster
assert False
def _create_package(self, pkgname):
p = subprocess.Popen(['dpkg-query',
'--showformat=${Package}\t'
'${Version}\t'
'${Installed-Size}\n',
'-W',
pkgname],
stdout=subprocess.PIPE)
try:
size = version = None
for l in p.stdout:
fields = l.split()
# Removes :arch
name = fields[0].decode('ascii').split(':', 1)[0]
if name == pkgname:
version = fields[1].decode('ascii')
size = int(fields[2].decode('ascii')) * 1024 # kbytes
break
for l in p.stdout: # finish draining stdout
pass
finally:
p.wait()
if p.returncode == 0:
pkg = Package(pkgname, version, size=size)
logging.debug("Found package %s", pkg)
return pkg
else:
return None
class RpmManager(PkgManager):
"""Package identifier for rpm-based systems (Fedora, CentOS).
"""
def _get_packages_for_file(self, filename):
p = subprocess.Popen(['rpm', '-qf', filename.path,
'--qf', '%{NAME}'],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out, err = p.communicate()
if p.returncode != 0:
return None
return [l.strip().decode('iso-8859-1')
for l in out.splitlines()
if l]
def _create_package(self, pkgname):
p = subprocess.Popen(['rpm', '-q', pkgname,
'--qf', '%{VERSION}-%{RELEASE} %{SIZE}'],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out, err = p.communicate()
if p.returncode == 0:
version, size = out.strip().decode('iso-8859-1').rsplit(' ', 1)
size = int(size)
pkg = Package(pkgname, version, size=size)
logging.debug("Found package %s", pkg)
return pkg
else:
return None
def identify_packages(files):
"""Organizes the files, using the distribution's package manager.
"""
distribution = platform.linux_distribution()[0].lower()
if distribution in ('debian', 'ubuntu'):
logging.info("Identifying Debian packages for %d files...", len(files))
manager = DpkgManager()
elif (distribution in ('centos', 'centos linux',
'fedora', 'scientific linux') or
distribution.startswith('red hat')):
logging.info("Identifying RPM packages for %d files...", len(files))
manager = RpmManager()
else:
logging.info("Unknown distribution, can't identify packages")
return files, []
begin = time.time()
manager.search_for_files(files)
logging.debug("Assigning files to packages took %f seconds",
(time.time() - begin))
return manager.unknown_files, listvalues(manager.packages)
reprozip-1.0.10/reprozip/tracer/trace.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
"""Tracing logic for reprozip.
This module contains the :func:`~reprozip.tracer.trace.trace` function that
invokes the C tracer (_pytracer) to build the SQLite trace file, and the
generation logic for the config YAML file.
"""
from __future__ import division, print_function, unicode_literals
from collections import defaultdict
from itertools import count
import logging
import os
from pkg_resources import iter_entry_points
import platform
from rpaths import Path
import sqlite3
import sys
from reprozip import __version__ as reprozip_version
from reprozip import _pytracer
from reprozip.common import File, InputOutputFile, load_config, save_config, \
FILE_READ, FILE_WRITE, FILE_LINK
from reprozip.tracer.linux_pkgs import magic_dirs, system_dirs, \
identify_packages
from reprozip.utils import PY3, izip, iteritems, itervalues, \
unicode_, flatten, UniqueNames, hsize, normalize_path, find_all_links
class TracedFile(File):
"""Override of `~reprozip.common.File` that reads stats from filesystem.
It also memorizes how files are used, to select files that are only read,
and accurately guess input and output files.
"""
# read
# +------+
# | |
# read v + write
# (init) +------------------> ONLY_READ +-------> READ_THEN_WRITTEN
# | ^ +
# | | |
# +-------> WRITTEN +--+ +---------+
# write ^ | read, write
# | |
# +---------+
# read, write
READ_THEN_WRITTEN = 0
ONLY_READ = 1
WRITTEN = 2
what = None
def __init__(self, path):
path = Path(path)
size = None
if path.exists():
if path.is_link():
self.comment = "Link to %s" % path.read_link(absolute=True)
elif path.is_dir():
self.comment = "Directory"
else:
size = path.size()
self.comment = hsize(size)
self.what = None
self.runs = defaultdict(lambda: None)
File.__init__(self, path, size)
def read(self, run):
if self.what is None:
self.what = TracedFile.ONLY_READ
if run is not None:
if self.runs[run] is None:
self.runs[run] = TracedFile.ONLY_READ
def write(self, run):
if self.what is None:
self.what = TracedFile.WRITTEN
elif self.what == TracedFile.ONLY_READ:
self.what = TracedFile.READ_THEN_WRITTEN
if run is not None:
if self.runs[run] is None:
self.runs[run] = TracedFile.WRITTEN
elif self.runs[run] == TracedFile.ONLY_READ:
self.runs[run] = TracedFile.READ_THEN_WRITTEN
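The state diagram above can be exercised with a minimal standalone transition function (pure Python, no filesystem access, unlike `TracedFile`); `step` is an illustrative name, not part of reprozip:

```python
# Mirror of TracedFile's per-file state machine: first access decides the
# initial state, a write after ONLY_READ promotes to READ_THEN_WRITTEN,
# and every other transition is a self-loop.
READ_THEN_WRITTEN, ONLY_READ, WRITTEN = 0, 1, 2

def step(state, event):
    """Advance one state given a 'read' or 'write' event."""
    if state is None:
        return ONLY_READ if event == 'read' else WRITTEN
    if state == ONLY_READ and event == 'write':
        return READ_THEN_WRITTEN
    return state  # self-loop

state = None
for event in ['read', 'write']:
    state = step(state, event)
print(state == READ_THEN_WRITTEN)  # True: read followed by write
```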
def run_filter_plugins(files, input_files):
for entry_point in iter_entry_points('reprozip.filters'):
func = entry_point.load()
name = entry_point.name
logging.info("Running filter plugin %s", name)
func(files=files, input_files=input_files)
def get_files(conn):
"""Find all the files used by the experiment by reading the trace.
"""
files = {}
access_files = [set()]
# Finds run timestamps, so we can sort input/output files by run
proc_cursor = conn.cursor()
executions = proc_cursor.execute(
'''
SELECT timestamp
FROM processes
WHERE parent ISNULL
ORDER BY id;
''')
run_timestamps = [r_timestamp for r_timestamp, in executions][1:]
proc_cursor.close()
# Adds dynamic linkers
for libdir in (Path('/lib'), Path('/lib64')):
if libdir.exists():
for linker in libdir.listdir('*ld-linux*'):
for filename in find_all_links(linker, True):
if filename not in files:
f = TracedFile(filename)
f.read(None)
files[f.path] = f
# Loops on executed files, and opened files, at the same time
cur = conn.cursor()
rows = cur.execute(
'''
SELECT 'exec' AS event_type, name, NULL AS mode, timestamp
FROM executed_files
UNION ALL
SELECT 'open' AS event_type, name, mode, timestamp
FROM opened_files
ORDER BY timestamp;
''')
executed = set()
run = 0
for event_type, r_name, r_mode, r_timestamp in rows:
if event_type == 'exec':
r_mode = FILE_READ
r_name = Path(normalize_path(r_name))
# Advance to the run this timestamp falls in
while run_timestamps and r_timestamp > run_timestamps[0]:
del run_timestamps[0]
access_files.append(set())
run += 1
# Adds symbolic links as read files
for filename in find_all_links(r_name.parent if r_mode & FILE_LINK
else r_name, False):
if filename not in files:
f = TracedFile(filename)
f.read(run)
files[f.path] = f
# Go to final target
if not r_mode & FILE_LINK:
r_name = r_name.resolve()
if event_type == 'exec':
executed.add(r_name)
if r_name not in files:
f = TracedFile(r_name)
files[f.path] = f
else:
f = files[r_name]
if r_mode & FILE_READ:
f.read(run)
if r_mode & FILE_WRITE:
f.write(run)
# Mark the parent directory as read
if r_name.parent not in files:
fp = TracedFile(r_name.parent)
fp.read(run)
files[fp.path] = fp
# Identifies input files
if r_name.is_file() and r_name not in executed:
access_files[-1].add(f)
cur.close()
# Further filters input files
inputs = [[fi.path
for fi in lst
# Input files are regular files,
if fi.path.is_file() and
# ONLY_READ,
fi.runs[r] == TracedFile.ONLY_READ and
# not executable,
# FIXME : currently disabled; only remove executed files
# not fi.path.stat().st_mode & 0b111 and
fi.path not in executed and
# not in a system directory
not any(fi.path.lies_under(m)
for m in magic_dirs + system_dirs)]
for r, lst in enumerate(access_files)]
# Identify output files
outputs = [[fi.path
for fi in lst
# Output files are regular files,
if fi.path.is_file() and
# WRITTEN
fi.runs[r] == TracedFile.WRITTEN and
# not in a system directory
not any(fi.path.lies_under(m)
for m in magic_dirs + system_dirs)]
for r, lst in enumerate(access_files)]
# Run the list of files through the filter plugins
run_filter_plugins(files, inputs)
# Files removed from plugins should be removed from inputs as well
inputs = [[path for path in lst if path in files]
for lst in inputs]
# Displays a warning for READ_THEN_WRITTEN files
read_then_written_files = [
fi
for fi in itervalues(files)
if fi.what == TracedFile.READ_THEN_WRITTEN and
not any(fi.path.lies_under(m) for m in magic_dirs)]
if read_then_written_files:
logging.warning(
"Some files were read and then written. We will only pack the "
"final version of the file; reproducible experiments shouldn't "
"change their input files")
logging.info("Paths:\n%s",
", ".join(unicode_(fi.path)
for fi in read_then_written_files))
files = set(
fi
for fi in itervalues(files)
if fi.what != TracedFile.WRITTEN and not any(fi.path.lies_under(m)
for m in magic_dirs))
return files, inputs, outputs
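The heart of `get_files` is the `UNION ALL` query that interleaves exec and open events by timestamp. A toy in-memory database (simplified columns, made-up paths) shows the shape of the merged stream:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE executed_files(name TEXT, timestamp INTEGER)')
conn.execute('CREATE TABLE opened_files(name TEXT, mode INTEGER, '
             'timestamp INTEGER)')
conn.execute("INSERT INTO executed_files VALUES('/bin/sh', 1)")
conn.execute("INSERT INTO opened_files VALUES('/etc/passwd', 1, 2)")
conn.execute("INSERT INTO opened_files VALUES('/tmp/out', 2, 3)")

# Same pattern as get_files(): tag each row with its origin, pad the
# missing column with NULL, and merge into one timestamp-ordered stream
rows = conn.execute('''
    SELECT 'exec' AS event_type, name, NULL AS mode, timestamp
    FROM executed_files
    UNION ALL
    SELECT 'open' AS event_type, name, mode, timestamp
    FROM opened_files
    ORDER BY timestamp;
''').fetchall()
assert [r[0] for r in rows] == ['exec', 'open', 'open']
assert rows[0][2] is None  # exec rows carry no mode
```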
def tty_prompt(prompt, chars):
"""Get input from the terminal.
On Linux, this will find the controlling terminal and ask there.
:param prompt: String to be displayed on the terminal before reading the
input.
:param chars: Accepted character responses.
"""
try:
import termios
# Can't use O_RDWR/"w+" for a single fd/stream because of PY3 bug 20074
ofd = os.open('/dev/tty', os.O_WRONLY | os.O_NOCTTY)
ostream = os.fdopen(ofd, 'w', 1)
ifd = os.open('/dev/tty', os.O_RDONLY | os.O_NOCTTY)
istream = os.fdopen(ifd, 'r', 1)
old = termios.tcgetattr(ifd)
except (ImportError, AttributeError, IOError, OSError):
ostream = sys.stdout
istream = sys.stdin
if not os.isatty(sys.stdin.fileno()):
return None
while True:
ostream.write(prompt)
ostream.flush()
line = istream.readline()
if not line:
return None
elif line[0] in chars:
return line[0]
else:
new = old[:]
new[3] &= ~termios.ICANON # 3 == 'lflags'
tcsetattr_flags = termios.TCSAFLUSH | getattr(termios, 'TCSASOFT', 0)
try:
termios.tcsetattr(ifd, tcsetattr_flags, new)
ostream.write(prompt)
ostream.flush()
while True:
char = istream.read(1)
if char in chars:
ostream.write("\n")
return char
finally:
termios.tcsetattr(ifd, tcsetattr_flags, old)
ostream.flush()
def trace(binary, argv, directory, append, verbosity=1):
"""Main function for the trace subcommand.
"""
cwd = Path.cwd()
if (any(cwd.lies_under(c) for c in magic_dirs + system_dirs) and
not cwd.lies_under('/usr/local')):
logging.warning(
"You are running this experiment from a system directory! "
"Autodetection of non-system files will probably not work as "
"intended")
# Trace directory
if directory.exists():
if append is None:
r = tty_prompt(
"Trace directory %s exists\n"
"(a)ppend run to the trace, (d)elete it or (s)top? [a/d/s] " %
directory,
'aAdDsS')
if r is None:
logging.critical(
"Trace directory %s exists\n"
"Please use either --continue or --overwrite\n",
directory)
sys.exit(1)
elif r in 'sS':
sys.exit(1)
elif r in 'dD':
directory.rmtree()
directory.mkdir()
logging.warning(
"You can use --overwrite to replace the existing trace "
"(or --continue to append\nwithout prompt)")
elif append is False:
logging.info("Removing existing trace directory %s", directory)
directory.rmtree()
directory.mkdir(parents=True)
else:
if append is True:
logging.warning("--continue was set but trace doesn't exist yet")
directory.mkdir()
# Runs the trace
database = directory / 'trace.sqlite3'
logging.info("Running program")
# Might raise _pytracer.Error
c = _pytracer.execute(binary, argv, database.path, verbosity)
if c != 0:
if c & 0x0100:
logging.warning("Program appears to have been terminated by "
"signal %d", c & 0xFF)
else:
logging.warning("Program exited with non-zero code %d", c)
logging.info("Program completed")
def write_configuration(directory, sort_packages, find_inputs_outputs,
overwrite=False):
"""Writes the canonical YAML configuration file.
"""
database = directory / 'trace.sqlite3'
if PY3:
# On PY3, connect() only accepts unicode
conn = sqlite3.connect(str(database))
else:
conn = sqlite3.connect(database.path)
conn.row_factory = sqlite3.Row
# Reads info from database
files, inputs, outputs = get_files(conn)
# Identifies which file comes from which package
if sort_packages:
files, packages = identify_packages(files)
else:
packages = []
# Writes configuration file
config = directory / 'config.yml'
distribution = platform.linux_distribution()[0:2]
cur = conn.cursor()
if overwrite or not config.exists():
runs = []
# This gets all the top-level processes (p.parent ISNULL) and the first
# executed file for that process (sorting by ids, which are
# chronological)
executions = cur.execute(
'''
SELECT e.name, e.argv, e.envp, e.workingdir, p.exitcode
FROM processes p
JOIN executed_files e ON e.id=(
SELECT id FROM executed_files e2
WHERE e2.process=p.id
ORDER BY e2.id
LIMIT 1
)
WHERE p.parent ISNULL;
''')
else:
# Loads in previous config
runs, oldpkgs, oldfiles = load_config(config,
canonical=False,
File=TracedFile)
# Same query as previous block but only gets last process
executions = cur.execute(
'''
SELECT e.name, e.argv, e.envp, e.workingdir, p.exitcode
FROM processes p
JOIN executed_files e ON e.id=(
SELECT id FROM executed_files e2
WHERE e2.process=p.id
ORDER BY e2.id
LIMIT 1
)
WHERE p.parent ISNULL
ORDER BY p.id DESC
LIMIT 1;
''')
for r_name, r_argv, r_envp, r_workingdir, r_exitcode in executions:
# Decodes command-line
argv = r_argv.split('\0')
if not argv[-1]:
argv = argv[:-1]
# Decodes environment
envp = r_envp.split('\0')
if not envp[-1]:
envp = envp[:-1]
environ = dict(v.split('=', 1) for v in envp)
runs.append({'id': "run%d" % len(runs),
'binary': r_name, 'argv': argv,
'workingdir': unicode_(Path(r_workingdir)),
'architecture': platform.machine().lower(),
'distribution': distribution,
'hostname': platform.node(),
'system': [platform.system(), platform.release()],
'environ': environ,
'uid': os.getuid(),
'gid': os.getgid(),
'signal' if r_exitcode & 0x0100 else 'exitcode':
r_exitcode & 0xFF})
cur.close()
conn.close()
if find_inputs_outputs:
inputs_outputs = compile_inputs_outputs(runs, inputs, outputs)
else:
inputs_outputs = {}
save_config(config, runs, packages, files, reprozip_version,
inputs_outputs)
print("Configuration file written in {0!s}".format(config))
print("Edit that file then run the packer -- "
"use 'reprozip pack -h' for help")
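The argv/envp decoding used above can be tried on its own: the tracer stores both as NUL-joined strings, usually with a trailing NUL that yields an empty last element, and environment entries are split on the first `=` only. The sample strings below are made up for illustration:

```python
# argv comes out of the trace database as a NUL-joined string
r_argv = 'python\x00script.py\x00--out\x00result.txt\x00'
argv = r_argv.split('\0')
if not argv[-1]:
    argv = argv[:-1]  # drop the empty element from the trailing NUL
assert argv == ['python', 'script.py', '--out', 'result.txt']

r_envp = 'HOME=/home/user\x00PATH=/usr/bin:/bin\x00LC_ALL=en_US.UTF-8\x00'
envp = r_envp.split('\0')
if not envp[-1]:
    envp = envp[:-1]
# Split on the first '=' only: values may themselves contain '='
environ = dict(v.split('=', 1) for v in envp)
assert environ['PATH'] == '/usr/bin:/bin'
```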
def compile_inputs_outputs(runs, inputs, outputs):
"""Gives names to input/output files and creates InputOutputFile objects.
"""
# {path: (run_nb, arg_nb) or None}
runs_with_file = {}
# run_nb: number_of_file_arguments
nb_file_args = []
# {path: [runs]}
readers = {}
writers = {}
for run_nb, run, in_files, out_files in izip(count(), runs,
inputs, outputs):
# List which runs read or write each file
for p in in_files:
readers.setdefault(p, []).append(run_nb)
for p in out_files:
writers.setdefault(p, []).append(run_nb)
# Locate files that appear on a run's command line
files_set = set(in_files) | set(out_files)
nb_files = 0
for arg_nb, arg in enumerate(run['argv']):
p = Path(run['workingdir'], arg).resolve()
if p in files_set:
nb_files += 1
if p not in runs_with_file:
runs_with_file[p] = run_nb, arg_nb
elif runs_with_file[p] is not None:
runs_with_file[p] = None
nb_file_args.append(nb_files)
file_names = {}
make_unique = UniqueNames()
for fi in flatten(2, (inputs, outputs)):
if fi in file_names:
continue
# If it appears in at least one of the command-lines
if fi in runs_with_file:
# If it only appears once in the command-lines
if runs_with_file[fi] is not None:
run_nb, arg_nb = runs_with_file[fi]
parts = []
# Run number, if there are more than one runs
if len(runs) > 1:
parts.append(run_nb)
# Argument number, if there are more than one file arguments
if nb_file_args[run_nb] > 1:
parts.append(arg_nb)
file_names[fi] = make_unique(
'arg%s' % '_'.join('%s' % s for s in parts))
else:
file_names[fi] = make_unique('arg_%s' % fi.unicodename)
else:
file_names[fi] = make_unique(fi.unicodename)
return dict((n, InputOutputFile(p, readers.get(p, []), writers.get(p, [])))
for p, n in iteritems(file_names))
reprozip-1.0.10/reprozip/traceutils.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
"""Additional manipulations for traces.
These are operations on traces that are not directly related to the tracing
process itself.
"""
from __future__ import division, print_function, unicode_literals
import logging
import os
from rpaths import Path
import sqlite3
from reprozip.tracer.trace import TracedFile
from reprozip.utils import PY3, listvalues
def create_schema(conn):
"""Create the trace database schema on a given SQLite3 connection.
"""
sql = [
'''
CREATE TABLE processes(
id INTEGER NOT NULL PRIMARY KEY,
run_id INTEGER NOT NULL,
parent INTEGER,
timestamp INTEGER NOT NULL,
is_thread BOOLEAN NOT NULL,
exitcode INTEGER
);
''',
'''
CREATE INDEX proc_parent_idx ON processes(parent);
''',
'''
CREATE TABLE opened_files(
id INTEGER NOT NULL PRIMARY KEY,
run_id INTEGER NOT NULL,
name TEXT NOT NULL,
timestamp INTEGER NOT NULL,
mode INTEGER NOT NULL,
is_directory BOOLEAN NOT NULL,
process INTEGER NOT NULL
);
''',
'''
CREATE INDEX open_proc_idx ON opened_files(process);
''',
'''
CREATE TABLE executed_files(
id INTEGER NOT NULL PRIMARY KEY,
name TEXT NOT NULL,
run_id INTEGER NOT NULL,
timestamp INTEGER NOT NULL,
process INTEGER NOT NULL,
argv TEXT NOT NULL,
envp TEXT NOT NULL,
workingdir TEXT NOT NULL
);
''',
'''
CREATE INDEX exec_proc_idx ON executed_files(process);
''',
]
for stmt in sql:
conn.execute(stmt)
def combine_files(newfiles, newpackages, oldfiles, oldpackages):
"""Merges two sets of packages and files.
"""
files = set(oldfiles)
files.update(newfiles)
packages = dict((pkg.name, pkg) for pkg in newpackages)
for oldpkg in oldpackages:
if oldpkg.name in packages:
pkg = packages[oldpkg.name]
# Here we build TracedFiles from the Files so that the comment
# (size, etc) gets set
s = set(TracedFile(fi.path) for fi in oldpkg.files)
s.update(pkg.files)
oldpkg.files = list(s)
packages[oldpkg.name] = oldpkg
else:
oldpkg.files = [TracedFile(fi.path) for fi in oldpkg.files]
packages[oldpkg.name] = oldpkg
packages = listvalues(packages)
return files, packages
def combine_traces(traces, target):
"""Combines multiple trace databases into one.
The runs from the original traces are appended ('run_id' field gets
translated to avoid conflicts).
:param traces: List of trace database filenames.
:type traces: [Path]
:param target: Directory where to write the new database and associated
configuration file.
:type target: Path
"""
# We are probably overwriting one of the traces we're reading, so write to
# a temporary file first then move it
fd, output = Path.tempfile('.sqlite3', 'reprozip_combined_')
if PY3:
# On PY3, connect() only accepts unicode
conn = sqlite3.connect(str(output))
else:
conn = sqlite3.connect(output.path)
os.close(fd)
conn.row_factory = sqlite3.Row
# Create the schema
create_schema(conn)
# Temporary database with lookup tables
conn.execute(
'''
ATTACH DATABASE '' AS maps;
''')
conn.execute(
'''
CREATE TABLE maps.map_runs(
old INTEGER NOT NULL,
new INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT
);
''')
conn.execute(
'''
CREATE TABLE maps.map_processes(
old INTEGER NOT NULL,
new INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT
);
''')
# Do the merge
for other in traces:
logging.info("Attaching database %s", other)
# Attach the other trace
conn.execute(
'''
ATTACH DATABASE ? AS trace;
''',
(str(other),))
# Add runs to lookup table
conn.execute(
'''
INSERT INTO maps.map_runs(old)
SELECT DISTINCT run_id AS old
FROM trace.processes
ORDER BY run_id;
''')
logging.info(
"%d rows in maps.map_runs",
list(conn.execute('SELECT COUNT(*) FROM maps.map_runs;'))[0][0])
# Add processes to lookup table
conn.execute(
'''
INSERT INTO maps.map_processes(old)
SELECT id AS old
FROM trace.processes
ORDER BY id;
''')
logging.info(
"%d rows in maps.map_processes",
list(conn.execute('SELECT COUNT(*) FROM maps.map_processes;'))
[0][0])
# processes
logging.info("Insert processes...")
conn.execute(
'''
INSERT INTO processes(id, run_id, parent,
timestamp, is_thread, exitcode)
SELECT p.new AS id, r.new AS run_id, parent,
timestamp, is_thread, exitcode
FROM trace.processes t
INNER JOIN maps.map_runs r ON t.run_id = r.old
INNER JOIN maps.map_processes p ON t.id = p.old
ORDER BY t.id;
''')
# opened_files
logging.info("Insert opened_files...")
conn.execute(
'''
INSERT INTO opened_files(run_id, name, timestamp,
mode, is_directory, process)
SELECT r.new AS run_id, name, timestamp,
mode, is_directory, p.new AS process
FROM trace.opened_files t
INNER JOIN maps.map_runs r ON t.run_id = r.old
INNER JOIN maps.map_processes p ON t.process = p.old
ORDER BY t.id;
''')
# executed_files
logging.info("Insert executed_files...")
conn.execute(
'''
INSERT INTO executed_files(name, run_id, timestamp, process,
argv, envp, workingdir)
SELECT name, r.new AS run_id, timestamp, p.new AS process,
argv, envp, workingdir
FROM trace.executed_files t
INNER JOIN maps.map_runs r ON t.run_id = r.old
INNER JOIN maps.map_processes p ON t.process = p.old
ORDER BY t.id;
''')
# Flush maps
conn.execute(
'''
DELETE FROM maps.map_runs;
''')
conn.execute(
'''
DELETE FROM maps.map_processes;
''')
# An implicit transaction gets created. Python used to implicitly
# commit it, but no longer does as of 3.6, so we have to explicitly
# commit before detaching.
conn.commit()
# Detach
conn.execute(
'''
DETACH DATABASE trace;
''')
# See above.
conn.commit()
conn.execute(
'''
DETACH DATABASE maps;
''')
conn.commit()
conn.close()
# Move database to final destination
if not target.exists():
target.mkdir()
output.move(target / 'trace.sqlite3')
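The id translation in `combine_traces` hinges on the `AUTOINCREMENT` lookup tables: even though the maps are flushed between traces, SQLite never reuses an autoincremented id, so the new run ids stay globally unique. The mechanism in isolation:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE map_runs(
    old INTEGER NOT NULL,
    new INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT)''')

all_new = []
for trace_runs in ([0, 1], [0, 1, 2]):  # two traces, runs numbered from 0
    conn.executemany('INSERT INTO map_runs(old) VALUES(?)',
                     [(r,) for r in trace_runs])
    all_new.append([row[0] for row in
                    conn.execute('SELECT new FROM map_runs ORDER BY new')])
    # combine_traces() flushes the map between traces; AUTOINCREMENT
    # keeps counting from sqlite_sequence, so ids never collide
    conn.execute('DELETE FROM map_runs')

assert all_new == [[1, 2], [3, 4, 5]]
```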
reprozip-1.0.10/reprozip/utils.py
# Copyright (C) 2014-2017 New York University
# This file is part of ReproZip which is released under the Revised BSD License
# See file LICENSE for full license details.
# This file is shared:
# reprozip/reprozip/utils.py
# reprounzip/reprounzip/utils.py
"""Utility functions.
These functions are shared between reprozip and reprounzip but are not specific
to this software (more utilities).
"""
from __future__ import division, print_function, unicode_literals
import codecs
import contextlib
import email.utils
import itertools
import locale
import logging
import operator
import os
import requests
from rpaths import Path, PosixPath
import stat
import subprocess
import sys
class StreamWriter(object):
def __init__(self, stream):
writer = codecs.getwriter(locale.getpreferredencoding())
self._writer = writer(stream, 'replace')
self.buffer = stream
def writelines(self, lines):
self.write(str('').join(lines))
def write(self, obj):
if isinstance(obj, bytes):
self.buffer.write(obj)
else:
self._writer.write(obj)
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self._writer, name)
PY3 = sys.version_info[0] == 3
if PY3:
izip = zip
irange = range
iteritems = lambda d: d.items()
itervalues = lambda d: d.values()
listvalues = lambda d: list(d.values())
stdout_bytes, stderr_bytes = sys.stdout.buffer, sys.stderr.buffer
stdin_bytes = sys.stdin.buffer
stdout, stderr = sys.stdout, sys.stderr
else:
izip = itertools.izip
irange = xrange # noqa: F821
iteritems = lambda d: d.iteritems()
itervalues = lambda d: d.itervalues()
listvalues = lambda d: d.values()
_writer = codecs.getwriter(locale.getpreferredencoding())
stdout_bytes, stderr_bytes = sys.stdout, sys.stderr
stdin_bytes = sys.stdin
stdout, stderr = StreamWriter(sys.stdout), StreamWriter(sys.stderr)
if PY3:
int_types = int,
unicode_ = str
else:
int_types = int, long # noqa: F821
unicode_ = unicode # noqa: F821
def flatten(n, l):
"""Flattens an iterable by repeatedly calling chain.from_iterable() on it.
>>> a = [[1, 2, 3], [4, 5, 6]]
>>> b = [[7, 8], [9, 10, 11, 12, 13, 14, 15, 16]]
>>> l = [a, b]
>>> list(flatten(0, a))
[[1, 2, 3], [4, 5, 6]]
>>> list(flatten(1, a))
[1, 2, 3, 4, 5, 6]
>>> list(flatten(1, l))
[[1, 2, 3], [4, 5, 6], [7, 8], [9, 10, 11, 12, 13, 14, 15, 16]]
>>> list(flatten(2, l))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
"""
for _ in irange(n):
l = itertools.chain.from_iterable(l)
return l
class UniqueNames(object):
"""Makes names unique amongst the ones it's already seen.
"""
def __init__(self):
self.names = set()
def insert(self, name):
assert name not in self.names
self.names.add(name)
def __call__(self, name):
nb = 1
attempt = name
while attempt in self.names:
nb += 1
attempt = '%s_%d' % (name, nb)
self.names.add(attempt)
return attempt
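`UniqueNames` resolves collisions by suffixing `_2`, `_3`, and so on. A standalone restatement of the same loop, so the behaviour can be checked directly:

```python
# Same logic as UniqueNames.__call__, restated without the class
names = set()

def make_unique(name):
    nb = 1
    attempt = name
    while attempt in names:
        nb += 1
        attempt = '%s_%d' % (name, nb)
    names.add(attempt)
    return attempt

assert make_unique('arg') == 'arg'      # first use: unchanged
assert make_unique('arg') == 'arg_2'    # collision: numbered suffix
assert make_unique('arg') == 'arg_3'
assert make_unique('input') == 'input'  # independent name, no suffix
```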
def escape(s):
"""Escapes backslashes and double quotes in strings.
This does NOT add quotes around the string.
"""
return s.replace('\\', '\\\\').replace('"', '\\"')
class CommonEqualityMixin(object):
"""Common mixin providing comparison by comparing ``__dict__`` attributes.
"""
def __eq__(self, other):
return (isinstance(other, self.__class__) and
self.__dict__ == other.__dict__)
def __ne__(self, other):
return not self.__eq__(other)
def optional_return_type(req_args, other_args):
"""Sort of namedtuple but with name-only fields.
When deconstructing a namedtuple, you have to get all the fields:
>>> o = namedtuple('T', ['a', 'b', 'c'])(1, 2, 3)
>>> a, b = o
ValueError: too many values to unpack
You thus cannot easily add new return values. This class allows it:
>>> o2 = optional_return_type(['a', 'b'], ['c'])(1, 2, 3)
>>> a, b = o2
>>> c = o2.c
"""
if len(set(req_args) | set(other_args)) != len(req_args) + len(other_args):
raise ValueError
# Maps argument name to position in each list
req_args_pos = dict((n, i) for i, n in enumerate(req_args))
other_args_pos = dict((n, i) for i, n in enumerate(other_args))
def cstr(cls, *args, **kwargs):
if len(args) > len(req_args) + len(other_args):
raise TypeError(
"Too many arguments (expected at least %d and no more than "
"%d)" % (len(req_args),
len(req_args) + len(other_args)))
args1, args2 = args[:len(req_args)], args[len(req_args):]
req = dict((i, v) for i, v in enumerate(args1))
other = dict(izip(other_args, args2))
for k, v in iteritems(kwargs):
if k in req_args_pos:
pos = req_args_pos[k]
if pos in req:
raise TypeError("Multiple values for field %s" % k)
req[pos] = v
elif k in other_args_pos:
if k in other:
raise TypeError("Multiple values for field %s" % k)
other[k] = v
else:
raise TypeError("Unknown field name %s" % k)
args = []
for i, k in enumerate(req_args):
if i not in req:
raise TypeError("Missing value for field %s" % k)
args.append(req[i])
inst = tuple.__new__(cls, args)
inst.__dict__.update(other)
return inst
dct = {'__new__': cstr}
for i, n in enumerate(req_args):
dct[n] = property(operator.itemgetter(i))
return type(str('OptionalReturnType'), (tuple,), dct)
def hsize(nbytes):
"""Readable size.
"""
if nbytes is None:
return "unknown"
KB = 1 << 10
MB = 1 << 20
GB = 1 << 30
TB = 1 << 40
PB = 1 << 50
nbytes = float(nbytes)
if nbytes < KB:
return "{0} bytes".format(nbytes)
elif nbytes < MB:
return "{0:.2f} KB".format(nbytes / KB)
elif nbytes < GB:
return "{0:.2f} MB".format(nbytes / MB)
elif nbytes < TB:
return "{0:.2f} GB".format(nbytes / GB)
elif nbytes < PB:
return "{0:.2f} TB".format(nbytes / TB)
else:
return "{0:.2f} PB".format(nbytes / PB)
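`hsize` picks the largest power-of-1024 unit the value reaches; a condensed restatement (equivalent thresholds, same format strings) makes that easy to verify:

```python
def hsize(nbytes):
    # Condensed restatement of the function above
    if nbytes is None:
        return "unknown"
    nbytes = float(nbytes)
    for unit, name in ((1 << 50, 'PB'), (1 << 40, 'TB'), (1 << 30, 'GB'),
                       (1 << 20, 'MB'), (1 << 10, 'KB')):
        if nbytes >= unit:
            return "{0:.2f} {1}".format(nbytes / unit, name)
    return "{0} bytes".format(nbytes)

assert hsize(512) == '512.0 bytes'
assert hsize(2048) == '2.00 KB'
assert hsize(5 * (1 << 20)) == '5.00 MB'
```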
def normalize_path(path):
"""Normalize a path obtained from the database.
"""
# os.path.normpath() keeps exactly two leading slashes, because POSIX
# gives '//' an implementation-defined meaning; it has none on Linux
path = PosixPath(path)
if path.path.startswith(path._sep + path._sep):
path = PosixPath(path.path[1:])
return path
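`os.path.normpath` keeps exactly two leading slashes because POSIX grants `//` an implementation-defined meaning; the stdlib behaviour that `normalize_path` compensates for:

```python
import posixpath

# Redundant slashes collapse everywhere except a leading '//'
assert posixpath.normpath('/a//b/') == '/a/b'
assert posixpath.normpath('//a/b') == '//a/b'   # exactly two: preserved
assert posixpath.normpath('///a/b') == '/a/b'   # three or more: collapsed
# normalize_path() above strips the extra slash, since on Linux '//'
# has no special meaning
```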
def find_all_links_recursive(filename, files):
path = Path('/')
for c in filename.components[1:]:
# At this point, path is a canonical path, and all links in it have
# been resolved
# We add the next path component
path = path / c
# That component is possibly a link
if path.is_link():
# Adds the link itself
files.add(path)
target = path.read_link(absolute=True)
# Here, target might contain a number of symlinks
if target not in files:
# Recurse on this new path
find_all_links_recursive(target, files)
# Restores the invariant; realpath might resolve several links here
path = path.resolve()
return path
def find_all_links(filename, include_target=False):
"""Dereferences symlinks from a path.
If include_target is True, this also returns the real path of the final
target.
Example:
/
a -> b
b
g -> c
c -> ../a/d
d
e -> /f
f
>>> find_all_links('/a/g/e', True)
['/a', '/b/c', '/b/g', '/b/d/e', '/f']
"""
files = set()
filename = Path(filename)
assert filename.absolute()
path = find_all_links_recursive(filename, files)
files = list(files)
if include_target:
files.append(path)
return files
def join_root(root, path):
"""Prepends `root` to the absolute path `path`.
"""
p_root, p_loc = path.split_root()
assert p_root == b'/'
return root / p_loc
@contextlib.contextmanager
def make_dir_writable(directory):
"""Context-manager that sets write permission on a directory.
This assumes that the directory belongs to you. If the u+w permission
wasn't set, it gets set in the context, and restored to what it was when
leaving the context. u+x also gets set on all the directories leading to
that path.
"""
uid = os.getuid()
try:
sb = directory.stat()
except OSError:
pass
else:
if sb.st_uid != uid or sb.st_mode & 0o700 == 0o700:
yield
return
# These are the permissions to be restored, in reverse order
restore_perms = []
try:
# Add u+x to all directories up to the target
path = Path('/')
for c in directory.components[1:-1]:
path = path / c
sb = path.stat()
if sb.st_uid == uid and not sb.st_mode & 0o100:
logging.debug("Temporarily setting u+x on %s", path)
restore_perms.append((path, sb.st_mode))
path.chmod(sb.st_mode | 0o700)
# Add u+wx to the target
sb = directory.stat()
if sb.st_uid == uid and sb.st_mode & 0o700 != 0o700:
logging.debug("Temporarily setting u+wx on %s", directory)
restore_perms.append((directory, sb.st_mode))
directory.chmod(sb.st_mode | 0o700)
yield
finally:
for path, mod in reversed(restore_perms):
path.chmod(mod)
def rmtree_fixed(path):
"""Like :func:`shutil.rmtree` but doesn't choke on annoying permissions.
If a directory with -w or -x is encountered, it gets fixed and deletion
continues.
"""
if path.is_link():
raise OSError("Cannot call rmtree on a symbolic link")
uid = os.getuid()
st = path.lstat()
if st.st_uid == uid and st.st_mode & 0o700 != 0o700:
path.chmod(st.st_mode | 0o700)
for entry in path.listdir():
if stat.S_ISDIR(entry.lstat().st_mode):
rmtree_fixed(entry)
else:
entry.remove()
path.rmdir()
# Compatibility with ReproZip <= 1.0.3
check_output = subprocess.check_output
def copyfile(source, destination, CHUNK_SIZE=4096):
"""Copies from one file object to another.
"""
while True:
chunk = source.read(CHUNK_SIZE)
if chunk:
destination.write(chunk)
if len(chunk) != CHUNK_SIZE:
break
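`copyfile` relies on a short read to detect end-of-file; with in-memory streams (a self-contained copy of the same loop):

```python
import io

def copyfile(source, destination, CHUNK_SIZE=4096):
    # Same loop as above: a chunk shorter than CHUNK_SIZE signals EOF
    while True:
        chunk = source.read(CHUNK_SIZE)
        if chunk:
            destination.write(chunk)
        if len(chunk) != CHUNK_SIZE:
            break

src = io.BytesIO(b'x' * 10000)  # two full 4096-byte chunks plus a short one
dst = io.BytesIO()
copyfile(src, dst)
assert dst.getvalue() == b'x' * 10000
```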
def download_file(url, dest, cachename=None, ssl_verify=None):
"""Downloads a file using a local cache.
If the file cannot be downloaded or if it wasn't modified, the cached
version will be used instead.
The cache lives in ``~/.cache/reprozip/``.
"""
if cachename is None:
if dest is None:
raise ValueError("One of 'dest' or 'cachename' must be specified")
cachename = dest.components[-1]
headers = {}
if 'XDG_CACHE_HOME' in os.environ:
cache = Path(os.environ['XDG_CACHE_HOME'])
else:
cache = Path('~/.cache').expand_user()
cache = cache / 'reprozip' / cachename
if cache.exists():
mtime = email.utils.formatdate(cache.mtime(), usegmt=True)
headers['If-Modified-Since'] = mtime
cache.parent.mkdir(parents=True)
try:
response = requests.get(url, headers=headers,
timeout=2 if cache.exists() else 10,
stream=True, verify=ssl_verify)
response.raise_for_status()
if response.status_code == 304:
raise requests.HTTPError(
'304 File is up to date, no data returned',
response=response)
except requests.RequestException as e:
if cache.exists():
if e.response and e.response.status_code == 304:
logging.info("Download %s: cache is up to date", cachename)
else:
logging.warning("Download %s: error downloading %s: %s",
cachename, url, e)
if dest is not None:
cache.copy(dest)
return dest
else:
return cache
else:
raise
logging.info("Download %s: downloading %s", cachename, url)
try:
with cache.open('wb') as f:
for chunk in response.iter_content(4096):
f.write(chunk)
response.close()
except Exception as e: # pragma: no cover
try:
cache.remove()
except OSError:
pass
raise e
logging.info("Downloaded %s successfully", cachename)
if dest is not None:
cache.copy(dest)
return dest
else:
return cache
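The cache-freshness logic builds a conditional GET: the cached file's mtime becomes an RFC 1123 `If-Modified-Since` header, and a 304 response means the cached copy is current. The header formatting alone (the timestamp is an arbitrary example):

```python
import email.utils

# download_file() turns the cache file's mtime into an HTTP date;
# 1500000000 is 2017-07-14 02:40:00 UTC
mtime = 1500000000
stamp = email.utils.formatdate(mtime, usegmt=True)
assert stamp == 'Fri, 14 Jul 2017 02:40:00 GMT'
headers = {'If-Modified-Since': stamp}
assert headers['If-Modified-Since'].endswith('GMT')
```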
reprozip-1.0.10/reprozip.egg-info/PKG-INFO
Metadata-Version: 1.1
Name: reprozip
Version: 1.0.10
Summary: Linux tool enabling reproducible experiments (packer)
Home-page: http://vida-nyu.github.io/reprozip/
Author: Remi Rampin
Author-email: remirampin@gmail.com
License: BSD-3-Clause
Description: ReproZip
========
`ReproZip `__ is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science. It tracks operating system calls and creates a package that contains all the binaries, files and dependencies required to run a given command on the author's computational environment (packing step). A reviewer can then extract the experiment in his environment to reproduce the results (unpacking step).
reprozip
--------
This is the component responsible for the packing step on Linux distributions.
Please refer to `reprounzip `_, `reprounzip-vagrant `_, and `reprounzip-docker `_ for other components and plugins.
Additional Information
----------------------
For more detailed information, please refer to our `website `_, as well as to our `documentation `_.
ReproZip is currently being developed at `NYU `_. The team includes:
* `Fernando Chirigati `_
* `Juliana Freire `_
* `Remi Rampin `_
* `Dennis Shasha `_
* `Vicky Steeves `_
Keywords: reprozip,reprounzip,reproducibility,provenance,vida,nyu
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: C
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: System :: Archiving
reprozip-1.0.10/reprozip.egg-info/SOURCES.txt
LICENSE.txt
MANIFEST.in
README.rst
setup.py
native/config.h
native/database.c
native/database.h
native/log.c
native/log.h
native/ptrace_utils.c
native/ptrace_utils.h
native/pytracer.c
native/syscalls.c
native/syscalls.h
native/tracer.c
native/tracer.h
native/utils.c
native/utils.h
reprozip/__init__.py
reprozip/common.py
reprozip/filters.py
reprozip/main.py
reprozip/pack.py
reprozip/traceutils.py
reprozip/utils.py
reprozip.egg-info/PKG-INFO
reprozip.egg-info/SOURCES.txt
reprozip.egg-info/dependency_links.txt
reprozip.egg-info/entry_points.txt
reprozip.egg-info/requires.txt
reprozip.egg-info/top_level.txt
reprozip/tracer/__init__.py
reprozip/tracer/linux_pkgs.py
reprozip/tracer/trace.py
reprozip-1.0.10/reprozip.egg-info/dependency_links.txt
reprozip-1.0.10/reprozip.egg-info/entry_points.txt
[console_scripts]
reprozip = reprozip.main:main
[reprozip.filters]
builtin = reprozip.filters:builtin
python = reprozip.filters:python
reprozip-1.0.10/reprozip.egg-info/requires.txt
PyYAML
rpaths>=0.8
usagestats>=0.3
requests
reprozip-1.0.10/reprozip.egg-info/top_level.txt
reprozip
reprozip-1.0.10/setup.cfg
[egg_info]
tag_build =
tag_date = 0
reprozip-1.0.10/setup.py
import io
import os
import platform
from setuptools import setup, Extension
import sys
# pip workaround
os.chdir(os.path.abspath(os.path.dirname(__file__)))
# This won't build on non-Linux -- don't even try
if platform.system().lower() != 'linux':
sys.stderr.write("reprozip uses ptrace and thus only works on Linux\n"
"You can however install reprounzip and plugins on other "
"platforms\n")
sys.exit(1)
# List the source files
sources = ['pytracer.c', 'tracer.c', 'syscalls.c', 'database.c',
'ptrace_utils.c', 'utils.c', 'log.c']
# They can be found under native/
sources = [os.path.join('native', n) for n in sources]
# Setup the libraries
libraries = ['sqlite3', 'rt']
# Build the C module
pytracer = Extension('reprozip._pytracer',
sources=sources,
libraries=libraries)
# Need to specify encoding for PY3, which has the worst unicode handling ever
with io.open('README.rst', encoding='utf-8') as fp:
description = fp.read()
req = [
'PyYAML',
'rpaths>=0.8',
'usagestats>=0.3',
'requests']
setup(name='reprozip',
version='1.0.10',
ext_modules=[pytracer],
packages=['reprozip', 'reprozip.tracer'],
entry_points={
'console_scripts': [
'reprozip = reprozip.main:main'],
'reprozip.filters': [
'python = reprozip.filters:python',
'builtin = reprozip.filters:builtin']},
install_requires=req,
description="Linux tool enabling reproducible experiments (packer)",
author="Remi Rampin, Fernando Chirigati, Dennis Shasha, Juliana Freire",
author_email='reprozip-users@vgc.poly.edu',
maintainer="Remi Rampin",
maintainer_email='remirampin@gmail.com',
url='http://vida-nyu.github.io/reprozip/',
long_description=description,
license='BSD-3-Clause',
keywords=['reprozip', 'reprounzip', 'reproducibility', 'provenance',
'vida', 'nyu'],
classifiers=[
'Development Status :: 5 - Production/Stable',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: BSD License',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Operating System :: POSIX :: Linux',
'Programming Language :: C',
'Topic :: Scientific/Engineering',
'Topic :: System :: Archiving'])