zn_poly-0.9.2/.gitignore
------------------------

# configure outputs
autom4te.cache
config.log
config.status
makefile

# automake outputs
**/.deps
**/.dirstamp

# binary build outputs
*.a
*.dll
*.dylib
*.so
*.tar.gz
**/*.o
tune/tune
tune/tune.exe

# libtool outputs
*.l[oa]

# other generated files
src/tuning.c

# editor temp files
*.sw[pon]

zn_poly-0.9.2/.gitlab-ci.yml
----------------------------

image: gcc

before_script:
  - apt-get update -qq && apt-get install -yqq libgmp-dev

build:
  stage: build
  script:
    - ./configure --gmp-prefix=/usr
    - make
  artifacts:
    paths:
      - "makefile"
      - "src/tuning.c"
    expire_in: 1 week

# run tests using the binary built before
test:
  stage: test
  script:
    # Ignore prerequisites to src/tuning.c so that it doesn't get
    # unnecessarily rebuilt
    - make -o src/tuning.c check
  dependencies:
    - build

zn_poly-0.9.2/ABI_VERSION
-------------------------

0.9

zn_poly-0.9.2/CHANGES
---------------------

CHANGELOG (summary of changes for each release)

version 0.9.2 (2019-01-06)
==========================

* fixed unsafe use of printf [!1]
* added support for Python 2.6 and Python 3 in the build system [!2, !3]

version 0.9.1 (2018-10-04)
==========================

* first new "release" in almost exactly 10 years!
* new upstream sources at https://gitlab.com/sagemath/zn_poly
* merged various minor downstream patches; specifically those from Sage
* added support for building a DLL for zn_poly on Cygwin
* other minor build cleanup; tuning is now done automatically by default,
  but can be disabled with `./configure --disable-tuning`

version 0.9 (2008-10-22)
========================

(note: sage 3.1.3 includes a prerelease version of zn_poly 0.9, whose
makefile is a bit different)

* new features:
   * implemented "make check"
   * KS polynomial middle products
   * implemented basecase/karatsuba integer middle product at mpn level
   * automatic tuning for KS1/KS2/KS4/FFT middle products
   * zn_array_mulmid now never falls back on zn_array_mul
   * shared versioning .so library filenames for Debianisation
     (thanks Timothy Abbott)
   * dylib64 target (thanks Michael Abshoff)
   * new zn_mod_get() function
* bug fixes:
   * hopefully fixed a hard-to-reproduce bug where the cycle counter
     calibration code ludicrously overestimates the clockspeed
     (reported by Thomas Keller)
* interface changes:
   * changed "midmul" everywhere to "mulmid"
* other stuff:
   * rearranged directory structure
   * massive code reorganisation and reformatting
   * minor simplifications to pmfvec fft code

version 0.8 (2008-04-04)
========================

* improved multiplication speed for odd moduli (via REDC modular reduction,
  and a few other tricks)
* major rewrite of profiling/tuning code -- tuning is now much faster and
  more accurate
* power series reciprocal via newton iteration (currently only efficient
  for high degree problems; currently only works for monic series)

version 0.7 (2008-03-04)
========================

* specialised code for squaring (KS, nussbaumer, FFT)

version 0.6 (2008-02-15)
========================

* middle products via Schonhage/Nussbaumer FFT
* zn_array_midmul_fft_precomp1_t for preconditioned middle products
* automatic tuning for KS vs FFT multiplication
* made wide_arith.h a standalone file

version 0.5 (2008-01-21)
========================

* Schonhage/Nussbaumer FFT multiplication code
* example program: bernoulli numbers mod p
* lots and lots of other things

version 0.4.1 (2007-12-18)
==========================

* fixed up warnings in tuning file for 32-bit machine
* added .dylib and .so support to makefile

version 0.4
===========

* added zn_mod_neg, zn_mod_mul, zn_mod_sub

version 0.3
===========

* added zn_array_midmul() (stub for middle products)
* added zn_array_copy()
* added zn_poly_version_string()
* "make install" now copies wide_arith.h as well as zn_poly.h; the dest
  directory is now <prefix>/include/zn_poly, not <prefix>/include

version 0.2
===========

* automatic tuning for KS1 vs KS2 vs KS4
* cycle counting on powerpc
* simple configure/build system
* generic MUL_WIDE etc definitions

version 0.1
===========

* initial release

zn_poly-0.9.2/COPYING
---------------------

===============================================================================

zn_poly: a library for polynomial arithmetic (version 0.9)

Copyright (C) 2007, 2008, David Harvey
Copyright (C) 2018 David Harvey, E. M. Bray

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) version 3 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License,
along with this program (see gpl-2.0.txt and gpl-3.0.txt). If not,
see <http://www.gnu.org/licenses/>.

===============================================================================

Licensing notes:

(1) zn_poly is NOT released under the "GPL v2 or later" or "GPL v3 or
    later". Both v2 and v3 are fine, but for now I am excluding later
    versions. If you need zn_poly under a different license, ask me and
    I'll consider it.

(2) zn_poly incorporates small amounts of code from other projects:

    (2a) The file "wide_arith.h" includes some assembly macros from the
         file "longlong.h" in GMP 4.2.1; see http://gmplib.org/. The
         copyright to this code is held by the Free Software Foundation,
         and it was released under "LGPL v2.1 or later".

    (2b) The file "wide_arith.h" also includes assembly macros from the
         file "SPMM_ASM.h" in NTL 5.4.1; see http://www.shoup.net/ntl/.
         The copyright to this code is held by Victor Shoup, and it was
         released under "GPL v2 or later".

    (2c) The file "profiler.h" contains x86 cycle counting code from the
         file "profiler.h" in FLINT 1.0; see http://www.flintlib.org/.
         The copyright to this code is held by William Hart, and it was
         released under "GPL v2 or later".

===============================================================================

zn_poly-0.9.2/README
--------------------

===============================================================================

zn_poly: a library for polynomial arithmetic (version 0.9)

Copyright (C) 2007, 2008, David Harvey
Copyright (C) 2018 David Harvey, E. M. Bray

See the file COPYING for copyright and licensing information.
===============================================================================

Installation instructions
-------------------------

(1) Unpack the tarball somewhere.

(2) From the tarball directory, run the configure script, i.e.

       ./configure

    This creates a makefile. Python must be available for this to work.

    The configure script doesn't do anything intelligent (like examine
    your system); it just writes the makefile based on the supplied
    options. Available options are the following:

    --cflags=<flags>
       Flags passed to gcc. Default is "-O2". You might need "-O2 -m64".

    --ldflags=<flags>
       Flags passed to linker. You might need "-m64", especially on
       some macs.

    --prefix=<path>
       Where to put zn_poly header and library files. Default is
       "/usr/local". The header file is stored at <prefix>/include.
       The library is stored at <prefix>/lib.

    --gmp-prefix=<path>
       Location of GMP include/library files (assumed to be under
       <path>/include and <path>/lib). Default is "/usr/local".

    --ntl-prefix=<path>
       Location of NTL include/library files (assumed to be under
       <path>/include and <path>/lib). This is only necessary if you
       want to build the profiling targets with NTL profiling support.
       Default is "/usr/local".

    --use-flint
       Use the FLINT library instead of GMP for large integer
       multiplication.

    --flint-prefix=<path>
       Location of FLINT include/library files (assumed to be under
       <path>/include and <path>/lib). Default is "/usr/local".

    --(en/dis)able-tuning
       Enable (the default) or disable automatic tuning.

(3) (optional) Run "make tune" and then "tune/tune > src/tuning.c". This
    determines optimal tuning values on your platform and overwrites the
    tuning.c source file with the new values. This only works on
    platforms where cycle counter code is available (currently x86,
    x86_64 and powerpc). If you don't run "make tune", you'll just get
    some default tuning values that work well on my development machine.

    Note: Tuning is now always done by default as a prerequisite to
    other make targets; to disable it run ./configure with
    --disable-tuning

(4) Run "make" to build the library.

(5) Run "make install". This copies the library and include files to the
    requested destination directory.

(6) (optional) Step (5) only installs the static version of the library.
    The makefile also has targets for shared libraries (you might need
    to add -fPIC to --cflags):

    * libzn_poly.dylib: for darwin
    * libzn_poly.dylib64: darwin, 64 bit
    * libzn_poly.so: linux
    * libzn_poly-0.9.so: linux, with sonames
    * cygzn_poly.dll: for cygwin

===============================================================================

Other makefile targets
----------------------

make demo/bernoulli/bernoulli
   An example program for computing bernoulli numbers mod p.

make clean
   Remove all temporary files.

make distclean
   Remove all temporary files including the generated makefile.

make check
   Runs some quick tests to make sure that everything appears to be
   working.

make test
   Builds a test program. Running "test/test all" runs the whole zn_poly
   test suite, which tests zn_poly much more thoroughly than "make check".

make profile/mul-profile
   A program for profiling various polynomial multiplication algorithms
   over various modulus sizes and polynomial lengths.

make profile/mul-profile-ntl
   As above, but also includes support for profiling NTL's multiplication.

make profile/invert-profile
make profile/invert-profile-ntl
   A program for profiling series inversion.

make profile/negamul-profile
   A program for profiling various negacyclic multiplication algorithms.

make profile/mulmid-profile
   A program for profiling various middle product algorithms.

make profile/array-profile
   A program for profiling linear-time array functions, including
   butterfly routines used in the FFT multiplication routines, GMP's
   mpn_add and mpn_sub, and others.

===============================================================================
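A minimal usage sketch (an editorial addition, not part of the original
README). It multiplies (1 + 2x + 3x^2) by (4 + 5x) mod 7 using the same
zn_mod_init / zn_array_mul / zn_mod_clear calls as the bundled
demo/bernoulli/bernoulli.c; the argument order (result, op1, len1, op2,
len2, mod) and the assumption that the longer operand comes first are
taken from that demo. Compile against the installed library, e.g. with
-lzn_poly -lgmp.

   #include <stdio.h>
   #include "zn_poly.h"

   int
   main (void)
   {
      zn_mod_t mod;
      zn_mod_init (mod, 7);          // all arithmetic below is mod 7

      ulong a[3] = {1, 2, 3};        // 1 + 2x + 3x^2
      ulong b[2] = {4, 5};           // 4 + 5x
      ulong c[4];                    // product has 3 + 2 - 1 = 4 coeffs

      zn_array_mul (c, a, 3, b, 2, mod);

      ulong i;
      for (i = 0; i < 4; i++)
         printf ("%lu ", c[i]);      // expected output: 4 6 1 1
      printf ("\n");

      zn_mod_clear (mod);
      return 0;
   }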
zn_poly-0.9.2/VERSION
---------------------

0.9.2

zn_poly-0.9.2/configure
-----------------------

#!/bin/sh
#
# This script just calls makemakefile.py.
# See that file for the real configure script.
#

if test $# -ne 0
then
   python makemakefile.py "$@" > makefile
else
   python makemakefile.py > makefile
fi

zn_poly-0.9.2/demo/bernoulli/bernoulli.c
---------------------------------------

/*
   bernoulli.c: example program; computes irregular indices for a prime p

   Copyright (C) 2007, 2008, David Harvey

   This file is part of the zn_poly library (version 0.9).

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 2 of the License, or
   (at your option) version 3 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <stdlib.h>
#include "zn_poly.h"

/*
   Finds distinct prime factors of n, in the range k < p <= n.
   Assumes that n does not have any prime factors p <= k.
   Stores them in increasing order starting at res.
   Returns number of factors written.
*/
unsigned
prime_factors_helper (ulong* res, ulong k, ulong n)
{
   if (n == 1)
      return 0;

   ulong i;
   for (i = k + 1; i * i <= n; i++)
   {
      if (n % i == 0)
      {
         // found a factor
         *res = i;
         // remove that factor entirely
         for (n /= i; n % i == 0; n /= i);
         return prime_factors_helper (res + 1, i, n) + 1;
      }
   }

   // no more factors
   *res = n;
   return 1;
}

/*
   Finds distinct prime factors of n. Writes them to res in increasing
   order, returns number of factors found.
*/
unsigned
prime_factors (ulong* res, ulong n)
{
   return prime_factors_helper (res, 1, n);
}
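/*
   Worked example (editorial illustration; not in the original source):
   for n = 12 = 2^2 * 3,

      ulong res[2];
      unsigned num = prime_factors (res, 12);   // num == 2, res == {2, 3}

   prime_factors_helper strips each factor out completely (the inner
   "for (n /= i; ...)" loop) before recursing, so every recursive call
   sees an n whose remaining prime factors all exceed i; this is what
   guarantees the increasing-order output.
*/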
/*
   A really dumb primality test.
*/
int
is_prime (ulong n)
{
   ulong i;
   for (i = 2; i * i <= n; i++)
      if (n % i == 0)
         return 0;
   return 1;
}

/*
   Assuming p >= 3 is prime, find the minimum primitive root mod p.
   Stores it at g, and stores its inverse mod p at g_inv.

   Note: this is probably a terrible algorithm, but it's fine compared to
   the running time of bernoulli() for large p.
*/
void
primitive_root (ulong* g, ulong* g_inv, ulong p)
{
   zn_mod_t mod;
   zn_mod_init (mod, p);

   // find prime factors of p-1
   ulong factors[200];
   unsigned num_factors, i;
   num_factors = prime_factors (factors, p - 1);

   // loop through candidates
   ulong x;
   for (x = 2; ; x++)
   {
      ZNP_ASSERT (x < p);

      // it's a generator if x^((p-1)/q) != 1 for all primes q dividing p-1
      int good = 1;
      for (i = 0; i < num_factors && good; i++)
         good = good && (zn_mod_pow (x, (p - 1) / factors[i], mod) != 1);

      if (good)
      {
         *g = x;
         *g_inv = zn_mod_pow (x, p - 2, mod);
         zn_mod_clear (mod);
         return;
      }
   }
}

/*
   Computes bernoulli numbers B(0), B(2), ..., B(p-3) mod p.

   If res != NULL, it stores the result in res, which needs to be of
   length (p-1)/2.

   If irregular != NULL, it stores the number of irregular indices at
   irregular[0], followed by the irregular indices.

   The largest permitted index of irregularity is irregular_max; if there
   are more indices than that to store, the program will abort.

   p must be a prime >= 3
   p must be < 2^(ULONG_BITS/2)
*/
void
bernoulli (ulong* res, ulong* irregular, unsigned irregular_max, ulong p)
{
   ZNP_ASSERT (p < (1UL << (ULONG_BITS/2)));

   ulong g, g_inv;
   primitive_root (&g, &g_inv, p);

   ulong n = (p-1) / 2;

   zn_mod_t mod;
   zn_mod_init (mod, p);

   // allocate our own res if the caller didn't
   int own_res = 0;
   if (!res)
   {
      res = (ulong*) malloc (n * sizeof (ulong));
      own_res = 1;
   }

   if (irregular)
      irregular[0] = 0;

   ulong* G = (ulong*) malloc (n * sizeof (ulong));
   ulong* P = (ulong*) malloc ((2*n - 1) * sizeof (ulong));
   ulong* J = res;

   // -------------------------------------------------------------------
   // Step 1: compute polynomials G(X) and J(X)

   // g_pow = g^(i-1), g_pow_inv = g^(-i) at beginning of each iteration
   ulong g_pow = g_inv;
   ulong g_pow_inv = 1;

   // bias = (g-1)/2 mod p
   ulong bias = (g - 1 + ((g & 1) ? 0 : p)) / 2;

   // fudge = g^(i^2), fudge_inv = g^(-i^2) at each iteration
   ulong fudge = 1;
   ulong fudge_inv = 1;

   ulong i;
   for (i = 0; i < n; i++)
   {
      ulong prod = g * g_pow;
      // quo = floor(prod / p)
      // rem = prod % p
      ulong quo = zn_mod_quotient (prod, mod);
      ulong rem = prod - quo * p;

      // h = h(g^i) / g^i mod p
      ulong h = g_pow_inv * zn_mod_sub_slim (bias, quo, mod);
      h = zn_mod_reduce (h, mod);

      // update g_pow and g_pow_inv for next iteration
      g_pow = rem;
      g_pow_inv = zn_mod_reduce (g_pow_inv * g_inv, mod);

      // X^i coefficient of G(X) is g^(i^2) * h(g^i) / g^i
      G[i] = zn_mod_reduce (h * fudge, mod);

      // X^i coefficient of J(X) is g^(-i^2)
      J[i] = fudge_inv;

      // update fudge and fudge_inv for next iteration
      fudge = zn_mod_reduce (fudge * g_pow, mod);
      fudge = zn_mod_reduce (fudge * g_pow, mod);
      fudge = zn_mod_reduce (fudge * g, mod);
      fudge_inv = zn_mod_reduce (fudge_inv * g_pow_inv, mod);
      fudge_inv = zn_mod_reduce (fudge_inv * g_pow_inv, mod);
      fudge_inv = zn_mod_reduce (fudge_inv * g, mod);
   }

   J[0] = 0;

   // -------------------------------------------------------------------
   // Step 2: compute product P(X) = G(X) * J(X)

   zn_array_mul (P, J, n, G, n, mod);
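   // Note (editorial illustration; not in the original source): G and J
   // each have length n = (p-1)/2, so their full product P(X) has 2n - 1
   // coefficients -- exactly the space allocated for P above. Step 3
   // below consumes both halves, P[i] and P[i + n], so a full product
   // (rather than a middle product) really is needed here.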
   // -------------------------------------------------------------------
   // Step 3: extract output from P(X), and verify result

   res[0] = 1;

   // we will verify that \sum_{j=0}^{(p-3)/2} 4^j (2j+1) B(2j) = -2 mod p
   ulong check_accum = 1;
   ulong check_four_pow = 4 % p;

   // g_sqr = g^2
   // g_sqr_inv = g^(-2)
   ulong g_sqr = zn_mod_reduce (g * g, mod);
   ulong g_sqr_inv = zn_mod_reduce (g_inv * g_inv, mod);

   // make table with J[i] = (1 - g^(2i+2))(1 - g^(2i+4)) ... (1 - g^(p-3))
   // for 0 <= i < (p-1)/2
   ulong g_sqr_inv_pow = g_sqr_inv;
   J[n-1] = 1;
   for (i = 1; i < n; i++)
   {
      J[n-i-1] = zn_mod_reduce (J[n-i] * (p + 1 - g_sqr_inv_pow), mod);
      g_sqr_inv_pow = zn_mod_reduce (g_sqr_inv_pow * g_sqr_inv, mod);
   }

   // fudge = g^(i^2) at each iteration
   fudge = g;
   // g_sqr_pow = g^(2i) at each iteration
   ulong g_sqr_pow = g_sqr;
   // prod_inv = [(1 - g^(2i))(1 - g^(2i+2)) ... (1 - g^(p-3))]^(-1)
   // at each iteration (todo: for i == 1, it's experimentally equal to -1/2
   // mod p, need to prove this)
   ulong prod_inv = p - 2;

   for (i = 1; i < n; i++)
   {
      ulong val = (i == (n-1)) ? 0 : P[i + n];
      if (n & 1)
         val = zn_mod_neg (val, mod);
      val = zn_mod_add_slim (val, G[i], mod);
      val = zn_mod_add_slim (val, P[i], mod);

      // multiply by 4 * i * g^(i^2)
      val = zn_mod_reduce (val * fudge, mod);
      val = zn_mod_reduce (val * (2*i), mod);
      val = zn_mod_add_slim (val, val, mod);

      // divide by (1 - g^(2i))
      val = zn_mod_reduce (val * prod_inv, mod);
      val = zn_mod_reduce (val * J[i], mod);
      prod_inv = zn_mod_reduce (prod_inv * (1 + p - g_sqr_pow), mod);

      // store output coefficient if requested
      if (!own_res)
         res[i] = val;

      // store irregular index if requested
      if (irregular)
      {
         if (val == 0)
         {
            irregular[0]++;
            if (irregular[0] >= irregular_max)
            {
               printf ("too many irregular indices for p = %lu\n", p);
               abort ();
            }
            irregular[irregular[0]] = 2*i;
         }
      }

      // update fudge and g_sqr_pow
      g_sqr_pow = zn_mod_reduce (g_sqr_pow * g, mod);
      fudge = zn_mod_reduce (fudge * g_sqr_pow, mod);
      g_sqr_pow = zn_mod_reduce (g_sqr_pow * g, mod);

      // update verification data
      ulong check_term = zn_mod_reduce (check_four_pow * (2*i + 1), mod);
      check_term = zn_mod_reduce (check_term * val, mod);
      check_accum = zn_mod_add_slim (check_accum, check_term, mod);
      check_four_pow = zn_mod_add_slim (check_four_pow, check_four_pow, mod);
      check_four_pow = zn_mod_add_slim (check_four_pow, check_four_pow, mod);
   }

   if (check_accum != p-2)
   {
      printf ("bernoulli failed correctness check for p = %lu\n", p);
      abort ();
   }

   if (own_res)
      free (res);
   free (P);
   free (G);

   zn_mod_clear (mod);
}

/*
   Same as bernoulli(), but handles two primes simultaneously.

   p1 and p2 must be distinct primes >= 3.
*/
void
bernoulli_dual (ulong* res1, ulong* irregular1, unsigned irregular1_max,
                ulong p1,
                ulong* res2, ulong* irregular2, unsigned irregular2_max,
                ulong p2)
{
   ZNP_ASSERT (p1 < (1UL << (ULONG_BITS/2)));
   ZNP_ASSERT (p2 < (1UL << (ULONG_BITS/2)));
   ZNP_ASSERT (p1 != p2);

   // swap them to make p1 < p2
   if (p1 > p2)
   {
      { ulong temp = p2; p2 = p1; p1 = temp; }
      { ulong* temp = res1; res1 = res2; res2 = temp; }
      { ulong* temp = irregular1; irregular1 = irregular2;
        irregular2 = temp; }
      { unsigned temp = irregular1_max; irregular1_max = irregular2_max;
        irregular2_max = temp; }
   }

   ulong g1, g_inv1;
   ulong g2, g_inv2;
   primitive_root (&g1, &g_inv1, p1);
   primitive_root (&g2, &g_inv2, p2);

   ulong n1 = (p1-1) / 2;
   ulong n2 = (p2-1) / 2;

   zn_mod_t mod1, mod2;
   zn_mod_init (mod1, p1);
   zn_mod_init (mod2, p2);

   ulong q = p1 * p2;
   zn_mod_t mod;
   zn_mod_init (mod, q);

   // allocate our own res2 if the caller didn't
   int own_res2 = 0;
   if (!res2)
   {
      res2 = (ulong*) malloc (n2 * sizeof (ulong));
      own_res2 = 1;
   }

   if (irregular1)
      irregular1[0] = 0;
   if (irregular2)
      irregular2[0] = 0;
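   // Worked example of the idempotents computed below (editorial
   // illustration; not in the original source): for p1 = 5, p2 = 7,
   // q = 35, we get id1 = 21 (21 = 1 mod 5, 21 = 0 mod 7) and id2 = 15
   // (15 = 0 mod 5, 15 = 1 mod 7). Then x = (a*id1 + b*id2) mod q is the
   // unique residue with x = a mod p1 and x = b mod p2, which is how
   // coefficients for both primes are packed into a single polynomial.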
   // find idempotents to CRT modulo p1 and p2, i.e.
   //    id1 = 1 mod p1,   id1 = 0 mod p2
   //    id2 = 0 mod p1,   id2 = 1 mod p2
   mpz_t p1_mpz, p2_mpz, a1_mpz, a2_mpz, g_mpz;
   mpz_init (p1_mpz);
   mpz_init (p2_mpz);
   mpz_init (a1_mpz);
   mpz_init (a2_mpz);
   mpz_init (g_mpz);

   mpz_set_ui (p1_mpz, p1);
   mpz_set_ui (p2_mpz, p2);
   mpz_gcdext (g_mpz, a1_mpz, a2_mpz, p1_mpz, p2_mpz);

   mpz_mul (a1_mpz, a1_mpz, p1_mpz);
   mpz_mod_ui (a1_mpz, a1_mpz, q);
   ulong id2 = mpz_get_ui (a1_mpz);

   mpz_mul (a2_mpz, a2_mpz, p2_mpz);
   mpz_mod_ui (a2_mpz, a2_mpz, q);
   ulong id1 = mpz_get_ui (a2_mpz);

   mpz_clear (g_mpz);
   mpz_clear (a2_mpz);
   mpz_clear (a1_mpz);
   mpz_clear (p2_mpz);
   mpz_clear (p1_mpz);

   ulong* G = (ulong*) malloc (n2 * sizeof (ulong));
   ulong* P = (ulong*) malloc ((2*n2 - 1) * sizeof (ulong));
   ulong* J = res2;

   // -------------------------------------------------------------------
   // Step 1: compute polynomials G(X) and J(X)

   // g_pow = g^(i-1), g_pow_inv = g^(-i) at beginning of each iteration
   ulong g_pow1 = g_inv1;
   ulong g_pow2 = g_inv2;
   ulong g_pow_inv1 = 1;
   ulong g_pow_inv2 = 1;

   // bias = (g-1)/2 mod p
   ulong bias1 = (g1 - 1 + ((g1 & 1) ? 0 : p1)) / 2;
   ulong bias2 = (g2 - 1 + ((g2 & 1) ? 0 : p2)) / 2;

   // fudge = g^(i^2), fudge_inv = g^(-i^2) at each iteration
   ulong fudge1 = 1;
   ulong fudge2 = 1;
   ulong fudge_inv1 = 1;
   ulong fudge_inv2 = 1;

   ulong i;
   for (i = 0; i < n1; i++)
   {
      ulong prod1 = g1 * g_pow1;
      ulong prod2 = g2 * g_pow2;
      // quo = floor(prod / p)
      // rem = prod % p
      ulong quo1 = zn_mod_quotient (prod1, mod1);
      ulong rem1 = prod1 - quo1 * p1;
      ulong quo2 = zn_mod_quotient (prod2, mod2);
      ulong rem2 = prod2 - quo2 * p2;

      // h = h(g^i) / g^i mod p
      ulong h1 = g_pow_inv1 * zn_mod_sub_slim (bias1, quo1, mod1);
      h1 = zn_mod_reduce (h1, mod1);
      ulong h2 = g_pow_inv2 * zn_mod_sub_slim (bias2, quo2, mod2);
      h2 = zn_mod_reduce (h2, mod2);

      // update g_pow and g_pow_inv for next iteration
      g_pow1 = rem1;
      g_pow_inv1 = zn_mod_reduce (g_pow_inv1 * g_inv1, mod1);
      g_pow2 = rem2;
      g_pow_inv2 = zn_mod_reduce (g_pow_inv2 * g_inv2, mod2);

      // X^i coefficient of G(X) is g^(i^2) * h(g^i) / g^i
      // (combine via CRT)
      ulong Gval1 = zn_mod_reduce (h1 * fudge1, mod1);
      Gval1 = zn_mod_mul (Gval1, id1, mod);
      ulong Gval2 = zn_mod_reduce (h2 * fudge2, mod2);
      Gval2 = zn_mod_mul (Gval2, id2, mod);
      G[i] = zn_mod_add (Gval1, Gval2, mod);

      // X^i coefficient of J(X) is g^(-i^2)
      ulong Jval1 = zn_mod_mul (fudge_inv1, id1, mod);
      ulong Jval2 = zn_mod_mul (fudge_inv2, id2, mod);
      J[i] = zn_mod_add (Jval1, Jval2, mod);

      // update fudge and fudge_inv for next iteration
      fudge1 = zn_mod_reduce (fudge1 * g_pow1, mod1);
      fudge1 = zn_mod_reduce (fudge1 * g_pow1, mod1);
      fudge1 = zn_mod_reduce (fudge1 * g1, mod1);
      fudge_inv1 = zn_mod_reduce (fudge_inv1 * g_pow_inv1, mod1);
      fudge_inv1 = zn_mod_reduce (fudge_inv1 * g_pow_inv1, mod1);
      fudge_inv1 = zn_mod_reduce (fudge_inv1 * g1, mod1);
      fudge2 = zn_mod_reduce (fudge2 * g_pow2, mod2);
      fudge2 = zn_mod_reduce (fudge2 * g_pow2, mod2);
      fudge2 = zn_mod_reduce (fudge2 * g2, mod2);
      fudge_inv2 = zn_mod_reduce (fudge_inv2 * g_pow_inv2, mod2);
      fudge_inv2 = zn_mod_reduce (fudge_inv2 * g_pow_inv2, mod2);
      fudge_inv2 = zn_mod_reduce (fudge_inv2 * g2, mod2);
   }

   // finished with p1, now finish the loop for p2
   for (; i < n2; i++)
   {
      ulong prod2 = g2 * g_pow2;
      // quo = floor(prod / p)
      // rem = prod % p
      ulong quo2 = zn_mod_quotient (prod2, mod2);
      ulong rem2 = prod2 - quo2 * p2;

      // h = h(g^i) / g^i mod p
      ulong h2 = g_pow_inv2 * zn_mod_sub_slim (bias2, quo2, mod2);
      h2 = zn_mod_reduce (h2, mod2);

      // update g_pow and g_pow_inv for next iteration
      g_pow2 = rem2;
      g_pow_inv2 = zn_mod_reduce (g_pow_inv2 * g_inv2, mod2);
      // X^i coefficient of G(X) is g^(i^2) * h(g^i) / g^i
      // (combine via CRT)
      ulong Gval2 = zn_mod_reduce (h2 * fudge2, mod2);
      G[i] = zn_mod_mul (Gval2, id2, mod);

      // X^i coefficient of J(X) is g^(-i^2)
      J[i] = zn_mod_mul (fudge_inv2, id2, mod);

      // update fudge and fudge_inv for next iteration
      fudge2 = zn_mod_reduce (fudge2 * g_pow2, mod2);
      fudge2 = zn_mod_reduce (fudge2 * g_pow2, mod2);
      fudge2 = zn_mod_reduce (fudge2 * g2, mod2);
      fudge_inv2 = zn_mod_reduce (fudge_inv2 * g_pow_inv2, mod2);
      fudge_inv2 = zn_mod_reduce (fudge_inv2 * g_pow_inv2, mod2);
      fudge_inv2 = zn_mod_reduce (fudge_inv2 * g2, mod2);
   }

   J[0] = 0;

   // -------------------------------------------------------------------
   // Step 2: compute product P(X) = G(X) * J(X)

#if 0
   zn_array_mul_fft_dft (P, J, n2, G, n2, 3, mod);
#else
   zn_array_mul (P, J, n2, G, n2, mod);
#endif
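   // (Editorial note; not in the original source.) This single length-n2
   // multiplication mod q = p1 * p2 is the payoff of bernoulli_dual: the
   // coefficients of both primes were packed together via id1 and id2
   // above, and Step 3 below unpacks the product by reducing it mod p1
   // and then mod p2, so one zn_array_mul call serves both primes.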
   // -------------------------------------------------------------------
   // Step 3: extract output from P(X), and verify result

   if (res1)
      res1[0] = 1;

   // we will verify that \sum_{j=0}^{(p-3)/2} 4^j (2j+1) B(2j) = -2 mod p
   ulong check_accum1 = 1;
   ulong check_four_pow1 = 4 % p1;

   // g_sqr = g^2
   // g_sqr_inv = g^(-2)
   ulong g_sqr1 = zn_mod_reduce (g1 * g1, mod1);
   ulong g_sqr_inv1 = zn_mod_reduce (g_inv1 * g_inv1, mod1);

   // make table with J[i] = (1 - g^(2i+2))(1 - g^(2i+4)) ... (1 - g^(p-3))
   // for 0 <= i < (p-1)/2
   ulong g_sqr_inv_pow1 = g_sqr_inv1;
   J[n1-1] = 1;
   for (i = 1; i < n1; i++)
   {
      J[n1-i-1] = zn_mod_reduce (J[n1-i] * (p1 + 1 - g_sqr_inv_pow1), mod1);
      g_sqr_inv_pow1 = zn_mod_reduce (g_sqr_inv_pow1 * g_sqr_inv1, mod1);
   }

   // fudge = g^(i^2) at each iteration
   fudge1 = g1;
   // g_sqr_pow = g^(2i) at each iteration
   ulong g_sqr_pow1 = g_sqr1;
   // prod_inv = [(1 - g^(2i))(1 - g^(2i+2)) ... (1 - g^(p-3))]^(-1)
   // at each iteration (todo: for i == 1, it's experimentally equal to -1/2
   // mod p, need to prove this)
   ulong prod_inv1 = p1 - 2;

   for (i = 1; i < n1; i++)
   {
      ulong val = (i == (n1-1)) ? 0 : P[i + n1];
      if (n1 & 1)
         val = zn_mod_neg (val, mod);
      val = zn_mod_add (val, G[i], mod);
      val = zn_mod_add (val, P[i], mod);

      // reduce it mod p1
      val = zn_mod_reduce (val, mod1);

      // multiply by 4 * i * g^(i^2)
      val = zn_mod_reduce (val * fudge1, mod1);
      val = zn_mod_reduce (val * (2*i), mod1);
      val = zn_mod_add_slim (val, val, mod1);

      // divide by (1 - g^(2i))
      val = zn_mod_reduce (val * prod_inv1, mod1);
      val = zn_mod_reduce (val * J[i], mod1);
      prod_inv1 = zn_mod_reduce (prod_inv1 * (1 + p1 - g_sqr_pow1), mod1);

      // store output coefficient if requested
      if (res1)
         res1[i] = val;

      // store irregular index if requested
      if (irregular1)
      {
         if (val == 0)
         {
            irregular1[0]++;
            if (irregular1[0] >= irregular1_max)
            {
               printf ("too many irregular indices for p = %lu\n", p1);
               abort ();
            }
            irregular1[irregular1[0]] = 2*i;
         }
      }

      // update fudge and g_sqr_pow
      g_sqr_pow1 = zn_mod_reduce (g_sqr_pow1 * g1, mod1);
      fudge1 = zn_mod_reduce (fudge1 * g_sqr_pow1, mod1);
      g_sqr_pow1 = zn_mod_reduce (g_sqr_pow1 * g1, mod1);

      // update verification data
      ulong check_term1 = zn_mod_reduce (check_four_pow1 * (2*i + 1), mod1);
      check_term1 = zn_mod_reduce (check_term1 * val, mod1);
      check_accum1 = zn_mod_add_slim (check_accum1, check_term1, mod1);
      check_four_pow1 = zn_mod_add_slim (check_four_pow1, check_four_pow1,
                                         mod1);
      check_four_pow1 = zn_mod_add_slim (check_four_pow1, check_four_pow1,
                                         mod1);
   }

   if (check_accum1 != p1-2)
   {
      printf ("bernoulli_dual failed correctness check for p1 = %lu\n", p1);
      abort ();
   }

   // -------------------------------------------------------------------
   // Do step 3 again for the second prime

   res2[0] = 1;

   // we will verify that \sum_{j=0}^{(p-3)/2} 4^j (2j+1) B(2j) = -2 mod p
   ulong check_accum2 = 1;
   ulong check_four_pow2 = 4 % p2;

   // g_sqr = g^2
   // g_sqr_inv = g^(-2)
   ulong g_sqr2 = zn_mod_reduce (g2 * g2, mod2);
   ulong g_sqr_inv2 = zn_mod_reduce (g_inv2 * g_inv2, mod2);

   // make table with J[i] = (1 - g^(2i+2))(1 - g^(2i+4)) ... (1 - g^(p-3))
   // for 0 <= i < (p-1)/2
   ulong g_sqr_inv_pow2 = g_sqr_inv2;
   J[n2-1] = 1;
   for (i = 1; i < n2; i++)
   {
      J[n2-i-1] = zn_mod_reduce (J[n2-i] * (p2 + 1 - g_sqr_inv_pow2), mod2);
      g_sqr_inv_pow2 = zn_mod_reduce (g_sqr_inv_pow2 * g_sqr_inv2, mod2);
   }

   // fudge = g^(i^2) at each iteration
   fudge2 = g2;
   // g_sqr_pow = g^(2i) at each iteration
   ulong g_sqr_pow2 = g_sqr2;
   // prod_inv = [(1 - g^(2i))(1 - g^(2i+2)) ... (1 - g^(p-3))]^(-1)
   // at each iteration (todo: for i == 1, it's experimentally equal to -1/2
   // mod p, need to prove this)
   ulong prod_inv2 = p2 - 2;

   for (i = 1; i < n2; i++)
   {
      ulong val = (i == (n2-1)) ? 0 : P[i + n2];
      if (n2 & 1)
         val = zn_mod_neg (val, mod);
      val = zn_mod_add (val, G[i], mod);
      val = zn_mod_add (val, P[i], mod);

      // reduce it mod p2
      val = zn_mod_reduce (val, mod2);

      // multiply by 4 * i * g^(i^2)
      val = zn_mod_reduce (val * fudge2, mod2);
      val = zn_mod_reduce (val * (2*i), mod2);
      val = zn_mod_add_slim (val, val, mod2);

      // divide by (1 - g^(2i))
      val = zn_mod_reduce (val * prod_inv2, mod2);
      val = zn_mod_reduce (val * J[i], mod2);
      prod_inv2 = zn_mod_reduce (prod_inv2 * (1 + p2 - g_sqr_pow2), mod2);

      // store output coefficient if requested
      if (!own_res2)
         res2[i] = val;

      // store irregular index if requested
      if (irregular2)
      {
         if (val == 0)
         {
            irregular2[0]++;
            if (irregular2[0] >= irregular2_max)
            {
               printf ("too many irregular indices for p = %lu\n", p2);
               abort ();
            }
            irregular2[irregular2[0]] = 2*i;
         }
      }

      // update fudge and g_sqr_pow
      g_sqr_pow2 = zn_mod_reduce (g_sqr_pow2 * g2, mod2);
      fudge2 = zn_mod_reduce (fudge2 * g_sqr_pow2, mod2);
      g_sqr_pow2 = zn_mod_reduce (g_sqr_pow2 * g2, mod2);

      // update verification data
      ulong check_term2 = zn_mod_reduce (check_four_pow2 * (2*i + 1), mod2);
      check_term2 = zn_mod_reduce (check_term2 * val, mod2);
      check_accum2 = zn_mod_add_slim (check_accum2, check_term2, mod2);
      check_four_pow2 = zn_mod_add_slim (check_four_pow2, check_four_pow2,
                                         mod2);
      check_four_pow2 = zn_mod_add_slim (check_four_pow2, check_four_pow2,
                                         mod2);
   }

   if (check_accum2 != p2-2)
   {
      printf ("bernoulli_dual failed correctness check for p2 = %lu\n", p2);
      abort ();
   }

   if (own_res2)
      free (res2);
   free (P);
   free (G);

   zn_mod_clear (mod);
   zn_mod_clear (mod2);
   zn_mod_clear (mod1);
}

int
main (int argc, char* argv[])
{
   if (argc == 2)
   {
      ulong i, p = atol (argv[1]);

      ulong irregular[30];
      bernoulli (NULL, irregular, 29, p);

      printf ("irregular indices for p = %lu: ", p);
      for (i = 1; i <= irregular[0]; i++)
         printf ("%lu ", irregular[i]);
      printf ("\n");
   }
   else if (argc == 3)
   {
      ulong i, p1 = atol (argv[1]), p2 = atol (argv[2]);

      ulong irregular1[30];
      ulong irregular2[30];
      bernoulli_dual (NULL, irregular1, 29, p1, NULL, irregular2, 29, p2);

      printf ("irregular indices for p = %lu: ", p1);
      for (i = 1; i <= irregular1[0]; i++)
         printf ("%lu ", irregular1[i]);
      printf ("\n");

      printf ("irregular indices for p = %lu: ", p2);
      for (i = 1; i <= irregular2[0]; i++)
         printf ("%lu ", irregular2[i]);
      printf ("\n");
   }
   else
   {
      printf ("usage:\n");
      printf ("\n");
      printf ("   bernoulli p\n");
      printf ("      prints irregular indices for p\n");
      printf ("\n");
      printf ("   bernoulli p1 p2\n");
      printf ("      prints irregular indices for p1 and p2\n");
   }

   return 0;
}

// end of file ****************************************************************

zn_poly-0.9.2/doc/REFERENCES
---------------------------

===============================================================================
[Bai89]: Bailey, "FFTs in External or Hierarchical Memory", Journal of
   Supercomputing, vol. 4, no. 1 (Mar 1990), 23--35

[BLS03]: Bostan, Lecerf, Schost, "Tellegen's principle into practice",
   Symbolic and Algebraic Computation (2003), 37--44 (Proceedings of
   ISSAC 2003)

[GM94]: Granlund, Montgomery, "Division by Invariant Integers using
   Multiplication", ACM SIG-PLAN Notices (1994), vol 29 no 6, 61--72

[Har07]: Harvey, "Faster polynomial multiplication via multipoint
   Kronecker substitution", preprint at http://arxiv.org/abs/0712.4046
   (2007)

[Har08]: Harvey, "A cache-friendly truncated FFT", preprint at
   http://arxiv.org/abs/0810.3203 (2008)

[HQZ04]: Hanrot, Quercia, Zimmermann, "The Middle Product Algorithm, I.
   Speeding up the division and square root of power series", AAECC
   (2004), vol 14 no 6, 415--438

[HZ04]: Hanrot, Zimmermann, "Newton iteration revisited", 2004

[Mon85]: Montgomery, "Modular multiplication without trial division",
   Math. Comp. 44 (1985) no. 170, 519--521

[Nus80]: Nussbaumer, "Fast polynomial transform algorithms for digital
   convolution", IEEE Transactions on Acoustics, Speech, and Signal
   Processing 28 (1980), 205--215

[Sch77]: Schonhage, "Schnelle Multiplikation von Polynomen uber Korpern
   der Charakteristik 2", Acta Informatica, vol 7 (1977), 395--398

[vdH04]: van der Hoeven, "The Truncated Fourier Transform and
   Applications", ISSAC 2004

[vdH05]: van der Hoeven, "Notes on the Truncated Fourier Transform"

===============================================================================

zn_poly-0.9.2/gpl-2.0.txt
-------------------------

                    GNU GENERAL PUBLIC LICENSE
                       Version 2, June 1991

 Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. 
(Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. 
Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
    MA 02110-1301 USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type
    `show w'. This is free software, and you are welcome to redistribute
    it under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License. Of course, the commands
you use may be called something other than `show w' and `show c'; they
could even be mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:

    Yoyodyne, Inc., hereby disclaims all copyright interest in the
    program `Gnomovision' (which makes passes at compilers) written
    by James Hacker.

    <signature of Ty Coon>, 1 April 1989
    Ty Coon, President of Vice

This General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications
with the library. If this is what you want to do, use the GNU Lesser
General Public License instead of this License.

zn_poly-0.9.2/gpl-3.0.txt
-------------------------

                    GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc.
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The GNU General Public License is a free, copyleft license for
software and other kinds of works.

  The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

  To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. 
Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 
No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. 
You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. 
"Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. 
Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. 
If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. 
The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. 
It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: <program> Copyright (C) <year> <name of author> This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see <http://www.gnu.org/licenses/>. The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read <http://www.gnu.org/philosophy/why-not-lgpl.html>. zn_poly-0.9.2/include/000077500000000000000000000000001360464557000146515ustar00rootroot00000000000000zn_poly-0.9.2/include/profiler.h000066400000000000000000000116251360464557000166510ustar00rootroot00000000000000/* profiler.h: header file for profiling routines Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. ============================================================================ NOTE: this file includes code adapted from GMP 4.2.1 and FLINT 1.0. Please see the file COPYING for further information about licensing of this code. */ #ifndef ZNP_PROFILER_H #define ZNP_PROFILER_H #include <stdio.h> #include <stdlib.h> #include "zn_poly.h" #ifdef __cplusplus extern "C" { #endif /* Estimates the number of "cycles" per second (i.e. as reported by get_cycle_counter(), which might not actually be "cycles", but anyway).
This is not supposed to be an accurate measurement; it's just a ballpark estimate, for the purpose of calibrating profiling/tuning runs to take a reasonable amount of time (w.r.t. typical human patience). */ double estimate_cycle_scale_factor (); /* Global scaling factor, set at startup via calibrate_cycle_scale_factor(). */ extern double cycle_scale_factor; /* Sets cycle_scale_factor according to estimate_cycle_scale_factor(), and prints some logging information to stderr. */ void calibrate_cycle_scale_factor (); /* ZNP_HAVE_CYCLE_COUNTER flag is set if a cycle counter is available on this platform. If not, we can't do any profiling or tuning. ZNP_PROFILE_CLOCK_MULTIPLIER is a multiplier used to decide how long to let profiling/tuning run for, in terms of the cycle count. This is a pretty rough estimate at the moment. */ #if defined(__GNUC__) #if defined(__i386__) || defined(__x86_64__) #define ZNP_HAVE_CYCLE_COUNTER 1 #define ZNP_PROFILE_CLOCK_MULTIPLIER 100 typedef unsigned long long cycle_count_t; typedef unsigned long long cycle_diff_t; ZNP_INLINE cycle_count_t get_cycle_counter () { // hmmm... we're assuming "unsigned" is always 32 bits? unsigned hi; unsigned lo; __asm __volatile__ ( "\t" "rdtsc \n\t" "movl %%edx, %0 \n\t" "movl %%eax, %1 \n\t" : "=r" (hi), "=r" (lo) : : "%edx", "%eax"); return (((unsigned long long)(hi)) << 32) + ((unsigned long long) lo); } #endif #if defined (_ARCH_PPC) || defined (__powerpc__) || defined (__POWERPC__) \ || defined (__ppc__) || defined(__ppc64__) #define ZNP_HAVE_CYCLE_COUNTER 1 #define ZNP_PROFILE_CLOCK_MULTIPLIER 1 typedef unsigned long long cycle_count_t; typedef unsigned long long cycle_diff_t; ZNP_INLINE cycle_count_t get_cycle_counter () { // hmmm... we're assuming "unsigned" is always 32 bits? unsigned hi1, hi2; unsigned lo; do { __asm __volatile__ ( "\t" "mftbu %0 \n\t" "mftb %1 \n\t" "mftbu %2 \n\t" : "=r" (hi1), "=r" (lo), "=r" (hi2)); } while (hi1 != hi2); return (((unsigned long long)(hi2)) << 32) + ((unsigned long long) lo); } #endif #endif #ifndef ZNP_HAVE_CYCLE_COUNTER #define ZNP_HAVE_CYCLE_COUNTER 0 #define ZNP_PROFILE_CLOCK_MULTIPLIER 0 typedef int cycle_count_t; typedef int cycle_diff_t; ZNP_INLINE cycle_count_t get_cycle_counter () { printf ("called get_cycle_counter() without a cycle counter!\n"); abort (); return 0; } #endif ZNP_INLINE cycle_diff_t cycle_diff (cycle_count_t start, cycle_count_t stop) { return stop - start; } /* Repeatedly runs target with parameter arg, with increasing values of count (count not allowed to get bigger than 10). Expects target to be a function that runs a certain profile count times and returns a total cycle count. Collects statistics regarding return values of target (i.e. cycle counts), in particular the median, upper and lower quartiles. Runs for at most limit seconds (as measured by the target!). Calls target at least once, and at most 50 times. If spread != NULL, it stores there the ratio (interquartile range) / (median). If samples != NULL, it stores there the number of times target was called. 
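Example (an illustrative sketch; "my_target" and "my_arg" are hypothetical caller-supplied names, where my_target has the target signature described above):

      double spread;
      unsigned samples;
      double median = profile (&spread, &samples, my_target, my_arg, 1.0);

   runs my_target for up to roughly 1.0 seconds; afterwards spread holds (interquartile range) / (median), samples holds the number of calls made, and the return value is the median of the collected cycle counts.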
*/ double profile (double* spread, unsigned* samples, double (*target)(void* arg, unsigned long count), void* arg, double limit); #ifdef __cplusplus } #endif #endif // end of file **************************************************************** zn_poly-0.9.2/include/support.h000066400000000000000000000211441360464557000165400ustar00rootroot00000000000000/* support.h: various support routines for test, profiling and tuning code Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #ifndef ZNP_SUPPORT_H #define ZNP_SUPPORT_H #include <stdio.h> #include <gmp.h> #include "zn_poly_internal.h" #ifdef __cplusplus extern "C" { #endif /* single global random state for test/profile modules */ extern gmp_randstate_t randstate; /* An array of modulus bitsizes, used by several test functions. */ extern unsigned test_bitsizes[]; extern unsigned num_test_bitsizes; // how big the array is /* Exports abs(op) to res, storing exactly n limbs (zero-padded if necessary). Sign of op is ignored. abs(op) must fit into n limbs. */ void mpz_to_mpn (mp_limb_t* res, size_t n, const mpz_t op); /* Converts mpn buffer (exactly n limbs) to mpz. Output is always non-negative. */ void mpn_to_mpz (mpz_t res, const mp_limb_t* op, size_t n); /* Returns random unsigned long in [0, max). */ ulong random_ulong (ulong max); /* Returns random unsigned long in [0, 2^b). */ ulong random_ulong_bits (unsigned b); /* Returns random modulus with exactly b bits, i.e. in [2^(b-1), 2^b). If require_odd is set, the returned modulus will be odd. */ ulong random_modulus (unsigned b, int require_odd); /* Prints array to stdout, in format e.g. "[2 3 7]". */ void zn_array_print (const ulong* x, size_t n); /* Similar to mpn_random2, but flips all the bits with probability 1/2. This deals with the annoying way that mpn_random2 always writes 1's to the high bits of the buffer. */ void ZNP_mpn_random2 (mp_limb_t* res, size_t n); void ref_zn_array_mul (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, const zn_mod_t mod); void ref_zn_array_scalar_mul (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod); void ref_zn_array_mulmid (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, const zn_mod_t mod); void ref_zn_array_negamul (ulong* res, const ulong* op1, const ulong* op2, size_t n, const zn_mod_t mod); void ref_mpn_mulmid (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2); void ref_mpn_smp (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2); #if DEBUG /* Sets res to a uniformly random pmf. Bias is uniformly random in [0, 2M). */ void pmf_rand (pmf_t res, ulong M, const zn_mod_t mod); /* Compares op1 and op2, returns 0 if equal. */ int pmf_cmp (const pmf_t op1, const pmf_t op2, ulong M, const zn_mod_t mod); /* Prints op to standard output (in normalised form).
*/ void pmf_print (const pmf_t op, ulong M, const zn_mod_t mod); /* Prints op to standard output. */ void pmfvec_print (const pmfvec_t op); /* Prints first n coefficients of op to standard output. */ void pmfvec_print_trunc (const pmfvec_t op, ulong n); #endif /* ============================================================================ tuning routines ============================================================================ */ #define tune_mul_KS \ ZNP_tune_mul_KS void tune_mul_KS (FILE* flog, int sqr, int verbose); #define tune_mulmid_KS \ ZNP_tune_mulmid_KS void tune_mulmid_KS (FILE* flog, int verbose); #define tune_nuss \ ZNP_tune_nuss void tune_nuss (FILE* flog, int sqr, int verbose); #define tune_mul \ ZNP_tune_mul void tune_mul (FILE* flog, int sqr, int verbose); #define tune_mulmid \ ZNP_tune_mulmid void tune_mulmid (FILE* flog, int verbose); #define tune_mpn_smp_kara \ ZNP_tune_mpn_smp_kara void tune_mpn_smp_kara (FILE* flog, int verbose); #define tune_mpn_mulmid_fallback \ ZNP_tune_mpn_mulmid_fallback void tune_mpn_mulmid_fallback (FILE* flog, int verbose); /* ============================================================================ structs used in profiling routines ============================================================================ */ /* Struct for passing parameters to various profiling routines. Not all members are used by all routines, and they may have different meanings for different routines. */ typedef struct { // modulus ulong m; // length of input polynomials size_t n; // lengths of input polynomials for routines taking two input lengths size_t n1, n2; // for negacyclic multiplication, log2 of the convolution length unsigned lgL; // which algorithm to use. Meaning depends on routine selected. int algo; // for routines profiling multiplication, indicates whether we want to // profile squaring int sqr; } profile_info_struct; typedef profile_info_struct profile_info_t[1]; /* legal algo values for profile_mul */ enum { ALGO_MUL_BEST, ALGO_MUL_KS1, ALGO_MUL_KS1_REDC, ALGO_MUL_KS2, ALGO_MUL_KS2_REDC, ALGO_MUL_KS3, ALGO_MUL_KS3_REDC, ALGO_MUL_KS4, ALGO_MUL_KS4_REDC, ALGO_MUL_FFT, ALGO_MUL_NTL, }; /* Profiles one of the multiplication routines. arg points to a profile_info_t with parameters m, n1, n2, sqr, algo. Returns total cycle count for count calls. */ double profile_mul (void* arg, unsigned long count); /* As above, but assumes that the algorithm is ALGO_MUL_NTL. */ double profile_mul_ntl (void* arg, unsigned long count); /* legal algo values for profile_negamul */ enum { // fall back on calling zn_array_mul and reducing negacyclically ALGO_NEGAMUL_FALLBACK, // use Nussbaumer convolution ALGO_NEGAMUL_NUSS, }; /* Profiles one of the negacyclic multiplication routines. arg points to a profile_info_t with parameters m, lgL, sqr, algo. Returns total cycle count for count calls. */ double profile_negamul (void* arg, unsigned long count); /* legal algo values for profile_mulmid */ enum { ALGO_MULMID_BEST, ALGO_MULMID_FALLBACK, ALGO_MULMID_KS1, ALGO_MULMID_KS1_REDC, ALGO_MULMID_KS2, ALGO_MULMID_KS2_REDC, ALGO_MULMID_KS3, ALGO_MULMID_KS3_REDC, ALGO_MULMID_KS4, ALGO_MULMID_KS4_REDC, ALGO_MULMID_FFT, }; /* Profiles one of the middle product routines. arg points to a profile_info_t with parameters m, n1, n2, algo. Returns total cycle count for count calls. */ double profile_mulmid (void* arg, unsigned long count); /* legal algo values for profile_invert */ enum { ALGO_INVERT_BEST, ALGO_INVERT_NTL, }; /* Profiles one of the series inversion routines.
arg points to a profile_info_t with parameters m, n, algo. Returns total cycle count for count calls. */ double profile_invert (void* arg, unsigned long count); /* As above, but assumes that the algorithm is ALGO_INVERT_NTL. */ double profile_invert_ntl (void* arg, unsigned long count); /* Profiles mpn_mul. arg points to a profile_info_t with parameters n1, n2. Returns total cycle count for count calls. */ double profile_mpn_mul (void* arg, unsigned long count); /* Profiles mpn_smp. arg points to a profile_info_t with parameters n1, n2. Returns total cycle count for count calls. */ double profile_mpn_smp (void* arg, unsigned long count); /* As above, for mpn_mulmid_fallback. */ double profile_mpn_mulmid_fallback (void* arg, unsigned long count); /* As above, for mpn_smp_basecase. */ double profile_mpn_smp_basecase (void* arg, unsigned long count); /* As above, for mpn_smp_kara, except that the n parameter is used instead of n1, n2. */ double profile_mpn_smp_kara (void* arg, unsigned long count); double profile_bfly (void* arg, unsigned long count); double profile_mpn_aors (void* arg, unsigned long count); double profile_scalar_mul (void* arg, unsigned long count); void prof_main (int argc, char* argv[]); #ifdef __cplusplus } #endif #endif // end of file **************************************************************** zn_poly-0.9.2/include/wide_arith.h000066400000000000000000000365261360464557000171520ustar00rootroot00000000000000/* wide_arith.h: double-word arithmetic Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. ============================================================================ NOTE: this file includes code adapted from GMP 4.2.1 and NTL 5.4.1. Please see the file COPYING for further information about licensing of this code. */ /* This file defines macros: ZNP_MUL_HI(hi, a, b) computes the high word of a * b, where a and b are unsigned longs. ZNP_MUL_WIDE(hi, lo, a, b) computes the high and low words of a * b, where a and b are unsigned longs. ZNP_ADD_WIDE(s1, s0, a1, a0, b1, b0) computes B*s1 + s0 = (B*a1 + a0) + (B*b1 + b0), where all the variables are unsigned longs. Carry is discarded. ZNP_SUB_WIDE(s1, s0, a1, a0, b1, b0) computes B*s1 + s0 = (B*a1 + a0) - (B*b1 + b0), where all the variables are unsigned longs. Borrow is discarded. If using gcc, there are several inline assembly implementations. */ #ifndef ZNP_WIDE_ARITH_H #define ZNP_WIDE_ARITH_H #include <limits.h> #if ULONG_MAX == 4294967295U #define UNSIGNED_LONG_BITS 32 #elif ULONG_MAX == 18446744073709551615U #define UNSIGNED_LONG_BITS 64 #else #error wide_arith requires that unsigned long is either 32 bits or 64 bits #endif #if (defined (__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 7))) // To simplify things, we require gcc v2.7 or higher.
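/*
   Worked reading of the macros described above (an illustrative sketch):
   with B = 2^UNSIGNED_LONG_BITS, ZNP_MUL_WIDE(hi, lo, a, b) yields the
   exact double-word product a * b == hi*B + lo. For example, on a 64-bit
   machine:

      unsigned long hi, lo;
      ZNP_MUL_WIDE (hi, lo, 3UL << 62, 5UL);
      // 15 * 2^62 == 3 * 2^64 + 3 * 2^62, so hi == 3 and lo == 3UL << 62

   ZNP_ADD_WIDE and ZNP_SUB_WIDE are the corresponding double-word addition
   and subtraction, with the carry/borrow out of the top word discarded.
*/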
// ------ POWERPC ------ #if defined (_ARCH_PPC) || defined (__powerpc__) || defined (__POWERPC__) \ || defined (__ppc__) || defined (__ppc64__) \ || (defined (PPC) && ! defined (CPU_FAMILY)) /* gcc 2.7.x GNU&SysV */ \ || (defined (PPC) && defined (CPU_FAMILY) /* VxWorks */ \ && CPU_FAMILY == PPC) #if (UNSIGNED_LONG_BITS == 32) #define ZNP_MUL_HI(hi, a, b) \ __asm__ ("mulhwu %0,%1,%2" : "=r" (hi) : "%r" (a), "r" (b)); #define ZNP_ADD_WIDE(s1, s0, a1, a0, b1, b0) \ do { \ if (__builtin_constant_p (b1) && (b1) == 0) \ __asm__ ("{a%I4|add%I4c} %1,%3,%4\n\t{aze|addze} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (a1), "%r" (a0), "rI" (b0)); \ else if (__builtin_constant_p (b1) && (b1) == ~(unsigned long) 0) \ __asm__ ("{a%I4|add%I4c} %1,%3,%4\n\t{ame|addme} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (a1), "%r" (a0), "rI" (b0)); \ else \ __asm__ ("{a%I5|add%I5c} %1,%4,%5\n\t{ae|adde} %0,%2,%3" \ : "=r" (s1), "=&r" (s0) \ : "r" (a1), "r" (b1), "%r" (a0), "rI" (b0)); \ } while (0) #define ZNP_SUB_WIDE(s1, s0, a1, a0, b1, b0) \ do { \ if (__builtin_constant_p (a1) && (a1) == 0) \ __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{sfze|subfze} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (b1), "rI" (a0), "r" (b0)); \ else if (__builtin_constant_p (a1) && (a1) == ~(unsigned long) 0) \ __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{sfme|subfme} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (b1), "rI" (a0), "r" (b0)); \ else if (__builtin_constant_p (b1) && (b1) == 0) \ __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{ame|addme} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (a1), "rI" (a0), "r" (b0)); \ else if (__builtin_constant_p (b1) && (b1) == ~(unsigned long) 0) \ __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{aze|addze} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (a1), "rI" (a0), "r" (b0)); \ else \ __asm__ ("{sf%I4|subf%I4c} %1,%5,%4\n\t{sfe|subfe} %0,%3,%2" \ : "=r" (s1), "=&r" (s0) \ : "r" (a1), "r" (b1), "rI" (a0), "r" (b0)); \ } while (0) #elif (UNSIGNED_LONG_BITS == 64) #define ZNP_MUL_HI(hi, a, b) \ __asm__ ("mulhdu %0,%1,%2" : "=r" (hi) : "%r" (a), "r" (b)); #define ZNP_ADD_WIDE(s1, s0, a1, a0, b1, b0) \ do { \ if (__builtin_constant_p (b1) && (b1) == 0) \ __asm__ ("{a%I4|add%I4c} %1,%3,%4\n\t{aze|addze} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (a1), "%r" (a0), "rI" (b0)); \ else if (__builtin_constant_p (b1) && (b1) == ~(unsigned long) 0) \ __asm__ ("{a%I4|add%I4c} %1,%3,%4\n\t{ame|addme} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (a1), "%r" (a0), "rI" (b0)); \ else \ __asm__ ("{a%I5|add%I5c} %1,%4,%5\n\t{ae|adde} %0,%2,%3" \ : "=r" (s1), "=&r" (s0) \ : "r" (a1), "r" (b1), "%r" (a0), "rI" (b0)); \ } while (0) #define ZNP_SUB_WIDE(s1, s0, a1, a0, b1, b0) \ do { \ if (__builtin_constant_p (a1) && (a1) == 0) \ __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{sfze|subfze} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (b1), "rI" (a0), "r" (b0)); \ else if (__builtin_constant_p (a1) && (a1) == ~(unsigned long) 0) \ __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{sfme|subfme} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (b1), "rI" (a0), "r" (b0)); \ else if (__builtin_constant_p (b1) && (b1) == 0) \ __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{ame|addme} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (a1), "rI" (a0), "r" (b0)); \ else if (__builtin_constant_p (b1) && (b1) == ~(unsigned long) 0) \ __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{aze|addze} %0,%2" \ : "=r" (s1), "=&r" (s0) : "r" (a1), "rI" (a0), "r" (b0)); \ else \ __asm__ ("{sf%I4|subf%I4c} %1,%5,%4\n\t{sfe|subfe} %0,%3,%2" \ : "=r" (s1), "=&r" (s0) \ : "r" (a1), "r" (b1), "rI" (a0), "r" (b0)); \ } while (0) #endif #endif // ------ ALPHA ------ #if 
(defined (__alpha) && UNSIGNED_LONG_BITS == 64) #define ZNP_MUL_HI(hi, a, b) \ __asm__ ("umulh %r1,%2,%0" : "=r" (hi) : "%rJ" (a), "rI" (b)); #endif // ------ IA64 ------ #if (defined (__ia64) && UNSIGNED_LONG_BITS == 64) #define ZNP_MUL_HI(hi, a, b) \ __asm__ ("xma.hu %0 = %1, %2, f0" : "=f" (hi) : "f" (a), "f" (b)); #endif // ------ x86 ------ #if ((defined (__i386__) || defined (__i486__)) && UNSIGNED_LONG_BITS == 32) #define ZNP_MUL_WIDE(hi, lo, a, b) \ __asm__ ("mull %3" : "=a" (lo), "=d" (hi) : "%0" (a), "rm" (b)); #define ZNP_ADD_WIDE(s1, s0, a1, a0, b1, b0) \ __asm__ ("addl %5,%k1\n\tadcl %3,%k0" \ : "=r" (s1), "=&r" (s0) \ : "0" ((unsigned long)(a1)), "g" ((unsigned long)(b1)), \ "%1" ((unsigned long)(a0)), "g" ((unsigned long)(b0))) #define ZNP_SUB_WIDE(s1, s0, a1, a0, b1, b0) \ __asm__ ("subl %5,%k1\n\tsbbl %3,%k0" \ : "=r" (s1), "=&r" (s0) \ : "0" ((unsigned long)(a1)), "g" ((unsigned long)(b1)), \ "1" ((unsigned long)(a0)), "g" ((unsigned long)(b0))) #endif // ------ x86-64 ------ #if (defined (__x86_64__) && UNSIGNED_LONG_BITS == 64) #define ZNP_MUL_WIDE(hi, lo, a, b) \ __asm__ ("mulq %3" : "=a" (lo), "=d" (hi) : "%0" (a), "rm" (b)); #define ZNP_ADD_WIDE(s1, s0, a1, a0, b1, b0) \ __asm__ ("addq %5,%q1\n\tadcq %3,%q0" \ : "=r" (s1), "=&r" (s0) \ : "0" ((unsigned long)(a1)), "rme" ((unsigned long)(b1)), \ "%1" ((unsigned long)(a0)), "rme" ((unsigned long)(b0))) #define ZNP_SUB_WIDE(s1, s0, a1, a0, b1, b0) \ __asm__ ("subq %5,%q1\n\tsbbq %3,%q0" \ : "=r" (s1), "=&r" (s0) \ : "0" ((unsigned long)(a1)), "rme" ((unsigned long)(b1)), \ "1" ((unsigned long)(a0)), "rme" ((unsigned long)(b0))) #endif // ------ MIPS ------ #if (defined (__mips)) #if (UNSIGNED_LONG_BITS == 32) #define ZNP_MUL_WIDE(hi, lo, a, b) \ __asm__ ("multu %2,%3" : "=l" (lo), "=h" (hi) : "d" (a), "d" (b)); #elif (UNSIGNED_LONG_BITS == 64) #define ZNP_MUL_WIDE(hi, lo, a, b) \ __asm__ ("dmultu %2,%3" : "=l" (lo), "=h" (hi) : "d" (a), "d" (b)); #endif #endif // -------- SPARC -------- #if (defined (__sparc__) && UNSIGNED_LONG_BITS == 32) #if (defined (__sparc_v9__) || defined (__sparcv9) || \ defined (__sparc_v8__) || defined (__sparcv8) || defined (__sparclite__)) #define ZNP_MUL_WIDE(hi, lo, a, b) \ __asm__ ("umul %2,%3,%1;rd %%y,%0" \ : "=r" (hi), "=r" (lo) : "r" (a), "r" (b)); #endif #endif #endif // __GNUC__ // -------- generic implementations -------- /* If neither ZNP_MUL_HI nor ZNP_MUL_WIDE has an assembly implementation, do it in C. (algorithm is from GMP's longlong.h) */ #if !(defined (ZNP_MUL_HI) || defined (ZNP_MUL_WIDE)) #warning No assembly implementation of wide multiplication available for this \ machine; using generic C code instead. 
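/*
   The generic version below splits each operand into half words: writing
   w = UNSIGNED_LONG_BITS/2, a = a1*2^w + a0 and b = b1*2^w + b0, the
   product is

      a * b = a1*b1*2^(2w) + (a0*b1 + a1*b0)*2^w + a0*b0,

   where each of the four partial products __p00, __p01, __p10, __p11 fits
   in a single word. The middle terms are accumulated into __p01; the only
   carry that can occur there is propagated into the high word __p11.
*/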
#define ZNP_MUL_WIDE(hi, lo, a, b) \ do { \ unsigned long __a = (a); \ unsigned long __b = (b); \ unsigned long __mask = (1UL << (UNSIGNED_LONG_BITS/2)) - 1; \ \ unsigned long __a0 = __a & __mask; \ unsigned long __a1 = __a >> (UNSIGNED_LONG_BITS/2); \ unsigned long __b0 = __b & __mask; \ unsigned long __b1 = __b >> (UNSIGNED_LONG_BITS/2); \ \ unsigned long __p00 = __a0 * __b0; \ unsigned long __p01 = __a0 * __b1; \ unsigned long __p10 = __a1 * __b0; \ unsigned long __p11 = __a1 * __b1; \ \ __p01 += (__p00 >> (UNSIGNED_LONG_BITS/2)); /* no possible carry */ \ __p01 += __p10; \ if (__p01 < __p10) \ __p11 += (1UL << (UNSIGNED_LONG_BITS/2)); /* propagate carry */ \ \ (hi) = __p11 + (__p01 >> (UNSIGNED_LONG_BITS/2)); \ (lo) = (__p00 & __mask) + (__p01 << (UNSIGNED_LONG_BITS/2)); \ } while (0) #endif /* If only one of ZNP_MUL_HI and ZNP_MUL_WIDE is defined, define the other one in terms of that one. */ #if defined (ZNP_MUL_HI) && !defined (ZNP_MUL_WIDE) #define ZNP_MUL_WIDE(hi, lo, a, b) \ do { \ unsigned long __a = (a), __b = (b); \ ZNP_MUL_HI ((hi), __a, __b); \ (lo) = __a * __b; \ } while (0) #endif #if defined (ZNP_MUL_WIDE) && !defined (ZNP_MUL_HI) #define ZNP_MUL_HI(hi, a, b) \ do { \ unsigned long __dummy; \ ZNP_MUL_WIDE ((hi), __dummy, (a), (b)); \ } while (0) #endif #if !defined (ZNP_ADD_WIDE) #define ZNP_ADD_WIDE(s1, s0, a1, a0, b1, b0) \ do { \ unsigned long __a0 = (a0); \ unsigned long __temp = __a0 + (b0); \ (s1) = (a1) + (b1) + (__temp < __a0); \ (s0) = __temp; \ } while (0) #endif #if !defined (ZNP_SUB_WIDE) #define ZNP_SUB_WIDE(s1, s0, a1, a0, b1, b0) \ do { \ unsigned long __a0 = (a0); \ unsigned long __b0 = (b0); \ unsigned long __temp = __a0 - __b0; \ (s1) = (a1) - (b1) - (__a0 < __b0); \ (s0) = __temp; \ } while (0) #endif #endif // end of file **************************************************************** zn_poly-0.9.2/include/zn_poly.h000066400000000000000000000377611360464557000165260ustar00rootroot00000000000000/* zn_poly.h: main header file to be #included by zn_poly users Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #ifndef ZN_POLY_H #define ZN_POLY_H #ifdef __cplusplus extern "C" { #endif #include <assert.h> #include <limits.h> #include <stdio.h> #include <stdlib.h> #define ZNP_ASSERT assert #define ZNP_INLINE static inline /* Returns a string like "3.1" */ extern const char* zn_poly_version_string (); /* Three components of "version x.y.z" */ #define ZNP_VERSION_MAJOR 0 #define ZNP_VERSION_MINOR 9 #define ZNP_VERSION_REVISION 0 /* ULONG_BITS = number of bits per unsigned long */ #if ULONG_MAX == 4294967295UL #define ULONG_BITS 32 #elif ULONG_MAX == 18446744073709551615UL #define ULONG_BITS 64 #else #error zn_poly requires that unsigned long is either 32 bits or 64 bits #endif /* I get really sick of typing unsigned long.
*/ typedef unsigned long ulong; #include "wide_arith.h" /* ============================================================================ zn_mod_t stuff ============================================================================ */ /* zn_mod_t stores precomputed information about a modulus. The modulus can be any integer in the range 2 <= m < 2^ULONG_BITS. A modulus m is called "slim" if m < 2^(ULONG_BITS - 1), i.e. the residues never occupy the top bit of the word. Many routines are much faster for slim moduli. */ typedef struct { // the modulus, must be >= 2 ulong m; // ceil(log2(m)) = number of bits in a non-negative residue int bits; // reduction of B and B^2 mod m (where B = 2^ULONG_BITS) ulong B, B2; // sh1 and inv1 are respectively ell-1 and m' from Figure 4.1 of [GM94] unsigned sh1; ulong inv1; // sh2, sh3, inv2 and m_norm are respectively N-ell, ell-1, m', d_norm // from Figure 8.1 of [GM94] unsigned sh2, sh3; ulong inv2, m_norm; // inv3 = m^(-1) mod B (only valid if m is odd) ulong inv3; } zn_mod_struct; typedef zn_mod_struct zn_mod_t[1]; /* Initialises zn_mod_t with given modulus, performs some (fairly cheap) precomputations. */ void zn_mod_init (zn_mod_t mod, ulong m); /* Must be called when the modulus object goes out of scope. */ void zn_mod_clear (zn_mod_t mod); /* Returns the modulus. */ ZNP_INLINE ulong zn_mod_get (const zn_mod_t mod) { return mod->m; } /* Returns nonzero if mod is a slim modulus. */ ZNP_INLINE int zn_mod_is_slim (const zn_mod_t mod) { return (long) mod->m >= 0; } /* Returns x + y mod m. Both x and y must be in [0, m). */ ZNP_INLINE ulong zn_mod_add (ulong x, ulong y, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m && y < mod->m); ulong temp = mod->m - y; if (x < temp) return x + y; else return x - temp; } /* Same as zn_mod_add, but only for slim moduli. This is usually several times faster than zn_mod_add, depending on the context; see the array-profile target for examples. */ ZNP_INLINE ulong zn_mod_add_slim (ulong x, ulong y, const zn_mod_t mod) { ZNP_ASSERT (zn_mod_is_slim (mod)); ZNP_ASSERT (x < mod->m && y < mod->m); ulong temp = x + y; if (temp >= mod->m) temp -= mod->m; return temp; } /* Returns x - y mod m. Both x and y must be in [0, m). */ ZNP_INLINE ulong zn_mod_sub (ulong x, ulong y, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m && y < mod->m); ulong temp = x - y; if (x < y) temp += mod->m; return temp; } /* Same as zn_mod_sub, but only for slim moduli. */ ZNP_INLINE ulong zn_mod_sub_slim (ulong x, ulong y, const zn_mod_t mod) { ZNP_ASSERT (zn_mod_is_slim (mod)); ZNP_ASSERT (x < mod->m && y < mod->m); long temp = x - y; temp += (temp < 0) ? mod->m : 0; return temp; } /* Returns -x mod m. x must be in [0, m). */ ZNP_INLINE ulong zn_mod_neg (ulong x, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m); return x ? (mod->m - x) : x; } /* Returns x/2 mod m. x must be in [0, m). If the modulus is even, x must be even too. */ ZNP_INLINE ulong zn_mod_divby2 (ulong x, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m); ZNP_ASSERT ((mod->m & 1) || !(x & 1)); ulong mask = -(x & 1); ulong half = (mod->m >> 1) + 1; // = 1/2 mod m if m is odd return (x >> 1) + (mask & half); } /* Returns floor(x / m). No restrictions on x. Algorithm is essentially Figure 4.1 of [GM94]. */ ZNP_INLINE ulong zn_mod_quotient (ulong x, const zn_mod_t mod) { ulong t; ZNP_MUL_HI (t, x, mod->inv1); return (t + ((x - t) >> 1)) >> mod->sh1; } /* Returns x mod m. No restrictions on x.
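   Example (with hypothetical values): after zn_mod_init (mod, 7), zn_mod_reduce (100, mod) returns 2, since zn_mod_quotient (100, mod) == 14 and 100 - 7*14 == 2.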
*/ ZNP_INLINE ulong zn_mod_reduce (ulong x, const zn_mod_t mod) { return x - mod->m * zn_mod_quotient (x, mod); } /* Returns -x/B mod m. m must be odd. No restrictions on x. */ ZNP_INLINE ulong zn_mod_reduce_redc (ulong x, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ulong y = x * mod->inv3; ulong z; ZNP_MUL_HI (z, y, mod->m); return z; } /* Returns x1*B + x0 mod m. Assumes x1 is already in [0, m). Algorithm is essentially Figure 8.1 of [GM94]. */ ZNP_INLINE ulong zn_mod_reduce_wide (ulong x1, ulong x0, const zn_mod_t mod) { ZNP_ASSERT (x1 < mod->m); ulong y1 = (x1 << mod->sh2) + ((x0 >> 1) >> mod->sh3); ulong y0 = (x0 << mod->sh2); ulong sign = y0 >> (ULONG_BITS - 1); ulong z0 = y0 + (mod->m_norm & -sign); ulong a1, a0; ZNP_MUL_WIDE (a1, a0, mod->inv2, y1 + sign); ZNP_ADD_WIDE (a1, a0, a1, a0, y1, z0); ulong b1, b0; ZNP_MUL_WIDE (b1, b0, (-a1 - 1), mod->m); ZNP_ADD_WIDE (b1, b0, b1, b0, x1, x0); b1 -= mod->m; return b0 + (b1 & mod->m); } /* Returns -(x1*B + x0)/B mod m. Assumes x1 is already in [0, m), and that m is odd. Uses essentially Montgomery's REDC algorithm [Mon85]. */ ZNP_INLINE ulong zn_mod_reduce_wide_redc (ulong x1, ulong x0, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (x1 < mod->m); ulong y = x0 * mod->inv3; ulong z; ZNP_MUL_HI (z, y, mod->m); return zn_mod_sub (z, x1, mod); } /* Returns -(x1*B + x0)/B mod m. Assumes x1 is already in [0, m), and that m is odd, and that m is slim. Uses essentially Montgomery's REDC algorithm [Mon85]. */ ZNP_INLINE ulong zn_mod_reduce_wide_redc_slim (ulong x1, ulong x0, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (x1 < mod->m); ulong y = x0 * mod->inv3; ulong z; ZNP_MUL_HI (z, y, mod->m); return zn_mod_sub_slim (z, x1, mod); } /* Returns x1*B + x0 mod m. No restrictions on x0 and x1. */ ZNP_INLINE ulong zn_mod_reduce2 (ulong x1, ulong x0, const zn_mod_t mod) { // first reduce into [0, Bm) ulong c0, c1; ZNP_MUL_WIDE (c1, c0, x1, mod->B); ZNP_ADD_WIDE (c1, c0, c1, c0, 0, x0); // (must still have c1 < m) return zn_mod_reduce_wide (c1, c0, mod); } /* Returns -(x1*B + x0)/B mod m. m must be odd. No restrictions on x0 and x1. */ ZNP_INLINE ulong zn_mod_reduce2_redc (ulong x1, ulong x0, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); // first reduce into [0, Bm) ulong c0, c1; ZNP_MUL_WIDE (c1, c0, x1, mod->B); ZNP_ADD_WIDE (c1, c0, c1, c0, 0, x0); // (must still have c1 < m) return zn_mod_reduce_wide_redc (c1, c0, mod); } /* Returns x2*B^2 + x1*B + x0 mod m. No restrictions on x0, x1 or x2. */ ZNP_INLINE ulong zn_mod_reduce3 (ulong x2, ulong x1, ulong x0, const zn_mod_t mod) { // reduce B^2*x2 and B*x1 into [0, Bm) ulong c0, c1, d0, d1; ZNP_MUL_WIDE (c1, c0, x2, mod->B2); ZNP_MUL_WIDE (d1, d0, x1, mod->B); // add B^2*x2 and B*x1 and x0 mod Bm ZNP_ADD_WIDE (c1, c0, c1, c0, 0, d0); // (must still have c1 < m) ZNP_ADD_WIDE (c1, c0, c1, c0, 0, x0); if (c1 >= mod->m) c1 -= mod->m; c1 = zn_mod_add (c1, d1, mod); // finally reduce it mod m return zn_mod_reduce_wide (c1, c0, mod); } /* Returns -(x2*B^2 + x1*B + x0)/B mod m. m must be odd. No restrictions on x0, x1 or x2. 
*/ ZNP_INLINE ulong zn_mod_reduce3_redc (ulong x2, ulong x1, ulong x0, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); // reduce B^2*x2 and B*x1 into [0, Bm) ulong c0, c1, d0, d1; ZNP_MUL_WIDE (c1, c0, x2, mod->B2); ZNP_MUL_WIDE (d1, d0, x1, mod->B); // add B^2*x2 and B*x1 and x0 mod Bm ZNP_ADD_WIDE (c1, c0, c1, c0, 0, d0); // (must still have c1 < m) ZNP_ADD_WIDE (c1, c0, c1, c0, 0, x0); if (c1 >= mod->m) c1 -= mod->m; c1 = zn_mod_add (c1, d1, mod); // finally reduce it mod m return zn_mod_reduce_wide_redc (c1, c0, mod); } /* Returns x * y mod m. x and y must be in [0, m). */ ZNP_INLINE ulong zn_mod_mul (ulong x, ulong y, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m && y < mod->m); ulong hi, lo; ZNP_MUL_WIDE (hi, lo, x, y); return zn_mod_reduce_wide (hi, lo, mod); } /* Returns -(x * y)/B mod m. x and y must be in [0, m), and m must be odd. */ ZNP_INLINE ulong zn_mod_mul_redc (ulong x, ulong y, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (x < mod->m && y < mod->m); ulong hi, lo; ZNP_MUL_WIDE (hi, lo, x, y); return zn_mod_reduce_wide_redc (hi, lo, mod); } /* Returns -(x * y)/B mod m. x and y must be in [0, m), and m must be odd and slim. */ ZNP_INLINE ulong zn_mod_mul_redc_slim (ulong x, ulong y, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (zn_mod_is_slim (mod)); ZNP_ASSERT (x < mod->m && y < mod->m); ulong hi, lo; ZNP_MUL_WIDE (hi, lo, x, y); return zn_mod_reduce_wide_redc_slim (hi, lo, mod); } /* Returns x^k mod m. x must be in [0, m). Negative indices are not supported (yet). */ ulong zn_mod_pow (ulong x, long k, const zn_mod_t mod); /* Returns 1/x mod m, or 0 if x is not invertible mod m. x must be in [0, m). */ ulong zn_mod_invert (ulong x, const zn_mod_t mod); /* ============================================================================ scalar multiplication on raw arrays ============================================================================ */ /* Multiplies each element of op[0, n) by x, stores result at res[0, n). res and op must be either identical or disjoint buffers. */ void zn_array_scalar_mul (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod); /* ============================================================================ polynomial multiplication on raw arrays ============================================================================ */ /* Multiplies op1[0, n1) by op2[0, n2), stores result in res[0, n1 + n2 - 1). op1 and op2 may alias each other, but neither may overlap res. Must have n1 >= n2 >= 1. Automatically selects best multiplication algorithm based on modulus and input lengths. Automatically uses specialised squaring code if input buffers are identical. */ void zn_array_mul (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, const zn_mod_t mod); /* Middle product of op1[0, n1) and op2[0, n2), stores result in res[0, n1 - n2 + 1). (i.e. this is the subarray of the ordinary product op1 * op2 consisting of those coefficients with indices in the range [n2 - 1, n1).) op1 and op2 may alias each other, but neither may overlap res. Must have n1 >= n2 >= 1. Automatically selects best middle product algorithm based on modulus and input lengths.
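   For example (hypothetical lengths): with n1 == 5 and n2 == 3, the full product op1 * op2 has n1 + n2 - 1 == 7 coefficients, and the middle product consists of the n1 - n2 + 1 == 3 coefficients with indices 2, 3 and 4, stored at res[0, 3).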
*/ void zn_array_mulmid (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, const zn_mod_t mod); // forward declaration (see zn_poly_internal.h) struct ZNP_zn_array_mulmid_fft_precomp1_struct; /* Stores precomputed information for performing a middle product where the first input array op1[0, n1) is invariant, and the *length* of the second input array op2[0, n2) is invariant. */ typedef struct { // Determines which middle product algorithm we're using. // One of the constants: // ZNP_MULMID_ALGO_FALLBACK: fall back on zn_array_mulmid // ZNP_MULMID_ALGO_FFT: use zn_array_mulmid_fft_precomp1 int algo; size_t n1, n2; const zn_mod_struct* mod; // stores a copy of op1[0, n1) if we're using ZNP_MULMID_ALGO_FALLBACK ulong* op1; // precomputed data if we're using ZNP_MULMID_ALGO_FFT struct ZNP_zn_array_mulmid_fft_precomp1_struct* precomp_fft; } zn_array_mulmid_precomp1_struct; typedef zn_array_mulmid_precomp1_struct zn_array_mulmid_precomp1_t[1]; /* Initialises res to perform middle product of op1[0, n1) by operands of size n2. */ void zn_array_mulmid_precomp1_init (zn_array_mulmid_precomp1_t res, const ulong* op1, size_t n1, size_t n2, const zn_mod_t mod); /* Performs middle product of op1[0, n1) by op2[0, n2), stores result at res[0, n1 - n2 + 1). */ void zn_array_mulmid_precomp1_execute (ulong* res, const ulong* op2, const zn_array_mulmid_precomp1_t precomp); /* Deallocates op. */ void zn_array_mulmid_precomp1_clear (zn_array_mulmid_precomp1_t op); /* Same as zn_array_mul(), but uses the Schonhage/Nussbaumer FFT algorithm, with a few layers of naive DFT to save memory. lgT is the number of layers of DFT. Larger values of lgT save more memory, as long as lgT doesn't get too close to lg2(sqrt(n1 + n2)). Larger values also make the function slower. Probably you never want to make lgT bigger than 4; after that the savings are marginal. The modulus must be odd. Output may *not* overlap inputs. NOTE: this interface is preliminary and may change in future versions. */ void zn_array_mul_fft_dft (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, unsigned lgT, const zn_mod_t mod); /* ============================================================================ polynomial division on raw arrays ============================================================================ */ /* NOTE: this interface is going to *change* in a future version of zn_poly! Computes n terms of power series inverse of op[0, n). Must have n >= 1. Currently restricted to op[0] == 1. Output may not overlap input. */ void zn_array_invert (ulong* res, const ulong* op, size_t n, const zn_mod_t mod); /* ============================================================================ other miscellaneous zn_array stuff ============================================================================ */ /* res := -op Inputs and outputs in [0, m). */ void zn_array_neg (ulong* res, const ulong* op, size_t n, const zn_mod_t mod); /* res := op1 - op2. Inputs and outputs in [0, m). */ void zn_array_sub (ulong* res, const ulong* op1, const ulong* op2, size_t n, const zn_mod_t mod); /* Returns zero if op1[0, n) and op2[0, n) are equal, otherwise nonzero. */ int zn_array_cmp (const ulong* op1, const ulong* op2, size_t n); /* Copies op[0, n) to res[0, n). Buffers must not overlap. */ void zn_array_copy (ulong* res, const ulong* op, size_t n); /* Sets res[0, n) to zero. 
*/ ZNP_INLINE void zn_array_zero (ulong* res, size_t n) { for (; n; n--) *res++ = 0; } #ifdef __cplusplus } #endif #endif // end of file **************************************************************** zn_poly-0.9.2/include/zn_poly_internal.h000066400000000000000000001264611360464557000204220ustar00rootroot00000000000000/* zn_poly_internal.h: main header file #included internally by zn_poly modules Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ /* IMPORTANT NOTE!!!!!! Everything in this file is internal, and may change incompatibly in future versions of zn_poly. You have been warned! */ #ifndef ZN_POLY_INTERNAL_H #define ZN_POLY_INTERNAL_H #include #include #include #include "zn_poly.h" #ifdef __cplusplus extern "C" { #endif #ifdef ZNP_USE_FLINT // use FLINT integer multiplication #include #define ZNP_mpn_mul F_mpn_mul #else // use GMP integer multiplication #define ZNP_mpn_mul mpn_mul #endif /* Executes "stuff", and checks that it returns 0. Usage is e.g.: ZNP_ASSERT_NOCARRY (mpn_add_n (z, x, y, 42)); This is used where we want the test suite to check that the return value is zero, but the production version should skip the check, and we don't want to stuff around with temporaries everywhere. */ #define ZNP_ASSERT_NOCARRY(stuff) \ do { \ mp_limb_t __xyz_cy; \ __xyz_cy = (stuff); \ ZNP_ASSERT (__xyz_cy == 0); \ } while (0) /* For integers a >= 1 and b >= 1, returns ceil(a / b). */ #define CEIL_DIV(a, b) ((((a) - 1) / (b)) + 1) /* For integers a >= 1 and 0 <= r < ULONG BITS, returns ceil(a / 2^r). */ #define CEIL_DIV_2EXP(a, r) ((((a) - 1) >> (r)) + 1) #define ZNP_MIN(aaa, bbb) (((aaa) < (bbb)) ? (aaa) : (bbb)) #define ZNP_MAX(aaa, bbb) (((aaa) > (bbb)) ? (aaa) : (bbb)) /* Estimate of the L1 cache size, in bytes. If this is a bit on the small side, it's probably not a big deal. If it's on the big side, that might start to seriously degrade performance. */ #define ZNP_CACHE_SIZE 32768 /* Returns ceil(log2(x)). x must be >= 1. */ #define ceil_lg \ ZNP_ceil_lg int ceil_lg (ulong x); /* Returns floor(log2(x)). Returns -1 for x == 0. */ #define floor_lg \ ZNP_floor_lg int floor_lg (ulong x); /* res := abs(op1 - op2). Returns 1 if op1 - op2 is negative, else zero. */ #define signed_mpn_sub_n \ ZNP_signed_mpn_sub_n ZNP_INLINE int signed_mpn_sub_n (mp_limb_t* res, const mp_limb_t* op1, const mp_limb_t* op2, size_t n) { if (mpn_cmp (op1, op2, n) >= 0) { mpn_sub_n (res, op1, op2, n); return 0; } else { mpn_sub_n (res, op2, op1, n); return 1; } } /* The ZNP_FASTALLOC and ZNP_FASTFREE macros are used for allocating memory which is taken off the stack if the request is small enough, or off the heap if not. Example usage: ZNP_FASTALLOC (stuff, int, 100, n); This does two things. It allocates an array of 100 ints on the stack. It also declares a pointer "int* stuff", which points to a block of ints of length n. If n <= 100, the block will be the one just allocated on the stack. 
If n > 100, the block will be found using malloc. Then afterwards, you need to do: ZNP_FASTFREE (stuff); This will call free() if the block was originally taken off the heap. */ #define ZNP_FASTALLOC(ptr, type, reserve, request) \ size_t __FASTALLOC_request_##ptr = (request); \ type* ptr; \ type __FASTALLOC_##ptr [reserve]; \ if (__FASTALLOC_request_##ptr <= (reserve)) \ ptr = __FASTALLOC_##ptr; \ else \ ptr = (type*) malloc (sizeof (type) * __FASTALLOC_request_##ptr); #define ZNP_FASTFREE(ptr) \ if (ptr != __FASTALLOC_##ptr) \ free (ptr); extern size_t ZNP_mpn_smp_kara_thresh; extern size_t ZNP_mpn_mulmid_fallback_thresh; /* Stores tuning data for moduli of a specific bitsize. */ #define tuning_info_t \ ZNP_tuning_info_t typedef struct { // thresholds for array multiplication size_t mul_KS2_thresh; // KS1 -> KS2 threshold size_t mul_KS4_thresh; // KS2 -> KS4 threshold size_t mul_fft_thresh; // KS4 -> fft threshold // as above, but for squaring size_t sqr_KS2_thresh; size_t sqr_KS4_thresh; size_t sqr_fft_thresh; // as above, but for middle products size_t mulmid_KS2_thresh; size_t mulmid_KS4_thresh; size_t mulmid_fft_thresh; // for negacyclic multiplications, switch from KS to Nussbaumer FFT // when length reaches 2^nuss_mul_thresh unsigned nuss_mul_thresh; // ditto for nussbaumer squaring unsigned nuss_sqr_thresh; } tuning_info_t; /* Global array of tuning_info_t's, one for each bitsize. */ #define tuning_info \ ZNP_tuning_info extern tuning_info_t tuning_info[]; /* ============================================================================ stuff from pack.c ============================================================================ */ /* Computes res := 2^k * ( op[0] + op[s]*2^b + ... + op[(n-1)*s]*2^((n-1)*b) ). Assumes each op[i] satisfies 0 <= op[i] < 2^b. Must have 0 < b < 3 * ULONG_BITS. If r == 0, then exactly ceil((k + n * b) / GMP_NUMB_BITS) limbs are written. Otherwise, the output will be zero-padded up to exactly r limbs, which must be at least the above number of limbs. */ #define zn_array_pack \ ZNP_zn_array_pack void zn_array_pack (mp_limb_t* res, const ulong* op, size_t n, ptrdiff_t s, unsigned b, unsigned k, size_t r); /* Let op be an integer of the form 2^k * (a[0] + a[1]*2^b + ... + a[n-1]*2^((n-1)*b)) + junk, where 0 <= a[i] < 2^b for each i, and where 0 <= junk < 2^k. This function reads off the a[i]'s and stores them at res. Each output coefficient occupies exactly ceil(b / ULONG_BITS) words. The input should be exactly ceil((k + n * b) / GMP_NUMB_BITS) limbs long. Must have 0 < b < 3 * ULONG_BITS. */ #define zn_array_unpack \ ZNP_zn_array_unpack void zn_array_unpack (ulong* res, const mp_limb_t* op, size_t n, unsigned b, unsigned k); /* Same as zn_array_unpack, but adds an assertion to check that the unpacking routine will not read beyond the first r limbs of op. */ #define zn_array_unpack_SAFE(res, op, n, b, k, r) \ do \ { \ ZNP_ASSERT((n) * (b) + (k) <= (r) * GMP_NUMB_BITS); \ zn_array_unpack(res, op, n, b, k); \ } while (0) /* ============================================================================ stuff from mul.c ============================================================================ */ /* Identical to zn_array_mul(), except for the fastred flag. If fastred is cleared, the output is the same as for zn_array_mul(). If fastred is set, the routine uses the fastest modular reduction strategy available for the given parameters. The result will come out divided by a fudge factor, which can be recovered via _zn_array_mul_fudge(). 
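   For illustration, one way the two calls are intended to combine (a
   sketch, assuming the fudge conventions described above; not a canned
   recipe from this library):

      ulong f = _zn_array_mul_fudge (n1, n2, 0, mod);     // sqr = 0
      _zn_array_mul (res, op1, n1, op2, n2, 1, mod);      // fastred = 1
      zn_array_scalar_mul (res, res, n1 + n2 - 1, f, mod);

   after which res should agree with the output of plain zn_array_mul().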
*/ #define _zn_array_mul \ ZNP__zn_array_mul void _zn_array_mul (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int fastred, const zn_mod_t mod); #define _zn_array_mul_fudge \ ZNP__zn_array_mul_fudge ulong _zn_array_mul_fudge (size_t n1, size_t n2, int sqr, const zn_mod_t mod); /* ============================================================================ stuff from ks_support.c ============================================================================ */ /* Sets res[i * s] = reduction modulo mod of the i-th entry of op, for 0 <= i < n. Each entry of op is w ulongs. If the redc flag is set, the results are divided by -B mod m (only allowed if the modulus is odd). Must have 1 <= w <= 3. */ #define array_reduce \ ZNP_array_reduce void array_reduce (ulong* res, ptrdiff_t s, const ulong* op, size_t n, unsigned w, int redc, const zn_mod_t mod); /* This is a helper function for the variants of KS that evaluate at "reciprocal" evaluation points like 2^(-N); it implements essentially the algorithm of section 3.2 of [Har07], plus reductions mod n. It accepts two integers X and Y written in base M = 2^b, where 1 <= b <= 3 * ULONG_BITS / 2. It assumes that X = a[0] + a[1]*M + ... + a[n-1]*M^(n-1), Y = a[n-1] + a[n-2]*M + ... + a[0]*M^(n-1), where each a[i] is two "digits" long, and where the high digit of a[i] is at most M-2 (i.e. may not equal M-1). It reconstructs the a[i], reduces them mod m, and stores the results in an array. The input is supplied as follows. X is in op1, Y is in op2. They are both arrays of values that are b bits wide, where b <= 3 * ULONG_BITS / 2. Each value takes up one ulong if b <= ULONG_BITS, otherwise two ulongs. There are n + 1 such values in each array (i.e. each array consists of (n + 1) * ceil(b / ULONG_BITS) ulongs). The output (n ulongs) is written to the array res, with consecutive outputs separated by s ulongs. mod describes the modulus m. If the redc flag is set, the modular reductions are performed using REDC, i.e. the result contain an extra factor of -1/B mod m (where B = 2^ULONG_BITS). */ #define zn_array_recover_reduce \ ZNP_zn_array_recover_reduce void zn_array_recover_reduce (ulong* res, ptrdiff_t s, const ulong* op1, const ulong* op2, size_t n, unsigned b, int redc, const zn_mod_t mod); /* ============================================================================ stuff from mul_ks.c ============================================================================ */ /* These are the same as zn_array_mul(). They use four different types of Kronecker substitution. They automatically use a faster algorithm for squaring (if the inputs are identical buffers). Aliasing of all operands allowed. Must have n1 >= n2 >= 1. If the redc flag is set, the outputs will be divided by -B mod m. (Only allowed if the modulus is odd.) 
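   For example, with redc = 0,

      zn_array_mul_KS1 (res, op1, n1, op2, n2, 0, mod);

   produces the same output as zn_array_mul (res, op1, n1, op2, n2, mod),
   just with the KS1 algorithm forced.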
*/ #define zn_array_mul_KS1 \ ZNP_zn_array_mul_KS1 void zn_array_mul_KS1 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod); #define zn_array_mul_KS2 \ ZNP_zn_array_mul_KS2 void zn_array_mul_KS2 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod); #define zn_array_mul_KS3 \ ZNP_zn_array_mul_KS3 void zn_array_mul_KS3 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod); #define zn_array_mul_KS4 \ ZNP_zn_array_mul_KS4 void zn_array_mul_KS4 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod); /* ============================================================================ stuff from mulmid_ks.c ============================================================================ */ /* These are the same as zn_array_mulmid(). They use four different types of Kronecker substitution. Aliasing of all operands allowed. Must have n1 >= n2 >= 1. If the redc flag is set, the outputs will be divided by -B mod m. (Only allowed if the modulus is odd.) */ #define zn_array_mulmid_KS1 \ ZNP_zn_array_mulmid_KS1 void zn_array_mulmid_KS1 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod); #define zn_array_mulmid_KS2 \ ZNP_zn_array_mulmid_KS2 void zn_array_mulmid_KS2 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod); #define zn_array_mulmid_KS3 \ ZNP_zn_array_mulmid_KS3 void zn_array_mulmid_KS3 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod); #define zn_array_mulmid_KS4 \ ZNP_zn_array_mulmid_KS4 void zn_array_mulmid_KS4 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod); /* ============================================================================ pmf_t stuff ============================================================================ */ /* Let R = Z/mZ. A pmf_t ("pmf" = "polynomial modulo fermat") represents an element of S = R[Y]/(Y^M + 1). This is used as the coefficient ring in the Schonhage/Nussbaumer FFT routines. It's an array of ulongs of length M + 1, where M = 2^lgM is a power of two. (The value M is not stored, the caller needs to keep track of it.) The first value in the array is an integer b called the "bias" of the representation. The remaining M values are coefficients a_0, ..., a_{M-1}. These together represent the polynomial Y^b (a_0 + a_1 Y + ... + a_{M-1} Y^{M-1}). Note that elements of S do not have a unique representation in this form; in fact they have one possible representation for each value of b in [0, 2M). (By allowing nonzero bias, we get more efficient in-place FFT butterflies.) The stored bias value need not be in [0, 2M), but it is interpreted mod 2M. Currently the values a_i are always normalised into [0, m). Later we might drop that restriction to obtain faster butterflies... */ #define pmf_t \ ZNP_pmf_t typedef ulong* pmf_t; #define pmf_const_t \ ZNP_pmf_const_t typedef const ulong* pmf_const_t; /* op := op * x */ #define pmf_scalar_mul \ ZNP_pmf_scalar_mul ZNP_INLINE void pmf_scalar_mul (pmf_t op, ulong M, ulong x, const zn_mod_t mod) { zn_array_scalar_mul (op + 1, op + 1, M, x, mod); } /* op := 0, with bias reset to zero too */ #define pmf_zero \ ZNP_pmf_zero ZNP_INLINE void pmf_zero (pmf_t op, ulong M) { for (M++; M > 0; M--) *op++ = 0; } /* op := op / 2 Modulus must be odd. 
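   (For odd m, halving is well-defined mod m: each coefficient x is replaced
   by the unique y in [0, m) with 2*y = x mod m, equivalently multiplication
   by (m + 1) / 2.)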
*/ #define pmf_divby2 \ ZNP_pmf_divby2 ZNP_INLINE void pmf_divby2 (pmf_t op, ulong M, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); for (op++; M > 0; M--, op++) *op = zn_mod_divby2 (*op, mod); } /* res := op */ #define pmf_set \ ZNP_pmf_set ZNP_INLINE void pmf_set (pmf_t res, pmf_t op, ulong M) { for (M++; M > 0; M--) *res++ = *op++; } /* op := Y^r * op */ #define pmf_rotate \ ZNP_pmf_rotate ZNP_INLINE void pmf_rotate (pmf_t op, ulong r) { op[0] += r; } /* op1 := op2 + op1 op2 := op2 - op1 Inputs must be [0, m); outputs will be in [0, m). */ #define pmf_bfly \ ZNP_pmf_bfly void pmf_bfly (pmf_t op1, pmf_t op2, ulong M, const zn_mod_t mod); /* op1 := op1 + op2 Inputs must be [0, m); outputs will be in [0, m). */ #define pmf_add \ ZNP_pmf_add void pmf_add (pmf_t op1, const pmf_t op2, ulong M, const zn_mod_t mod); /* op1 := op1 - op2 Inputs must be [0, m); outputs will be in [0, m). */ #define pmf_sub \ ZNP_pmf_sub void pmf_sub (pmf_t op1, const pmf_t op2, ulong M, const zn_mod_t mod); /* These functions are exported just for profiling purposes: */ #define zn_array_bfly_inplace \ ZNP_zn_array_bfly_inplace void zn_array_bfly_inplace (ulong* op1, ulong* op2, ulong n, const zn_mod_t mod); #define zn_array_add_inplace \ ZNP_zn_array_add_inplace void zn_array_add_inplace (ulong* op1, const ulong* op2, ulong n, const zn_mod_t mod); #define zn_array_sub_inplace \ ZNP_zn_array_sub_inplace void zn_array_sub_inplace (ulong* op1, const ulong* op2, ulong n, const zn_mod_t mod); /* ============================================================================ pmfvec_t stuff ============================================================================ */ /* A pmfvec_t stores a vector of length K = 2^lgK of elements of S. Used to represent an element of S[Z]/(Z^K + 1) or S[Z]/(Z^K - 1), or some other quotient like that. The functions pmfvec_init/clear should be used to allocate storage for this type. Also sometimes fake ones get created temporarily to point at sub-vectors of existing vectors. */ #define pmfvec_struct \ ZNP_pmfvec_struct typedef struct { // points to the first coefficient pmf_t data; // number of coefficients ulong K; unsigned lgK; // lg2(K) // length of coefficients (see definition of pmf_t) ulong M; unsigned lgM; // lg2(M) // distance between adjacent coefficients, measured in ulongs // (this is at least M + 1, might be more) ptrdiff_t skip; // associated modulus const zn_mod_struct* mod; } pmfvec_struct; #define pmfvec_t \ ZNP_pmfvec_t typedef pmfvec_struct pmfvec_t[1]; /* Checks that vec1 and vec2 have compatible data, i.e. have the same K, M, mod. */ #define pmfvec_compatible \ ZNP_pmfvec_compatible ZNP_INLINE int pmfvec_compatible (const pmfvec_t vec1, const pmfvec_t vec2) { return (vec1->K == vec2->K) && (vec1->M == vec2->M) && (vec1->mod == vec2->mod); } /* Initialises res with given parameters, allocates memory. */ #define pmfvec_init \ ZNP_pmfvec_init void pmfvec_init (pmfvec_t res, unsigned lgK, ptrdiff_t skip, unsigned lgM, const zn_mod_t mod); /* Initialises res in preparation for a Nussbaumer multiplication of length 2^lgL. */ #define pmfvec_init_nuss \ ZNP_pmfvec_init_nuss void pmfvec_init_nuss (pmfvec_t res, unsigned lgL, const zn_mod_t mod); /* Destroys op, frees all associated memory. */ #define pmfvec_clear \ ZNP_pmfvec_clear void pmfvec_clear (pmfvec_t op); /* res := op */ #define pmfvec_set \ ZNP_pmfvec_set void pmfvec_set (pmfvec_t res, const pmfvec_t op); /* Multiplies first n coefficients of op by x. 
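   (Presumably each coefficient is scaled as in pmf_scalar_mul above, so the
   bias words are left untouched.)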
*/ #define pmfvec_scalar_mul \ ZNP_pmfvec_scalar_mul void pmfvec_scalar_mul (pmfvec_t op, ulong n, ulong x); /* Multiplies pointwise the first n coefficients of op1 and op2, puts result in res. It's okay for res to alias op1 or op2. The modulus must be odd. If the special_first_two flag is set, the routine assumes that the first two coefficients are of length only M/2 (this is the typical situation after performing the FFT), and multiplies them more quickly accordingly. The routine automatically selects KS or Nussbaumer multiplication depending on the modulus bitsize and on M. The output will be divided by a fudge factor, which can be retrieved via pmfvec_mul_fudge(). Automatically uses specialised squaring algorithm if the inputs are the same pmfvec_t object. */ #define pmfvec_mul \ ZNP_pmfvec_mul void pmfvec_mul (pmfvec_t res, const pmfvec_t op1, const pmfvec_t op2, ulong n, int special_first_two); #define pmfvec_mul_fudge \ ZNP_pmfvec_mul_fudge ulong pmfvec_mul_fudge (unsigned lgM, int sqr, const zn_mod_t mod); /* Modifies the op->data and op->skip to make it look as if the first coefficient is the one at index n - 1, and the last coefficient is the one at index 0. Calling this function again undoes the reversal. Note that this function *must* be called a second time before calling pmfvec_clear(), so that free() is not called on the wrong pointer! */ #define pmfvec_reverse \ ZNP_pmfvec_reverse void pmfvec_reverse (pmfvec_t op, ulong n); /* ============================================================================ stuff in pmfvec_fft.c ============================================================================ */ /* ---------- forward FFTs ---------- The functions pmfvec_fft() pmfvec_fft_basecase() pmfvec_fft_dc() pmfvec_fft_huge() operate on a pmfvec_t, and compute inplace the FFT: b_k = Y^{t k'} \sum_{i=0}^{K-1} Y^{2 M i k' / K} a_i, where 0 <= t < 2M / K is a twist parameter. The notation k' indicates the length lgK bit-reversal of k. All except the "basecase" version have an n parameter; they only compute the first n outputs. The remaining buffers are used in intermediate steps, and contain junk at the end. For "basecase", all K outputs are computed. All except the "basecase" version have a z parameter; they assume that the input coefficients are zero from index z and beyond. They never read from those coefficients. For "basecase", all the inputs are used. The four versions use different algorithms as follows: * pmfvec_fft(): main entry point, delegates to one of the other routines based on the size of the transform. * pmfvec_fft_basecase(): low-overhead iterative FFT, no truncation logic. * pmfvec_fft_dc(): divide-and-conquer. It handles the top layer of butterflies, and then recurses into the two halves. This is intended for fairly small transforms where locality is not a big issue. The algorithm implemented here is essentially van der Hoeven's "truncated Fourier transform" [vdH04], [vdH05]. * pmfvec_fft_huge(): intended for large transforms, where locality is an issue. It factors the FFT into U = 2^lgU transforms of length T = 2^lgT followed by T transforms of length U, where K = T * U. This is done recursively until we're in L1 cache (or as small as possible), at which point we switch to pmfvec_fft_dc(). The algorithm is straightforward, but I believe it to be new. It is simultaneously a generalisation of van der Hoeven's truncated Fourier transform and Bailey's FFT algorithm [Bai89]. (I used something similar in the ZmodF_poly module in FLINT.) 
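   As a concrete illustration of the decomposition: with lgK = 10 and
   lgT = 5, pmfvec_fft_huge() performs U = 32 transforms of length T = 32
   followed by T = 32 transforms of length U = 32, recursing further if
   those pieces still do not fit in L1 cache.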
---------- inverse FFTs ---------- The functions pmfvec_ifft() pmfvec_ifft_basecase() pmfvec_ifft_dc() pmfvec_ifft_huge() compute the corresponding inverse FFT. They are a little more complicated than the FFTs owing to the truncation. Let a_i and b_k be as described above for the FFTs. The IFFT functions take as input the array b_{0'}, b_{1'}, ..., b_{(n-1)'}, K*a_n, K*a_{n+1}, ..., K*a_{K-1}, for some 0 <= n <= K. If the fwd flag is zero, then the output of the IFFT routine is: K*a_0, K*a_1, ..., K*a_{n-1} followed by K - n junk coefficients. If the fwd flag is set, then we require that 0 <= n <= K - 1, and the output is K*a_0, K*a_1, ..., K*a_{n-1}, b_n followed by K - n - 1 junk coefficients; i.e. it also computes one coefficient of the *forward* FFT. The "basecase" version has no n parameter, and assumes that n = K. Here the routine becomes the (non-truncated) IFFT as usually understood, with inputs in bit-reversed order and outputs in normal order. All except the "basecase" version have a z parameter (with z >= n); they assume that the input coefficients are zero from index z and beyond. They never read from those coefficients. For the "basecase" version, all of the inputs are used. The four versions use different algorithms as follows: * pmfvec_ifft(): main entry point, delegates to one of the other routines based on the size of the transform. * pmfvec_ifft_basecase(): low-overhead iterative IFFT, no truncation logic. * pmfvec_ifft_dc(): divide-and-conquer. It recurses into the two halves, and handles the top layer of butterflies. This is intended for fairly small transforms where locality is not a big issue. The algorithm implemented here is essentially van der Hoeven's beautiful "truncated inverse Fourier transform". * pmfvec_ifft_huge(): intended for large transforms, where locality is an issue. It factors the FFT into U = 2^lgU transforms of length T = 2^lgT and T transforms of length U, where K = T * U. This is done recursively until we're in L1 cache (if possible), at which point we switch to pmfvec_ifft_dc(). The algorithm is not as simple as the FFT version; it is necessary to alternate between "row" and "column" transforms in a slightly complicated way. I believe the algorithm to be new. ---------- transposed forward and inverse FFTs ---------- The functions pmfvec_tpfft_basecase() pmfvec_tpfft_dc() pmfvec_tpfft_huge() pmfvec_tpfft() are *transposed* versions of the corresponding pmfvec_fft() routines. This means that if the FFT computes an S-linear map from S^z to S^n, the transposed version computes the transpose map from S^n to S^z. Similarly, the functions pmfvec_tpifft_basecase() pmfvec_tpifft_dc() pmfvec_tpifft_huge() pmfvec_tpifft() are transposed versions of the IFFT routines. If the IFFT computes an S-linear map from S^z to S^(n + fwd), the transposed version computes the transpose map from S^(n + fwd) to S^z. The algorithms are transposed essentially by reversing them, and transposing every step of the algorithm; see for example [BLS03] for how this is done. We don't have comments in these routines; see the comments on the corresponding FFT/IFFT routines. ---------- notes for all the above functions ---------- These functions all perform O(n * lgK + K) operations in S (an "operation" being an addition/subtraction/copy with possibly an implied rotation by a power of Y). In particular the running time varies fairly smoothly with n instead of jumping with K. For all four algorithms, the "dc" version is essentially equivalent to the "huge" version with lgT = 1. 
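   As a concrete instance of the IFFT truncation conventions above: for
   K = 4, n = 2, z = 4, fwd = 0, the input array is

      b_{0'}, b_{1'}, 4*a_2, 4*a_3      (i.e. b_0, b_2, 4*a_2, 4*a_3),

   and the output is 4*a_0, 4*a_1 followed by two junk coefficients.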
These functions are not thread-safe. Apart from modifying the input inplace, they also temporarily modify the pmfvec_t structs themselves. Our approach to improving cache performance is certainly not ideal. The main problem is that we can get address conflicts, especially since everything gets spread out by powers of two. Mitigating factors: associativity in the cache; the extra bias word scrambles the addresses somewhat; when the transforms gets large, so do the coefficients, so we don't expect to fit that many in cache anyway. */ #define pmfvec_fft \ ZNP_pmfvec_fft void pmfvec_fft (pmfvec_t op, ulong n, ulong z, ulong t); #define pmfvec_fft_huge \ ZNP_pmfvec_fft_huge void pmfvec_fft_huge (pmfvec_t op, unsigned lgT, ulong n, ulong z, ulong t); #define pmfvec_fft_dc \ ZNP_pmfvec_fft_dc void pmfvec_fft_dc (pmfvec_t op, ulong n, ulong z, ulong t); #define pmfvec_fft_basecase \ ZNP_pmfvec_fft_basecase void pmfvec_fft_basecase (pmfvec_t op, ulong t); #define pmfvec_ifft \ ZNP_pmfvec_ifft void pmfvec_ifft (pmfvec_t op, ulong n, int fwd, ulong z, ulong t); #define pmfvec_ifft_huge \ ZNP_pmfvec_ifft_huge void pmfvec_ifft_huge (pmfvec_t op, unsigned lgT, ulong n, int fwd, ulong z, ulong t); #define pmfvec_ifft_dc \ ZNP_pmfvec_ifft_dc void pmfvec_ifft_dc (pmfvec_t op, ulong n, int fwd, ulong z, ulong t); #define pmfvec_ifft_basecase \ ZNP_pmfvec_ifft_basecase void pmfvec_ifft_basecase (pmfvec_t op, ulong t); #define pmfvec_tpfft \ ZNP_pmfvec_tpfft void pmfvec_tpfft (pmfvec_t op, ulong n, ulong z, ulong t); #define pmfvec_tpfft_huge \ ZNP_pmfvec_tpfft_huge void pmfvec_tpfft_huge (pmfvec_t op, unsigned lgT, ulong n, ulong z, ulong t); #define pmfvec_tpfft_dc \ ZNP_pmfvec_tpfft_dc void pmfvec_tpfft_dc (pmfvec_t op, ulong n, ulong z, ulong t); #define pmfvec_tpfft_basecase \ ZNP_pmfvec_tpfft_basecase void pmfvec_tpfft_basecase (pmfvec_t op, ulong t); #define pmfvec_tpifft \ ZNP_pmfvec_tpifft void pmfvec_tpifft (pmfvec_t op, ulong n, int fwd, ulong z, ulong t); #define pmfvec_tpifft_huge \ ZNP_pmfvec_tpifft_huge void pmfvec_tpifft_huge (pmfvec_t op, unsigned lgT, ulong n, int fwd, ulong z, ulong t); #define pmfvec_tpifft_dc \ ZNP_pmfvec_tpifft_dc void pmfvec_tpifft_dc (pmfvec_t op, ulong n, int fwd, ulong z, ulong t); #define pmfvec_tpifft_basecase \ ZNP_pmfvec_tpifft_basecase void pmfvec_tpifft_basecase (pmfvec_t op, ulong t); /* ============================================================================ stuff in array.c ============================================================================ */ /* Computes res = sign1*op1 + sign2*op2, where sign1 = -1 if neg1 is set, otherwise +1; ditto for sign2. op1 and op2 are arrays of length n. res is a staggered array, entries separated by s. Return value is res + s*n, i.e. points beyond the written array. */ #define zn_skip_array_signed_add \ ZNP_zn_skip_array_signed_add ulong* zn_skip_array_signed_add (ulong* res, ptrdiff_t skip, size_t n, const ulong* op1, int neg1, const ulong* op2, int neg2, const zn_mod_t mod); /* Same as zn_array_scalar_mul, but has a _redc_ flag. If the flag is set, then REDC reduction is used (in which case the modulus must be odd), otherwise ordinary reduction is used. */ #define _zn_array_scalar_mul \ ZNP__zn_array_scalar_mul void _zn_array_scalar_mul (ulong* res, const ulong* op, size_t n, ulong x, int redc, const zn_mod_t mod); /* Behaves just like zn_array_scalar_mul, except it uses the obvious optimisation if x == 1. 
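   (That is, when x == 1 it presumably reduces to a plain zn_array_copy,
   skipping the per-element modular multiplications.)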
*/ #define zn_array_scalar_mul_or_copy \ ZNP_zn_array_scalar_mul_or_copy void zn_array_scalar_mul_or_copy (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod); /* ============================================================================ stuff in nuss.c ============================================================================ */ /* Performs negacyclic multiplication using Nussbaumer's algorithm. vec1 and vec2 must be pre-initialised pmfvec_t's with the same modulus and the same lgM and lgK, satisfying lgM + 1 >= lgK (i.e. there are enough roots of unity). These are used for scratch space. The convolution length is L = 2^lgL, where lgL = lgM + lgK - 1. Inputs are op1[0, L) and op2[0, L), output is res[0, L). It's okay for res to alias op1 or op2. If op1 == op2, then a faster squaring version is used. In this case vec2 is ignored. The result comes out divided by a fudge factor, which can be recovered via nuss_mul_fudge(). */ #define nuss_mul \ ZNP_nuss_mul void nuss_mul (ulong* res, const ulong* op1, const ulong* op2, pmfvec_t vec1, pmfvec_t vec2); #define nuss_mul_fudge \ ZNP_nuss_mul_fudge ulong nuss_mul_fudge (unsigned lgL, int sqr, const zn_mod_t mod); /* Computes optimal lgK and lgM for given lgL, as described above for nuss_mul(). */ #define nuss_params \ ZNP_nuss_params void nuss_params (unsigned* lgK, unsigned* lgM, unsigned lgL); /* ============================================================================ stuff from mul_fft.c ============================================================================ */ /* Splits op[-k, n) into pieces of length M/2, where M = res->M, and where the first k coefficients are assumed to be zero. The pieces are written to the first ceil((n + k) / (M/2)) coefficients of res. The last fractional piece is treated as if zero-padded up to length M/2. The second half of each target pmf_t is zeroed out, and the bias fields are all set to b. If x != 1, then all entries are multiplied by x mod m. */ #define fft_split \ ZNP_fft_split void fft_split (pmfvec_t res, const ulong* op, size_t n, size_t k, ulong x, ulong b); /* Performs the substitution back from S[Z]/(Z^K - 1) to a polynomial in X, i.e. mapping Y -> X, Z -> X^(M/2). It only looks at the first z coefficients of op; it assumes the rest are zero. It writes exactly n coefficients of output. If skip_first is set, it ignores the first M/2 coefficients of output, and then writes the *next* M/2 coefficients of output. NOTE: this routine is not threadsafe: it temporarily modifies the bias field of the first coefficient of op. */ #define fft_combine \ ZNP_fft_combine void fft_combine (ulong* res, size_t n, const pmfvec_t op, ulong z, int skip_first); /* Same as zn_array_mul(), but uses the Schonhage/Nussbaumer FFT algorithm. Uses faster algorithm for squaring if inputs are identical buffers. The modulus must be odd. Output may overlap the inputs. The output will come out divided by a fudge factor, which can be recovered via zn_array_mul_fft_fudge(). If x != 1, the output is further multiplied by x. 
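   For example, passing

      x = zn_array_mul_fft_fudge (n1, n2, 0, mod)

   (sqr = 0 for distinct operands, 1 for squaring) cancels the fudge factor
   exactly, so the call then returns the same result as zn_array_mul().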
*/ #define zn_array_mul_fft \ ZNP_zn_array_mul_fft void zn_array_mul_fft (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, ulong x, const zn_mod_t mod); #define zn_array_mul_fft_fudge \ ZNP_zn_array_mul_fft_fudge ulong zn_array_mul_fft_fudge (size_t n1, size_t n2, int sqr, const zn_mod_t mod); /* Computes the best lgK, lgM, m1, m2 such that polynomials of length n1 and n2 may be multiplied with fourier transform parameters lgK and lgM, and where the polynomials get split into m1 (resp. m2) chunks of length M/2. More precisely, the outputs satisfy: * lgM + 1 >= lgK (i.e. there are enough roots of unity) * m1 + m2 - 1 <= K, where m1 = ceil(n1 / (M/2)) and m2 = ceil(n2 / (M/2)) (i.e. the transform has enough room for the answer) * lgM >= 1 * lgM is minimal subject to the above conditions. */ #define mul_fft_params \ ZNP_mul_fft_params void mul_fft_params (unsigned* lgK, unsigned* lgM, ulong* m1, ulong* m2, size_t n1, size_t n2); /* Same as zn_array_mulmid(), but uses the Schonhage/Nussbaumer FFT algorithm. The modulus must be odd. Output may overlap the inputs. The output will come out divided by a fudge factor, which can be recovered via zn_array_mulmid_fft_fudge(). If x != 1, the output is further multiplied by x. */ #define zn_array_mulmid_fft \ ZNP_zn_array_mulmid_fft void zn_array_mulmid_fft (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, ulong x, const zn_mod_t mod); #define zn_array_mulmid_fft_fudge \ ZNP_zn_array_mulmid_fft_fudge ulong zn_array_mulmid_fft_fudge (size_t n1, size_t n2, const zn_mod_t mod); /* Computes the best lgK, lgM, m1, m2, p such that the middle product of polynomials of length n1 and n2 may be computed with fourier transform parameters lgK and lgM, where the first polynomial is padded on the left by p zeroes, and where the polynomials get split into m1 (resp. m2) chunks of length M/2. More precisely, the outputs satisfy (see mul_fft.c for further discussion): * lgM >= 1 * lgM + 1 >= lgK (i.e. there are enough roots of unity) * 1 <= p <= M/2 and n2 + p - 1 is divisible by M/2 * m1 = ceil((n1 + p) / (M/2)) * m2 = ceil(n2 / (M/2)) * m1 <= K * lgM is minimal subject to the above conditions. */ #define mulmid_fft_params \ ZNP_mulmid_fft_params void mulmid_fft_params (unsigned* lgK, unsigned* lgM, ulong* m1, ulong* m2, ulong* p, size_t n1, size_t n2); /* Stores precomputed information for performing an FFT-based middle product where the first input array is invariant, and the length of the second input array is invariant. */ #define zn_array_mulmid_fft_precomp1_struct \ ZNP_zn_array_mulmid_fft_precomp1_struct struct zn_array_mulmid_fft_precomp1_struct { // these parameters are as described at the top of mul_fft.c. size_t n1, n2; ulong m1, m2, p; // holds the transposed IFFT of the input array pmfvec_t vec1; }; #define zn_array_mulmid_fft_precomp1_t \ ZNP_zn_array_mulmid_fft_precomp1_t typedef struct zn_array_mulmid_fft_precomp1_struct zn_array_mulmid_fft_precomp1_t[1]; /* Initialises res to perform middle product of op1[0, n1) by operands of size n2. If x != 1, the data is multiplied by x. Since middle products are linear, this has the effect of multiplying the output of subsequent calls to zn_array_mulmid_fft_precomp1_execute() by x. 
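   Illustrative call sequence (a sketch, assuming the fudge/x conventions
   described here; allocation of res, with room for n1 - n2 + 1 entries,
   is omitted):

      zn_array_mulmid_fft_precomp1_t P;
      zn_array_mulmid_fft_precomp1_init (P, op1, n1, n2, 1, mod);
      ulong f = zn_array_mulmid_fft_precomp1_fudge (n1, n2, mod);
      zn_array_mulmid_fft_precomp1_execute (res, op2, f, P);  // exact result
      zn_array_mulmid_fft_precomp1_clear (P);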
*/ #define zn_array_mulmid_fft_precomp1_init \ ZNP_zn_array_mulmid_fft_precomp1_init void zn_array_mulmid_fft_precomp1_init (zn_array_mulmid_fft_precomp1_t res, const ulong* op1, size_t n1, size_t n2, ulong x, const zn_mod_t mod); /* Performs middle product of op1[0, n1) by op2[0, n2), stores result at res[0, n1 - n2 + 1). The output will come out divided by a fudge factor, which can be recovered via zn_array_mulmid_fft_precomp1_fudge(). If x != 1, the output is further multiplied by x. */ #define zn_array_mulmid_fft_precomp1_execute \ ZNP_zn_array_mulmid_fft_precomp1_execute void zn_array_mulmid_fft_precomp1_execute (ulong* res, const ulong* op2, ulong x, const zn_array_mulmid_fft_precomp1_t precomp); #define zn_array_mulmid_fft_precomp1_fudge \ ZNP_zn_array_mulmid_fft_precomp1_fudge ulong zn_array_mulmid_fft_precomp1_fudge (size_t n1, size_t n2, const zn_mod_t mod); /* Deallocates op. */ #define zn_array_mulmid_fft_precomp1_clear \ ZNP_zn_array_mulmid_fft_precomp1_clear void zn_array_mulmid_fft_precomp1_clear (zn_array_mulmid_fft_precomp1_t op); /* ============================================================================ stuff from mpn_mulmid.c ============================================================================ */ /* Let n1 >= n2 >= 1, and let a = \sum_{i=0}^{n1-1} a_i B^i b = \sum_{j=0}^{n2-1} b_j B^j be integers with n1 and n2 limbs respectively. We define SMP(a, b), the *simple* middle product of a and b, to be the integer \sum_{0 <= i < n1, 0 <= j < n2, n2-1 <= i+j < n1} a_i b_j B^(i+j-(n2-1)). In other words, it's as if we treat a and b as polynomials in Z[B] of length n1 and n2 respectively, compute the polynomial middle product over Z, and then propagate the high words and subsequent carries. Note that SMP(a, b) is at most n1 - n2 + 3 limbs long (we assume throughout that n1 is less than the maximum value stored in a limb). */ /* Computes SMP(op1[0, n1), op2[0, n2)). Stores result at res[0, n1 - n2 + 3). */ void ZNP_mpn_smp (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2); /* Same as mpn_smp(), but always uses basecase (quadratic-time) algorithm. res[0, n1 - n2 + 3) must not overlap op1[0, n1) or op2[0, n2). */ void ZNP_mpn_smp_basecase (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2); /* Computes SMP(op1[0, 2*n - 1), op2[0, n)). Algorithm is selected depending on size of n. Stores result at res[0, n + 2). Output must not overlap inputs. */ void ZNP_mpn_smp_n (mp_limb_t* res, const mp_limb_t* op1, const mp_limb_t* op2, size_t n); /* Computes SMP(op1[0, 2*n - 1), op2[0, n)), using Karatsuba algorithm. Must have n >= 2. Stores result at res[0, n + 2). Output must not overlap inputs. */ void ZNP_mpn_smp_kara (mp_limb_t* res, const mp_limb_t* op1, const mp_limb_t* op2, size_t n); /* Computes the *true* middle product of op1[0, n1) and op2[0, n2). More precisely, let P be the product op1 * op2. This function computes P[n2 + 1, n1), and stores this at res[2, n1 - n2 + 1). Must have n1 >= n2 >= 1. The output buffer res *must* have room for n1 - n2 + 3 limbs, but the first two limbs and the last two limbs of the output will be *garbage*. In particular, only n1 - n2 - 1 limbs of useful output are produced. If n1 <= n2 + 1, then no useful output is produced. */ void ZNP_mpn_mulmid (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2); /* Same as mpn_mulmid, but always just falls back on using mpn_mul. 
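   (That is, it computes the full product with ZNP_mpn_mul and reads off the
   middle portion; the two garbage limbs at each end of res still apply.)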
*/ void ZNP_mpn_mulmid_fallback (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2); /* ============================================================================ stuff from mulmid.c ============================================================================ */ /* Same as zn_array_mulmid(), but always falls back on doing the full product via _zn_array_mul(). If fastred is cleared, the output is the same as for zn_array_mulmid(). If fastred is set, the routine uses the fastest modular reduction strategy available for the given parameters. The result will come out divided by a fudge factor, which can be recovered via _zn_array_mulmid_fallback_fudge(). */ #define zn_array_mulmid_fallback \ ZNP_zn_array_mulmid_fallback void zn_array_mulmid_fallback (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int fastred, const zn_mod_t mod); #define zn_array_mulmid_fallback_fudge \ ZNP_zn_array_mulmid_fallback_fudge ulong zn_array_mulmid_fallback_fudge (size_t n1, size_t n2, const zn_mod_t mod); /* Identical to zn_array_mulmid(), except for the fastred flag. If fastred is cleared, the output is the same as for zn_array_mulmid(). If fastred is set, the routine uses the fastest modular reduction strategy available for the given parameters. The result will come out divided by a fudge factor, which can be recovered via _zn_array_mulmid_fudge(). */ #define _zn_array_mulmid \ ZNP__zn_array_mulmid void _zn_array_mulmid (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int fastred, const zn_mod_t mod); #define _zn_array_mulmid_fudge \ ZNP__zn_array_mulmid_fudge ulong _zn_array_mulmid_fudge (size_t n1, size_t n2, const zn_mod_t mod); /* ============================================================================ other random stuff ============================================================================ */ /* Compute 2^k mod m. Modulus must be odd. Must have -ULONG_BITS < k < ULONG_BITS. */ #define zn_mod_pow2 \ ZNP_zn_mod_pow2 ulong zn_mod_pow2 (int k, const zn_mod_t mod); /* Constants describing algorithms for precomputed middle products */ #define ZNP_MULMID_ALGO_FALLBACK 0 #define ZNP_MULMID_ALGO_KS 1 #define ZNP_MULMID_ALGO_FFT 2 #ifdef __cplusplus } #endif #endif // end of file **************************************************************** zn_poly-0.9.2/makemakefile.py000066400000000000000000000271221360464557000162170ustar00rootroot00000000000000# # This python script is called by configure to generate the makefile # for zn_poly. # # Patched for Sage's 'spkg-install' to # - respect the environment settings of CC, CPP, CXX, AR and RANLIB, and use # these as well as CFLAGS, CPPFLAGS, CXXFLAGS and LDFLAGS with their usual # meaning (i.e., CXX and not CPP for the C++ compiler, likewise CXXFLAGS # instead of CPPFLAGS, CPPFLAGS for C preprocessor flags, and LDFLAGS in # every link command using the compiler driver); # - support passing CPPFLAGS and CXXFLAGS (via '--cppflags=...' and # '--cxxflags=...'); # - build a 64-bit shared library (.dylib) on Darwin / MacOS X by adding the # target 'libzn_poly.dylib64'. (This is meanwhile superfluous, since # LDFLAGS are used in the receipt for libzn_poly.dylib as well.) # - support a variable SONAME_FLAG (defaulting to '-soname', otherwise taken # from the environment (e.g. '-h' for the Sun linker); # - support a variable SHARED_FLAG (defaulting to '-shared'), which could # later be used to unify the .so and .dylib targets. (An SO_EXTENSION # variable isn't supported / used yet.) 
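# The VERSION and ABI_VERSION files each hold a single version string; they
# are read here and substituted into the generated makefile below as
# ZN_POLY_VERSION and ZN_POLY_ABI_VERSION.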
with open('VERSION') as fobj: version = fobj.read().strip() with open('ABI_VERSION') as fobj: abi_version = fobj.read().strip() # -------------------------------------------------------------------------- # various lists of modules # These are the modules that go into the actual zn_poly library. They get # compiled in both optimised and debug modes. lib_modules = ["array", "invert", "ks_support", "mulmid", "mulmid_ks", "misc", "mpn_mulmid", "mul", "mul_fft", "mul_fft_dft", "mul_ks", "nuss", "pack", "pmf", "pmfvec_fft", "tuning", "zn_mod"] lib_modules = ["src/" + x for x in lib_modules] # These are modules containing various test routines. They get compiled # in debug mode only. test_modules = ["test", "ref_mul", "invert-test", "pmfvec_fft-test", "mulmid_ks-test", "mpn_mulmid-test", "mul_fft-test", "mul_ks-test", "nuss-test", "pack-test"] test_modules = ["test/" + x for x in test_modules] # These are modules containing various profiling routines. They get compiled # in optimised mode only. prof_modules = ["prof_main", "profiler", "array-profile", "invert-profile", "mulmid-profile", "mpn_mulmid-profile", "mul-profile", "negamul-profile"] prof_modules = ["profile/" + x for x in prof_modules] # These are modules containing profiling routines for NTL. They get compiled # in optimised mode only, with the C++ compiler. cpp_prof_modules = ["profile/ntl-profile"] # These are modules containing dummy routines replacing the NTL ones, if # we're not compiling with NTL support. noncpp_prof_modules = ["profile/ntl-profile-dummy"] # These are modules shared by the test and profiling code. They get compiled # in both debug and optimised mode. testprof_modules = ["test/support"] # Profiling targets. Each X has a main file "X-main.c" which is linked # against prof_main.c. They are compiled once with PROFILE_NTL defined # and once without. prof_progs = ["array-profile", "invert-profile", "mulmid-profile", "mpn_mulmid-profile", "mul-profile", "negamul-profile"] prof_progs = ["profile/" + x for x in prof_progs] # These are modules used in the tuning program; they get compiled only in # optimised mode. tune_modules = ["tune", "mulmid-tune", "mpn_mulmid-tune", "mul-tune", "mul_ks-tune", "mulmid_ks-tune", "nuss-tune", "tuning"] tune_modules = ["tune/" + x for x in tune_modules] # Demo programs. demo_progs = ["demo/bernoulli/bernoulli"] # These are the headers that need to be copied to the install location. install_headers = ["include/zn_poly.h", "include/wide_arith.h"] # These are the other headers. 
other_headers = ["include/support.h", "include/profiler.h", "include/zn_poly_internal.h"] # -------------------------------------------------------------------------- # read command line options from optparse import OptionParser parser = OptionParser() parser.add_option("--prefix", dest="prefix", default="/usr/local") parser.add_option("--cflags", dest="cflags", default="-O2") parser.add_option("--cppflags", dest="cppflags", default="") parser.add_option("--cxxflags", dest="cxxflags", default="-O2") parser.add_option("--ldflags", dest="ldflags", default="") parser.add_option("--gmp-prefix", dest="gmp_prefix", default="/usr/local") parser.add_option("--ntl-prefix", dest="ntl_prefix", default="/usr/local") parser.add_option("--use-flint", dest="use_flint", action="store_true", default=False) parser.add_option("--flint-prefix", dest="flint_prefix", default="/usr/local") parser.add_option("--enable-tuning", dest="tuning", action="store_true", default=True) parser.add_option("--disable-tuning", dest="tuning", action="store_false") options, args = parser.parse_args() gmp_include_dir = options.gmp_prefix + "/include" gmp_lib_dir = options.gmp_prefix + "/lib" ntl_include_dir = options.ntl_prefix + "/include" ntl_lib_dir = options.ntl_prefix + "/lib" if options.use_flint: flint_include_dir = options.flint_prefix + "/include" flint_lib_dir = options.flint_prefix + "/lib" cflags = options.cflags cppflags = options.cppflags # C preprocessor flags cxxflags = options.cxxflags # C++ compiler flags ldflags = options.ldflags prefix = options.prefix zn_poly_tuning = options.tuning # Note: This should be put into / added to cppflags: includes = "-I" + gmp_include_dir + " -I./include" # Note: This should be put into / added to ldflags: libs = "-L" + gmp_lib_dir + " -lgmp -lm" if options.use_flint: includes = includes + " -I" + flint_include_dir libs = libs + " -L" + flint_lib_dir + " -lflint" cflags = cflags + " -std=c99 -DZNP_USE_FLINT" # Note: These also belong to CPPFLAGS and LDFLAGS, respectively: # (But we currently don't use NTL in Sage's zn_poly installation anyway.) cpp_includes = includes + " -I" + ntl_include_dir cpp_libs = libs + " -L" + ntl_lib_dir + " -lntl" # -------------------------------------------------------------------------- # generate the makefile import time now = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime()) print( """# # Do not edit directly -- this file was auto-generated # by {0} on {1} # # (makemakefile.py patched for Sage, 04/2012) """.format(__file__, now)) print( """ CC ?= gcc CPP ?= cpp CFLAGS = {0} CPPFLAGS = {1} LDFLAGS = {2} INCLUDES = {3} # These are options to the C preprocessor. LIBS = {4} # These are linker options passed to the compiler. AR ?= ar RANLIB ?= ranlib SHARED_FLAG ?= -shared SONAME_FLAG ?= -soname # '-h' for the Sun/Solaris linker """.format(cflags, cppflags, ldflags, includes, libs)) print( """CXX ?= g++ # The C++ compiler. CXXFLAGS = {0} # Options passed to the C++ compiler. 
CPP_INCLUDES = {1} CPP_LIBS = {2} """.format(cxxflags, cpp_includes, cpp_libs)) print( """HEADERS = {0} LIBOBJS = {1} TESTOBJS = {2} PROFOBJS = {3} CPP_PROFOBJS = {4} TUNEOBJS = {5}""".format( " ".join(install_headers + other_headers), " ".join([x + ".o" for x in lib_modules]), " ".join([x + "-DEBUG.o" for x in lib_modules + test_modules + testprof_modules]), " ".join([x + ".o" for x in lib_modules + prof_modules + noncpp_prof_modules + testprof_modules]), " ".join([x + ".o" for x in lib_modules + prof_modules + cpp_prof_modules + testprof_modules]), " ".join([x + ".o" for x in lib_modules + tune_modules + testprof_modules + prof_modules + noncpp_prof_modules if x not in ("src/tuning", "profile/prof_main")]) )) print( """ZN_POLY_TUNING = {0} ZN_POLY_VERSION = {1} ZN_POLY_ABI_VERSION = {2} """.format(int(zn_poly_tuning), version, abi_version)) print( """all: libzn_poly.a test: test/test tune: tune/tune check: test \ttest/test -quick all install: \tmkdir -p {prefix}/include/zn_poly \tmkdir -p {prefix}/lib \tcp libzn_poly.a {prefix}/lib \tcp include/zn_poly.h {prefix}/include/zn_poly \tcp include/wide_arith.h {prefix}/include/zn_poly """.format(prefix=prefix)) print( """clean: \trm -f *.o \trm -f test/*.o \trm -f profile/*.o \trm -f tune/*.o \trm -f src/tuning.c \trm -f src/*.o \trm -f demo/bernoulli/*.o \trm -f libzn_poly.a \trm -f libzn_poly.dylib \trm -f libzn_poly*.so* \trm -f libzn_poly*.dll.a \trm -f cygzn_poly.dll \trm -f test/test \trm -f tune/tune""") for x in prof_progs: print("\trm -f " + x) print("\trm -f " + x + "-ntl") for x in demo_progs: print("\trm -f " + x) print( """ distclean: clean \trm -f makefile dist: distclean \ttar --exclude-vcs --exclude=.gitignore -czf zn_poly-$(ZN_POLY_VERSION).tar.gz * ##### library targets ifeq ($(ZN_POLY_TUNING), 1) src/tuning.c: tune/tune \ttune/tune > src/tuning.c else src/tuning.c: tune/tuning.c \tcp tune/tuning.c src/tuning.c endif libzn_poly.a: $(LIBOBJS) \t$(AR) -r libzn_poly.a $(LIBOBJS) \t$(RANLIB) libzn_poly.a # TODO: Put '-single_module -fPIC -dynamiclib' into $(SHARED_FLAG) # and use that; also support $(SO_EXTENSION)... 
libzn_poly.dylib: $(LIBOBJS) \t$(CC) $(LDFLAGS) -single_module -fPIC -dynamiclib -o libzn_poly.dylib $(LIBOBJS) $(LIBS) # Left for compatibility with previous versions of Sage's 'spkg-install': libzn_poly.dylib64: $(LIBOBJS) \t$(CC) -m64 -single_module -fPIC -dynamiclib -o libzn_poly.dylib $(LIBOBJS) $(LIBS) cygzn_poly.dll: $(LIBOBJS) \t$(CC) $(SHARED_FLAG) $(LDFLAGS) -Wl,--out-implib,libzn_poly-$(ZN_POLY_VERSION).dll.a -o cygzn_poly.dll $(LIBOBJS) $(LIBS) libzn_poly-$(ZN_POLY_VERSION).dll.a: cygzn_poly.dll libzn_poly.dll.a: libzn_poly-$(ZN_POLY_VERSION).dll.a \tln -sf libzn_poly-$(ZN_POLY_VERSION).dll.a libzn_poly.dll.a \tln -sf libzn_poly-$(ZN_POLY_VERSION).dll.a libzn_poly-$(ZN_POLY_ABI_VERSION).dll.a libzn_poly.so: libzn_poly-$(ZN_POLY_VERSION).so \tln -sf libzn_poly-$(ZN_POLY_VERSION).so libzn_poly.so \tln -sf libzn_poly-$(ZN_POLY_VERSION).so libzn_poly-$(ZN_POLY_ABI_VERSION).so libzn_poly-$(ZN_POLY_VERSION).so: $(LIBOBJS) \t$(CC) $(SHARED_FLAG) $(LDFLAGS) -Wl,-soname,libzn_poly-$(ZN_POLY_ABI_VERSION).so -o libzn_poly-$(ZN_POLY_VERSION).so $(LIBOBJS) $(LIBS) ##### test program test/test: $(TESTOBJS) $(HEADERS) \t$(CC) -g $(LDFLAGS) -o test/test $(TESTOBJS) $(LIBS) ##### profiling programs """) for x in prof_progs: print( """{0}-main.o: {0}-main.c $(HEADERS) \t$(CC) $(CFLAGS) $(CPPFLAGS) $(INCLUDES) -DNDEBUG -o {0}-main.o -c {0}-main.c {0}: {0}-main.o $(PROFOBJS) \t$(CC) $(CFLAGS) $(LDFLAGS) -o {0} {0}-main.o $(PROFOBJS) $(LIBS) {0}-main-ntl.o: {0}-main.c $(HEADERS) \t$(CC) $(CFLAGS) $(CPPFLAGS) $(INCLUDES) -DPROFILE_NTL -DNDEBUG -o {0}-main-ntl.o -c {0}-main.c {0}-ntl: {0}-main-ntl.o $(CPP_PROFOBJS) \t$(CXX) $(CXXFLAGS) $(LDFLAGS) -o {0}-ntl {0}-main-ntl.o $(CPP_PROFOBJS) $(CPP_LIBS) """.format(x)) print( """ ##### tuning utility tune/tune: $(TUNEOBJS) \t$(CC) $(CFLAGS) $(LDFLAGS) -o tune/tune $(TUNEOBJS) $(LIBS) ##### demo programs """) for x in demo_progs: print( """{0}: {0}.o $(LIBOBJS) \t$(CC) $(CFLAGS) $(LDFLAGS) -o {0} {0}.o $(LIBOBJS) $(LIBS) """.format(x)) print("\n##### object files (with debug code)\n") for x in lib_modules + test_modules + testprof_modules + demo_progs: print( """{0}-DEBUG.o: {0}.c $(HEADERS) \t$(CC) -g $(CFLAGS) $(CPPFLAGS) $(INCLUDES) -DDEBUG -o {0}-DEBUG.o -c {0}.c """.format(x)) print("\n##### object files (no debug code)\n") for x in (lib_modules + prof_modules + testprof_modules + tune_modules + demo_progs): print( """{0}.o: {0}.c $(HEADERS) \t$(CC) $(CFLAGS) $(CPPFLAGS) $(INCLUDES) -DNDEBUG -o {0}.o -c {0}.c """.format(x)) print("\n##### object files (C++, no debug code)\n") for x in cpp_prof_modules: print( """{0}.o: {0}.c $(HEADERS) \t$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(CPP_INCLUDES) -DNDEBUG -o {0}.o -c {0}.c """.format(x)) zn_poly-0.9.2/profile/000077500000000000000000000000001360464557000146665ustar00rootroot00000000000000zn_poly-0.9.2/profile/array-profile-main.c000066400000000000000000000064121360464557000205330ustar00rootroot00000000000000/* array-profile-main.c: program for profiling simple array operations Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" char* type_str[3] = {"add", "sub", "bfly"}; char* speed_str[2] = {"safe", "slim"}; void prof_main (int argc, char* argv[]) { ulong type, speed; double result, spread; printf ("\n"); // profile various butterfly loops for (type = 0; type < 3; type++) for (speed = 0; speed < 2; speed++) { ulong arg[2]; arg[0] = type; arg[1] = speed; result = profile (&spread, NULL, profile_bfly, arg, 1.0) / 1000; printf (" %4s %s, cycles/coeff = %6.2lf (%.1lf%%)\n", type_str[type], speed_str[speed], result, 100 * spread); } // profile mpn_add_n and mpn_sub_n for (type = 0; type < 2; type++) { ulong arg[1]; arg[0] = type; result = profile (&spread, NULL, profile_mpn_aors, arg, 1.0) / 1000; printf (" mpn_%s_n, cycles/limb = %6.2lf (%.1lf%%)\n", type_str[type], result, 100 * spread); } // profile zn_array_scalar_mul { ulong arg[2]; arg[1] = 0; arg[0] = ULONG_BITS - 1; result = profile (&spread, NULL, profile_scalar_mul, arg, 1.0) / 1000; printf ("scalar_mul (> half-word), cycles/coeff = %6.2lf (%.1lf%%)\n", result, 100 * spread); arg[0] = ULONG_BITS/2 - 1; result = profile (&spread, NULL, profile_scalar_mul, arg, 1.0) / 1000; printf ("scalar_mul (< half-word), cycles/coeff = %6.2lf (%.1lf%%)\n", result, 100 * spread); } // profile zn_array_scalar_mul with REDC { ulong arg[2]; arg[1] = 1; arg[0] = ULONG_BITS; result = profile (&spread, NULL, profile_scalar_mul, arg, 1.0) / 1000; printf ("scalar_mul (> half-word, non-slim, REDC), " "cycles/coeff = %6.2lf (%.1lf%%)\n", result, 100 * spread); arg[0] = ULONG_BITS - 1; result = profile (&spread, NULL, profile_scalar_mul, arg, 1.0) / 1000; printf ("scalar_mul (> half-word, REDC), " "cycles/coeff = %6.2lf (%.1lf%%)\n", result, 100 * spread); arg[0] = ULONG_BITS/2 - 1; result = profile (&spread, NULL, profile_scalar_mul, arg, 1.0) / 1000; printf ("scalar_mul (< half-word, REDC), " "cycles/coeff = %6.2lf (%.1lf%%)\n", result, 100 * spread); } } // end of file **************************************************************** zn_poly-0.9.2/profile/array-profile.c000066400000000000000000000112601360464557000176060ustar00rootroot00000000000000/* array-profile.c: routines for profiling simple array operations Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" typedef void (*bfly_func)(ulong*, ulong*, ulong, const zn_mod_t); /* Profiles one of the butterfly routines. arg points to an array of ulongs: * First is 0 for add, 1 for subtract, 2 for inplace butterfly. * Second is 0 for safe version, 1 for slim version. Returns total cycle count for _count_ calls to butterfly of length 1000. 
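   (Correspondingly, prof_main() divides the value returned by profile() by
   1000, the array length used here, to report cycles per coefficient.)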
*/ double profile_bfly (void* arg, unsigned long count) { ulong type = ((ulong*) arg)[0]; ulong speed = ((ulong*) arg)[1]; ulong m = 123 + (1UL << (ULONG_BITS - (speed ? 2 : 1))); zn_mod_t mod; zn_mod_init (mod, m); const ulong n = 1000; ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n); ulong* buf2 = (ulong*) malloc (sizeof (ulong) * n); // generate random inputs size_t i; for (i = 0; i < n; i++) buf1[i] = random_ulong (m); for (i = 0; i < n; i++) buf2[i] = random_ulong (m); bfly_func target; if (type == 0) target = (bfly_func) zn_array_add_inplace; else if (type == 1) target = (bfly_func) zn_array_sub_inplace; else // type == 2 target = zn_array_bfly_inplace; // warm up ulong j; for (j = 0; j < count; j++) target (buf1, buf2, n, mod); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) target(buf1, buf2, n, mod); cycle_count_t t1 = get_cycle_counter (); free (buf2); free (buf1); zn_mod_clear (mod); return cycle_diff (t0, t1); } /* Profiles mpn_add_n or mpn_sub_n. arg points to a single ulong: 0 for mpn_add_n, 1 for mpn_sub_n. Returns total cycle count for _count_ calls to length 1000 call. */ double profile_mpn_aors (void* arg, unsigned long count) { ulong type = ((ulong*) arg)[0]; const ulong n = 1000; mp_limb_t* buf1 = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n); mp_limb_t* buf2 = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n); mp_limb_t (*target)(mp_limb_t*, const mp_limb_t*, const mp_limb_t*, mp_size_t n); target = type ? mpn_sub_n : mpn_add_n; // generate random inputs size_t i; for (i = 0; i < n; i++) buf1[i] = random_ulong (1UL << (ULONG_BITS - 1)); for (i = 0; i < n; i++) buf2[i] = random_ulong (1UL << (ULONG_BITS - 1)); // warm up ulong j; for (j = 0; j < count; j++) target (buf1, buf1, buf2, n); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) target (buf1, buf1, buf2, n); cycle_count_t t1 = get_cycle_counter (); free (buf2); free (buf1); return cycle_diff (t0, t1); } /* Profiles scalar multiplication. arg points to an array of ulongs: * First is modulus size in bits. * Second is 0 for regular multiply, 1 for REDC multiply Returns total cycle count for _count_ calls to zn_array_scalar_mul of length 1000. */ double profile_scalar_mul (void* arg, unsigned long count) { int bits = ((ulong*) arg)[0]; int algo = ((ulong*) arg)[1]; zn_mod_t mod; ulong m = random_modulus (bits, 1); zn_mod_init (mod, m); ulong x = random_ulong (m); const ulong n = 1000; // generate random input ulong* buf = (ulong*) malloc (sizeof (ulong) * n); size_t i; for (i = 0; i < n; i++) buf[i] = random_ulong (m); cycle_count_t t0, t1; // warm up ulong j; for (j = 0; j < count; j++) _zn_array_scalar_mul (buf, buf, n, x, algo, mod); // do the actual profile t0 = get_cycle_counter (); for (j = 0; j < count; j++) _zn_array_scalar_mul (buf, buf, n, x, algo, mod); t1 = get_cycle_counter (); free (buf); zn_mod_clear (mod); return cycle_diff (t0, t1); } // end of file **************************************************************** zn_poly-0.9.2/profile/invert-profile-main.c000066400000000000000000000115251360464557000207250ustar00rootroot00000000000000/* invert-profile-main.c: program for profiling power series inversion (zn_poly algorithms, and optionally NTL) Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ /* Note: if this file is compiled with the constant PROFILE_NTL defined, then it will include support for NTL profiling, and needs to be compiled as C++. Otherwise it can be compiled as a C program. */ #include #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" /* Performs and prints one line of profiling output, for the given number of bits and polynomial length. active[i] is a flag indicating whether the algorithm indexed by i should be profiled (see ALGO_INVERT_xyz constants). */ void do_line (int* active, unsigned b, size_t n) { char* names[] = {"best", "ntl"}; profile_info_t info; info->n = n; // choose an odd modulus exactly b bits long info->m = (1UL << (b - 1)) + 2 * random_ulong (1UL << (b - 2)) + 1; printf ("len = %5lu, bits = %2u", n, b); fflush (stdout); int algo; for (algo = 0; algo < 2; algo++) { if (active[algo]) { info->algo = algo; double spread, result; result = profile (&spread, NULL, profile_invert, &info, 1.0); printf (", %s = %.3le (%.1lf%%)", names[algo], result, 100 * spread); fflush (stdout); } } printf ("\n"); } #if __cplusplus extern "C" #endif void prof_main (int argc, char* argv[]) { // read command line arguments // can include the strings "best", "ntl" // to select various algorithms // if you do "bits " then only that number of bits will be profiled // otherwise it ranges over various bitsizes // if you do "length " then only that length will be profiled // otherwise it ranges over various lengths int active[2] = {0, 0}; int any_active = 0; int do_one_bits = 0; int chosen_bits = 0; int do_one_length = 0; ulong chosen_length = 0; int i; for (i = 1; i < argc; i++) { if (!strcmp (argv[i], "best")) active[ALGO_INVERT_BEST] = any_active = 1; else if (!strcmp (argv[i], "ntl")) active[ALGO_INVERT_NTL] = any_active = 1; else if (!strcmp (argv[i], "bits")) { do_one_bits = 1; chosen_bits = atoi (argv[++i]); } else if (!strcmp (argv[i], "length")) { do_one_length = 1; chosen_length = atol (argv[++i]); } else { printf ("unknown option %s\n", argv[i]); exit (1); } } if (!any_active) active[0] = 1; // profile plain multiplication if nothing selected // bitsizes to use by default if none are selected unsigned bitsizes[9] = {4, 8, 16, 24, 32, 40, 48, 56, 64}; int j; size_t len; if (do_one_bits && do_one_length) { do_line (active, chosen_bits, chosen_length); } else if (do_one_bits && !do_one_length) { // loop over lengths, spaced out logarithmically for (j = 0; j < 120; j++) { size_t new_len = (size_t) floor (pow (1.1, (double) j)); if (new_len == len) continue; len = new_len; do_line (active, chosen_bits, len); } } else if (!do_one_bits && do_one_length) { // loop over bitsizes in above table for (i = 0; i < sizeof (bitsizes) / sizeof (bitsizes[0]); i++) do_line (active, bitsizes[i], chosen_length); } else // neither bits nor length is fixed { // loop over bitsizes in above table for (i = 0; i < sizeof (bitsizes) / sizeof (bitsizes[0]); i++) { 
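// The schedule below evaluates floor(1.1^j) for j = 0, ..., 119, so the
// lengths grow by roughly 10% per step, from 1 up to about 8e4; the
// (new_len == len) test skips consecutive duplicates, so each length is
// profiled only once per bitsize.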
// loop over lengths, spaced out logarithmically for (j = 0; j < 120; j++) { size_t new_len = (size_t) floor (pow (1.1, (double) j)); if (new_len == len) continue; len = new_len; do_line (active, bitsizes[i], len); } printf("-------------------------------------------\n"); } } } // end of file **************************************************************** zn_poly-0.9.2/profile/invert-profile.c000066400000000000000000000036041360464557000200020ustar00rootroot00000000000000/* invert-profile.c: routines for profiling power series inversion Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" double profile_invert (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; if (info->algo == ALGO_INVERT_NTL) return profile_invert_ntl (arg, count); size_t n = info->n; zn_mod_t mod; zn_mod_init (mod, info->m); ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n); ulong* buf2 = (ulong*) malloc (sizeof (ulong) * n); // generate random inputs size_t i; for (i = 0; i < n; i++) buf1[i] = random_ulong (info->m); buf1[0] = 1; // warm up ulong j; for (j = 0; j < count; j++) zn_array_invert (buf2, buf1, n, mod); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) zn_array_invert (buf2, buf1, n, mod); cycle_count_t t1 = get_cycle_counter (); free (buf2); free (buf1); zn_mod_clear (mod); return cycle_diff (t0, t1); } // end of file **************************************************************** zn_poly-0.9.2/profile/mpn_mulmid-profile-main.c000066400000000000000000000057671360464557000215720ustar00rootroot00000000000000/* mpn_mulmid-profile-main.c: program for profiling mpn middle products Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
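   For each length n, do_line below times a (2*n - 1) x n middle product with
   smp_basecase, smp_kara (for n >= 2 only) and smp, then an n x n mpn_mul
   and a (2*n - 1) x n mulmid_fallback for comparison, and prints one row of
   results.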
*/ #include #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" /* Performs and prints one line of profiling output, for the given integer length */ void do_line (size_t n) { profile_info_t info; printf ("n = %5lu", n); fflush (stdout); double spread, result; info->n1 = 2 * n - 1; info->n2 = n; result = profile (&spread, NULL, profile_mpn_smp_basecase, &info, 1.0); printf (", %.3le (%.1lf%%)", result, 100 * spread); if (n >= 2) { info->n = n; result = profile (&spread, NULL, profile_mpn_smp_kara, &info, 1.0); printf (", %.3le (%.1lf%%)", result, 100 * spread); } else printf (", N/A "); result = profile (&spread, NULL, profile_mpn_smp, &info, 1.0); printf (", %.3le (%.1lf%%)", result, 100 * spread); info->n1 = info->n2 = n; result = profile (&spread, NULL, profile_mpn_mul, &info, 1.0); printf (", %.3le (%.1lf%%)", result, 100 * spread); info->n1 = 2 * n - 1; info->n2 = n; result = profile (&spread, NULL, profile_mpn_mulmid_fallback, &info, 1.0); printf (", %.3le (%.1lf%%)", result, 100 * spread); printf ("\n"); } #if __cplusplus extern "C" #endif void prof_main (int argc, char* argv[]) { // read command line arguments // if you do "length " then only that length will be profiled // otherwise it ranges over various lengths int do_one_length = 0; ulong chosen_length = 0; int i; for (i = 1; i < argc; i++) { if (!strcmp (argv[i], "length")) { do_one_length = 1; chosen_length = atol (argv[++i]); } else { printf ("unknown option %s\n", argv[i]); exit (1); } } int j; size_t n; printf ("fields: smp_basecase, smp_kara, " "smp, mpn_mul, mulmid_fallback\n"); if (do_one_length) { do_line (chosen_length); } else { // loop over lengths, spaced out logarithmically for (n = 1; n <= 100; n++) do_line (n); } } // end of file **************************************************************** zn_poly-0.9.2/profile/mpn_mulmid-profile.c000066400000000000000000000117301360464557000206330ustar00rootroot00000000000000/* mpn_mulmid-profile.c: routines for profiling mpn middle products Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
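   Every routine below follows the same pattern: allocate operand and result
   buffers, fill the operands with mpn_random, make _count_ untimed warm-up
   calls, then time _count_ calls between two get_cycle_counter() reads and
   return cycle_diff (t0, t1).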
*/ #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" double profile_mpn_mul (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; size_t n1 = info->n1; size_t n2 = info->n2; mp_ptr buf1 = malloc (sizeof (mp_limb_t) * n1); mp_ptr buf2 = malloc (sizeof (mp_limb_t) * n2); mp_ptr buf3 = malloc (sizeof (mp_limb_t) * (n1 + n2)); // generate random inputs mpn_random (buf1, n1); mpn_random (buf2, n2); // warm up ulong j; for (j = 0; j < count; j++) ZNP_mpn_mul (buf3, buf1, n1, buf2, n2); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) ZNP_mpn_mul (buf3, buf1, n1, buf2, n2); cycle_count_t t1 = get_cycle_counter (); free (buf3); free (buf2); free (buf1); return cycle_diff (t0, t1); } double profile_mpn_mulmid_fallback (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; size_t n1 = info->n1; size_t n2 = info->n2; mp_ptr buf1 = malloc (sizeof (mp_limb_t) * n1); mp_ptr buf2 = malloc (sizeof (mp_limb_t) * n2); mp_ptr buf3 = malloc (sizeof (mp_limb_t) * (n1 - n2 + 3)); // generate random inputs mpn_random (buf1, n1); mpn_random (buf2, n2); // warm up ulong j; for (j = 0; j < count; j++) ZNP_mpn_mulmid_fallback (buf3, buf1, n1, buf2, n2); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) ZNP_mpn_mulmid_fallback (buf3, buf1, n1, buf2, n2); cycle_count_t t1 = get_cycle_counter (); free (buf3); free (buf2); free (buf1); return cycle_diff (t0, t1); } double profile_mpn_smp (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; size_t n1 = info->n1; size_t n2 = info->n2; mp_ptr buf1 = malloc (sizeof (mp_limb_t) * n1); mp_ptr buf2 = malloc (sizeof (mp_limb_t) * n2); mp_ptr buf3 = malloc (sizeof (mp_limb_t) * (n1 - n2 + 3)); // generate random inputs mpn_random (buf1, n1); mpn_random (buf2, n2); // warm up ulong j; for (j = 0; j < count; j++) ZNP_mpn_smp (buf3, buf1, n1, buf2, n2); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) ZNP_mpn_smp (buf3, buf1, n1, buf2, n2); cycle_count_t t1 = get_cycle_counter (); free (buf3); free (buf2); free (buf1); return cycle_diff (t0, t1); } double profile_mpn_smp_basecase (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; size_t n1 = info->n1; size_t n2 = info->n2; mp_ptr buf1 = malloc (sizeof (mp_limb_t) * n1); mp_ptr buf2 = malloc (sizeof (mp_limb_t) * n2); mp_ptr buf3 = malloc (sizeof (mp_limb_t) * (n1 - n2 + 3)); // generate random inputs mpn_random (buf1, n1); mpn_random (buf2, n2); // warm up ulong j; for (j = 0; j < count; j++) ZNP_mpn_smp_basecase (buf3, buf1, n1, buf2, n2); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) ZNP_mpn_smp_basecase (buf3, buf1, n1, buf2, n2); cycle_count_t t1 = get_cycle_counter (); free (buf3); free (buf2); free (buf1); return cycle_diff (t0, t1); } double profile_mpn_smp_kara (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; size_t n = info->n; mp_ptr buf1 = malloc (sizeof (mp_limb_t) * (2 * n - 1)); mp_ptr buf2 = malloc (sizeof (mp_limb_t) * n); mp_ptr buf3 = malloc (sizeof (mp_limb_t) * (n + 2)); // generate random inputs mpn_random (buf1, 2 * n - 1); mpn_random (buf2, n); // warm up ulong j; for (j = 0; j < count; j++) ZNP_mpn_smp_kara (buf3, buf1, buf2, n); // do the actual profile cycle_count_t t0 = 
get_cycle_counter (); for (j = 0; j < count; j++) ZNP_mpn_smp_kara (buf3, buf1, buf2, n); cycle_count_t t1 = get_cycle_counter (); free (buf3); free (buf2); free (buf1); return cycle_diff (t0, t1); } // end of file **************************************************************** zn_poly-0.9.2/profile/mul-profile-main.c000066400000000000000000000141641360464557000202150ustar00rootroot00000000000000/* mul-profile-main.c: program for profiling multiplication (various zn_poly algorithms, and optionally NTL) Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ /* Note: if this file is compiled with the constant PROFILE_NTL defined, then it will include support for NTL profiling, and needs to be compiled as C++. Otherwise it can be compiled as a C program. */ #include #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" /* Performs and prints one line of profiling output, for the given number of bits and polynomial length. active[i] is a flag indicating whether the algorithm indexed by i should be profiled (see ALGO_MUL_xyz constants). */ void do_line (int* active, unsigned b, size_t n, int sqr) { char* names[] = {"best", "ks1", "ks1_redc", "ks2", "ks2_redc", "ks3", "ks3_redc", "ks4", "ks4_redc", "fft", "ntl"}; profile_info_t info; info->n1 = info->n2 = n; // choose an odd modulus exactly b bits long info->m = (1UL << (b - 1)) + 2 * random_ulong (1UL << (b - 2)) + 1; info->sqr = sqr; printf ("len = %5lu, bits = %2u", n, b); fflush (stdout); int algo; for (algo = 0; algo < 11; algo++) { if (active[algo]) { info->algo = algo; double spread; double result = profile (&spread, NULL, profile_mul, &info, 1.0); printf (", %s = %.3le (%.1lf%%)", names[algo], result, 100 * spread); fflush (stdout); } } printf ("\n"); } #if __cplusplus extern "C" #endif void prof_main (int argc, char* argv[]) { // read command line arguments // can include the strings "best", "ks1", "ks1_redc", "ks2", "ks2_redc", // "ks3", "ks3_redc", "ks4", "ks4_redc", "fft", "ntl" // to select various algorithms // can also include "sqr" anywhere, which means to profile squaring // if you do "bits " then only that number of bits will be profiled // otherwise it ranges over various bitsizes // if you do "length " then only that length will be profiled // otherwise it ranges over various lengths int active[11] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; int any_active = 0; int sqr = 0; int do_one_bits = 0; int chosen_bits = 0; int do_one_length = 0; ulong chosen_length = 0; int i; for (i = 1; i < argc; i++) { if (!strcmp (argv[i], "best")) active[ALGO_MUL_BEST] = any_active = 1; else if (!strcmp (argv[i], "ks1")) active[ALGO_MUL_KS1] = any_active = 1; else if (!strcmp (argv[i], "ks1_redc")) active[ALGO_MUL_KS1_REDC] = any_active = 1; else if (!strcmp (argv[i], "ks2")) active[ALGO_MUL_KS2] = any_active = 1; else if (!strcmp (argv[i], "ks2_redc")) active[ALGO_MUL_KS2_REDC] = 
any_active = 1; else if (!strcmp (argv[i], "ks3")) active[ALGO_MUL_KS3] = any_active = 1; else if (!strcmp (argv[i], "ks3_redc")) active[ALGO_MUL_KS3_REDC] = any_active = 1; else if (!strcmp (argv[i], "ks4")) active[ALGO_MUL_KS4] = any_active = 1; else if (!strcmp (argv[i], "ks4_redc")) active[ALGO_MUL_KS4_REDC] = any_active = 1; else if (!strcmp (argv[i], "fft")) active[ALGO_MUL_FFT] = any_active = 1; else if (!strcmp (argv[i], "ntl")) active[ALGO_MUL_NTL] = any_active = 1; else if (!strcmp (argv[i], "bits")) { do_one_bits = 1; chosen_bits = atoi (argv[++i]); } else if (!strcmp (argv[i], "length")) { do_one_length = 1; chosen_length = atol (argv[++i]); } else if (!strcmp (argv[i], "sqr")) { sqr = 1; } else { printf ("unknown option %s\n", argv[i]); exit (1); } } if (!any_active) active[0] = 1; // profile plain multiplication if nothing selected // bitsizes to use by default if none are selected unsigned bitsizes[9] = {4, 8, 16, 24, 32, 40, 48, 56, 64}; int j; size_t len; if (do_one_bits && do_one_length) { do_line (active, chosen_bits, chosen_length, sqr); } else if (do_one_bits && !do_one_length) { // loop over lengths, spaced out logarithmically for (j = 0; j < 120; j++) { size_t new_len = (size_t) floor (pow (1.1, (double) j)); if (new_len == len) continue; len = new_len; do_line (active, chosen_bits, len, sqr); } } else if (!do_one_bits && do_one_length) { // loop over bitsizes in above table for (i = 0; i < sizeof (bitsizes) / sizeof (bitsizes[0]); i++) do_line (active, bitsizes[i], chosen_length, sqr); } else // neither bits nor length is fixed { // loop over bitsizes in above table for (i = 0; i < sizeof (bitsizes) / sizeof (bitsizes[0]); i++) { // loop over lengths, spaced out logarithmically for (j = 0; j < 120; j++) { size_t new_len = (size_t) floor (pow (1.1, (double) j)); if (new_len == len) continue; len = new_len; do_line (active, bitsizes[i], len, sqr); } printf("-------------------------------------------\n"); } } } // end of file **************************************************************** zn_poly-0.9.2/profile/mul-profile.c000066400000000000000000000075341360464557000172760ustar00rootroot00000000000000/* mul-profile.c: routines for profiling multiplication (various zn_poly algorithms) Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" /* Wrapper functions to make zn_array_mul and zn_array_mul_fft look as if they have a redc flag. 
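   profile_mul below invokes every candidate through a single pointer of type

      void (*target)(ulong*, const ulong*, size_t, const ulong*, size_t,
                     int, const zn_mod_t);

   where the int is the redc flag; these wrappers simply ignore it, since
   zn_array_mul picks its own reduction strategy and the FFT entry point
   takes an explicit scaling factor instead.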
*/ void zn_array_mul_wrapper (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { zn_array_mul (res, op1, n1, op2, n2, mod); } void zn_array_mul_fft_wrapper (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { // call the FFT code with the correct scaling factor int sqr = (op1 == op2) && (n1 == n2); ulong x = zn_array_mul_fft_fudge (n1, n2, sqr, mod); zn_array_mul_fft (res, op1, n1, op2, n2, x, mod); } double profile_mul (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; if (info->algo == ALGO_MUL_NTL) return profile_mul_ntl (arg, count); size_t n1 = info->n1; size_t n2 = info->n2; zn_mod_t mod; zn_mod_init (mod, info->m); ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n1); ulong* buf2 = info->sqr ? buf1 : ((ulong*) malloc (sizeof (ulong) * n2)); ulong* buf3 = (ulong*) malloc (sizeof (ulong) * (n1 + n2 - 1)); // generate random inputs size_t i; for (i = 0; i < n1; i++) buf1[i] = random_ulong (info->m); for (i = 0; i < n2; i++) buf2[i] = random_ulong (info->m); void (*target)(ulong*, const ulong*, size_t, const ulong*, size_t, int, const zn_mod_t); int redc; switch (info->algo) { case ALGO_MUL_BEST: target = zn_array_mul_wrapper; break; case ALGO_MUL_KS1: target = zn_array_mul_KS1; redc = 0; break; case ALGO_MUL_KS1_REDC: target = zn_array_mul_KS1; redc = 1; break; case ALGO_MUL_KS2: target = zn_array_mul_KS2; redc = 0; break; case ALGO_MUL_KS2_REDC: target = zn_array_mul_KS2; redc = 1; break; case ALGO_MUL_KS3: target = zn_array_mul_KS3; redc = 0; break; case ALGO_MUL_KS3_REDC: target = zn_array_mul_KS3; redc = 1; break; case ALGO_MUL_KS4: target = zn_array_mul_KS4; redc = 0; break; case ALGO_MUL_KS4_REDC: target = zn_array_mul_KS4; redc = 1; break; case ALGO_MUL_FFT: target = zn_array_mul_fft_wrapper; redc = 0; break; default: abort (); } // warm up ulong j; for (j = 0; j < count / 4; j++) target (buf3, buf1, n1, buf2, n2, redc, mod); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) target (buf3, buf1, n1, buf2, n2, redc, mod); cycle_count_t t1 = get_cycle_counter (); free (buf3); if (!info->sqr) free (buf2); free (buf1); zn_mod_clear (mod); return cycle_diff (t0, t1); } // end of file **************************************************************** zn_poly-0.9.2/profile/mulmid-profile-main.c000066400000000000000000000133231360464557000207030ustar00rootroot00000000000000/* mulmid-profile-main.c: program for profiling middle products Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" /* Performs and prints one line of profiling output, for the given number of bits and polynomial length (it does a (2*n) x n middle product). 
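   For instance, n = 100 profiles a 200 x 100 middle product, whose output
   has n1 - n2 + 1 = 101 coefficients.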
active[i] is a flag indicating whether the algorithm indexed by i should be profiled (see ALGO_MULMID_xyz constants). */ void do_line (int* active, unsigned b, size_t n) { char* names[] = {"best", "fallback", "ks1", "ks1_redc", "ks2", "ks2_redc", "ks3", "ks3_redc", "ks4", "ks4_redc", "fft"}; profile_info_t info; info->n1 = 2 * n; info->n2 = n; // choose an odd modulus exactly b bits long info->m = (1UL << (b - 1)) + 2 * random_ulong (1UL << (b - 2)) + 1; printf ("len = %5lu, bits = %2u", n, b); fflush (stdout); int algo; for (algo = 0; algo < 11; algo++) { if (active[algo]) { info->algo = algo; double result, spread; result = profile (&spread, NULL, profile_mulmid, &info, 1.0); printf (", %s = %.3le (%.1lf%%)", names[algo], result, 100 * spread); fflush (stdout); } } printf ("\n"); } void prof_main (int argc, char* argv[]) { // read command line arguments // can include the strings "best", "ks1", "ks1_redc", "ks2", "ks2_redc", // "ks3", "ks3_redc", "ks4", "ks4_redc", // "fallback", "fft" // to select various algorithms // if you do "bits " then only that number of bits will be profiled // otherwise it ranges over various bitsizes // if you do "length " then only that length will be profiled // otherwise it ranges over various lengths int active[11] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; int any_active = 0; int do_one_bits = 0; int chosen_bits = 0; int do_one_length = 0; ulong chosen_length = 0; int i; for (i = 1; i < argc; i++) { if (!strcmp (argv[i], "best")) active[ALGO_MULMID_BEST] = any_active = 1; else if (!strcmp (argv[i], "fallback")) active[ALGO_MULMID_FALLBACK] = any_active = 1; else if (!strcmp (argv[i], "ks1")) active[ALGO_MULMID_KS1] = any_active = 1; else if (!strcmp (argv[i], "ks1_redc")) active[ALGO_MULMID_KS1_REDC] = any_active = 1; else if (!strcmp (argv[i], "ks2")) active[ALGO_MULMID_KS2] = any_active = 1; else if (!strcmp (argv[i], "ks2_redc")) active[ALGO_MULMID_KS2_REDC] = any_active = 1; else if (!strcmp (argv[i], "ks3")) active[ALGO_MULMID_KS3] = any_active = 1; else if (!strcmp (argv[i], "ks3_redc")) active[ALGO_MULMID_KS3_REDC] = any_active = 1; else if (!strcmp (argv[i], "ks4")) active[ALGO_MULMID_KS4] = any_active = 1; else if (!strcmp (argv[i], "ks4_redc")) active[ALGO_MULMID_KS4_REDC] = any_active = 1; else if (!strcmp (argv[i], "fft")) active[ALGO_MULMID_FFT] = any_active = 1; else if (!strcmp (argv[i], "bits")) { do_one_bits = 1; chosen_bits = atoi (argv[++i]); } else if (!strcmp (argv[i], "length")) { do_one_length = 1; chosen_length = atol (argv[++i]); } else { printf ("unknown option %s\n", argv[i]); exit (1); } } if (!any_active) active[0] = 1; // profile plain multiplication if nothing selected // bitsizes to use by default if none are selected unsigned bitsizes[9] = {4, 8, 16, 24, 32, 40, 48, 56, 64}; int j; size_t len; if (do_one_bits && do_one_length) { do_line (active, chosen_bits, chosen_length); } else if (do_one_bits && !do_one_length) { // loop over lengths, spaced out logarithmically for (j = 0; j < 120; j++) { size_t new_len = (size_t) floor (pow (1.1, (double) j)); if (new_len == len) continue; len = new_len; do_line (active, chosen_bits, len); } } else if (!do_one_bits && do_one_length) { // loop over bitsizes in above table for (i = 0; i < sizeof (bitsizes) / sizeof (bitsizes[0]); i++) do_line (active, bitsizes[i], chosen_length); } else // neither bits nor length is fixed { // loop over bitsizes in above table for (i = 0; i < sizeof (bitsizes) / sizeof (bitsizes[0]); i++) { // loop over lengths, spaced out logarithmically for (j = 0; j < 120; 
j++) { size_t new_len = (size_t) floor (pow (1.1, (double) j)); if (new_len == len) continue; len = new_len; do_line (active, bitsizes[i], len); } printf("-------------------------------------------\n"); } } } // end of file **************************************************************** zn_poly-0.9.2/profile/mulmid-profile.c000066400000000000000000000106161360464557000177630ustar00rootroot00000000000000/* mulmid-profile.c: routines for profiling middle products Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" /* Wrapper functions to make zn_array_mulmid, zn_array_mulmid_fallback and zn_array_mulmid_fft look as if they have a redc flag. */ void zn_array_mulmid_wrapper (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { zn_array_mulmid (res, op1, n1, op2, n2, mod); } void zn_array_mulmid_fallback_wrapper (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { zn_array_mulmid_fallback (res, op1, n1, op2, n2, 0, mod); } void zn_array_mulmid_fft_wrapper (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { // call the FFT code with the correct scaling factor ulong x = zn_array_mulmid_fft_fudge (n1, n2, mod); zn_array_mulmid_fft (res, op1, n1, op2, n2, x, mod); } double profile_mulmid (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; size_t n1 = info->n1; size_t n2 = info->n2; zn_mod_t mod; zn_mod_init (mod, info->m); ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n1); ulong* buf2 = (ulong*) malloc (sizeof (ulong) * n2); ulong* buf3 = (ulong*) malloc (sizeof (ulong) * (n1 - n2 + 1)); // generate random inputs size_t i; for (i = 0; i < n1; i++) buf1[i] = random_ulong (info->m); for (i = 0; i < n2; i++) buf2[i] = random_ulong (info->m); void (*target)(ulong*, const ulong*, size_t, const ulong*, size_t, int, const zn_mod_t); int redc; switch (info->algo) { case ALGO_MULMID_BEST: target = zn_array_mulmid_wrapper; break; case ALGO_MULMID_FALLBACK: target = zn_array_mulmid_fallback_wrapper; break; case ALGO_MULMID_KS1: target = zn_array_mulmid_KS1; redc = 0; break; case ALGO_MULMID_KS1_REDC: target = zn_array_mulmid_KS1; redc = 1; break; case ALGO_MULMID_KS2: target = zn_array_mulmid_KS2; redc = 0; break; case ALGO_MULMID_KS2_REDC: target = zn_array_mulmid_KS2; redc = 1; break; case ALGO_MULMID_KS3: target = zn_array_mulmid_KS3; redc = 0; break; case ALGO_MULMID_KS3_REDC: target = zn_array_mulmid_KS3; redc = 1; break; case ALGO_MULMID_KS4: target = zn_array_mulmid_KS4; redc = 0; break; case ALGO_MULMID_KS4_REDC: target = zn_array_mulmid_KS4; redc = 1; break; case ALGO_MULMID_FFT: target = zn_array_mulmid_fft_wrapper; break; default: abort (); } // warm up ulong j; for (j = 0; j < count/4; j++) target (buf3, 
buf1, n1, buf2, n2, redc, mod); // do the actual profile cycle_count_t t0 = get_cycle_counter (); for (j = 0; j < count; j++) target (buf3, buf1, n1, buf2, n2, redc, mod); cycle_count_t t1 = get_cycle_counter (); free (buf3); free (buf2); free (buf1); zn_mod_clear (mod); return cycle_diff (t0, t1); } // end of file **************************************************************** zn_poly-0.9.2/profile/negamul-profile-main.c000066400000000000000000000075461360464557000210560ustar00rootroot00000000000000/* negamul-profile-main.c: program for profiling negacyclic multiplication Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" /* Performs and prints one line of profiling output, for the given number of bits and polynomial length. */ void do_line (unsigned b, unsigned lgL, int sqr) { profile_info_t info; info->lgL = lgL; // choose an odd modulus exactly _bits_ bits long info->m = (1UL << (b - 1)) + 2 * random_ulong (1UL << (b - 2)) + 1; info->sqr = sqr; size_t n = 1UL << lgL; printf ("len = %6lu, bits = %2u", n, b); fflush (stdout); double result, spread; info->algo = ALGO_NEGAMUL_FALLBACK; result = profile (&spread, NULL, profile_negamul, &info, 1.0); printf (", fallback = %.3le (%.1lf%%)", result, 100 * spread); fflush (stdout); info->algo = ALGO_NEGAMUL_NUSS; result = profile (&spread, NULL, profile_negamul, &info, 1.0); printf (", nuss = %.3le (%.1lf%%)", result, 100 * spread); fflush (stdout); printf ("\n"); } void prof_main (int argc, char* argv[]) { // read command line arguments // if you do "bits " then only that number of bits will be profiled // otherwise it ranges over various bitsizes // if you do "lgL " then only that length will be profiled // otherwise it ranges over various lengths // can also include "sqr" anywhere, which means to profile squaring int do_one_bits = 0; int chosen_bits = 0; int sqr = 0; int do_one_length = 0; unsigned chosen_lgL = 0; int i; for (i = 1; i < argc; i++) { if (!strcmp (argv[i], "bits")) { do_one_bits = 1; chosen_bits = atoi (argv[++i]); } else if (!strcmp (argv[i], "lgL")) { do_one_length = 1; chosen_lgL = atol (argv[++i]); } else if (!strcmp (argv[i], "sqr")) { sqr = 1; } else { printf ("unknown option %s\n", argv[i]); exit (1); } } // bitsizes to use by default if none are selected unsigned bitsizes[9] = {4, 8, 16, 24, 32, 40, 48, 56, 64}; unsigned lgL; if (do_one_bits && do_one_length) { do_line (chosen_bits, chosen_lgL, sqr); } else if (do_one_bits && !do_one_length) { // loop over lengths for (lgL = 4; lgL <= 16; lgL++) do_line (chosen_bits, lgL, sqr); } else if (!do_one_bits && do_one_length) { // loop over bitsizes in above table for (i = 0; i < sizeof (bitsizes) / sizeof (bitsizes[0]); i++) do_line (bitsizes[i], chosen_lgL, sqr); } else // neither bits nor length is fixed { // loop over bitsizes in above table 
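// (In the default sweep this covers lgL = 4, ..., 16, i.e. negacyclic
// lengths 2^4 = 16 up to 2^16 = 65536, for every bitsize in the table.)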
for (i = 0; i < sizeof (bitsizes) / sizeof (bitsizes[0]); i++) { // loop over lengths, spaced out logarithmically for (lgL = 4; lgL <= 16; lgL++) do_line (bitsizes[i], lgL, sqr); printf ("-------------------------------------------\n"); } } } // end of file **************************************************************** zn_poly-0.9.2/profile/negamul-profile.c000066400000000000000000000057141360464557000201270ustar00rootroot00000000000000/* negamul-profile.c: routines for profiling negacyclic multiplication and squaring Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include "support.h" #include "profiler.h" #include "zn_poly_internal.h" #include "zn_poly.h" double profile_negamul (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; ulong j; zn_mod_t mod; zn_mod_init (mod, info->m); size_t n = 1UL << info->lgL; ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n); ulong* buf2 = info->sqr ? buf1 : ((ulong*) malloc (sizeof (ulong) * n)); ulong* buf3 = (ulong*) malloc (sizeof (ulong) * n); cycle_count_t t0, t1; // generate random inputs size_t i; for (i = 0; i < n; i++) buf1[i] = random_ulong (info->m); for (i = 0; i < n; i++) buf2[i] = random_ulong (info->m); if (info->algo == ALGO_NEGAMUL_FALLBACK) { // KS version ulong* temp = (ulong*) malloc (sizeof (ulong) * 2 * n); // warm up for (j = 0; j < count/4; j++) { zn_array_mul (temp, buf1, n, buf2, n, mod); zn_array_sub (buf3, temp, temp + n, n, mod); } // do the actual profile t0 = get_cycle_counter (); for (j = 0; j < count; j++) { zn_array_mul (temp, buf1, n, buf2, n, mod); zn_array_sub (buf3, temp, temp + n, n, mod); } t1 = get_cycle_counter (); free(temp); } else if (info->algo == ALGO_NEGAMUL_NUSS) { // nussbaumer version pmfvec_t vec1, vec2; pmfvec_init_nuss (vec1, info->lgL, mod); pmfvec_init_nuss (vec2, info->lgL, mod); // warm up for (j = 0; j < count/4; j++) nuss_mul (buf3, buf1, buf2, vec1, vec2); // do the actual profile t0 = get_cycle_counter (); for (j = 0; j < count; j++) nuss_mul (buf3, buf1, buf2, vec1, vec2); t1 = get_cycle_counter (); pmfvec_clear (vec2); pmfvec_clear (vec1); } else abort (); free (buf3); if (!info->sqr) free (buf2); free (buf1); zn_mod_clear (mod); return cycle_diff (t0, t1); } // end of file **************************************************************** zn_poly-0.9.2/profile/ntl-profile-dummy.c000066400000000000000000000024371360464557000204240ustar00rootroot00000000000000/* ntl-profile-dummy.c: dummy routines replacing NTL profiling routines when no NTL support is compiled in Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #include <stdio.h> #include <stdlib.h> void no_ntl_support () { printf ("\n\n"); printf ("no NTL profiling support compiled in!\n"); abort (); } double profile_mul_ntl (void* arg, unsigned long count) { no_ntl_support (); return 0.0; /* unreachable */ } double profile_invert_ntl (void* arg, unsigned long count) { no_ntl_support (); return 0.0; /* unreachable */ } // end of file **************************************************************** zn_poly-0.9.2/profile/ntl-profile.c000066400000000000000000000101711360464557000172650ustar00rootroot00000000000000/* ntl-profile.c: routines for profiling NTL Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #include #include "support.h" #include "profiler.h" #include <NTL/lzz_pX.h> #include <NTL/ZZ_pX.h> extern "C" double profile_mul_ntl (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; cycle_count_t t0, t1; if (info->m < (ulong) NTL_SP_BOUND) { // zz_pX version NTL::zz_pX f1, f2, g; NTL::zz_p::init (info->m); size_t i; for (i = 0; i < info->n1; i++) SetCoeff (f1, i, random_ulong (info->m)); for (i = 0; i < info->n2; i++) SetCoeff (f2, i, random_ulong (info->m)); if (info->sqr) { // warm up ulong j; for (j = 0; j < count; j++) sqr (g, f1); t0 = get_cycle_counter (); for (j = 0; j < count; j++) sqr (g, f1); t1 = get_cycle_counter (); } else { // warm up ulong j; for (j = 0; j < count; j++) mul (g, f1, f2); t0 = get_cycle_counter (); for (j = 0; j < count; j++) mul (g, f1, f2); t1 = get_cycle_counter (); } } else { // ZZ_pX version NTL::ZZ_pX f1, f2, g; NTL::ZZ_p::init (NTL::to_ZZ (info->m)); size_t i; for (i = 0; i < info->n1; i++) SetCoeff (f1, i, random_ulong (info->m)); for (i = 0; i < info->n2; i++) SetCoeff (f2, i, random_ulong (info->m)); if (info->sqr) { // warm up ulong j; for (j = 0; j < count; j++) sqr (g, f1); t0 = get_cycle_counter (); for (j = 0; j < count; j++) sqr (g, f1); t1 = get_cycle_counter (); } else { // warm up ulong j; for (j = 0; j < count; j++) mul (g, f1, f2); t0 = get_cycle_counter (); for (j = 0; j < count; j++) mul (g, f1, f2); t1 = get_cycle_counter (); } } return cycle_diff (t0, t1); } extern "C" double profile_invert_ntl (void* arg, unsigned long count) { profile_info_struct* info = (profile_info_struct*) arg; cycle_count_t t0, t1; if (info->m < (ulong) NTL_SP_BOUND) { // zz_pX version NTL::zz_pX f1, f2, g; NTL::zz_p::init (info->m); size_t i; SetCoeff (f1, 0, 1); for (i = 1; i < info->n; i++) SetCoeff (f1, i, random_ulong (info->m)); // warm up ulong j; for (j = 0; j < count; j++) InvTrunc (g, f1, info->n); t0 = get_cycle_counter (); for (j = 0; j < count; j++) InvTrunc (g, f1, info->n); t1 = get_cycle_counter (); } else { //
ZZ_pX version NTL::ZZ_pX f1, f2, g; NTL::ZZ_p::init (NTL::to_ZZ (info->m)); size_t i; SetCoeff (f1, 0, 1); for (i = 1; i < info->n; i++) SetCoeff (f1, i, random_ulong (info->m)); // warm up ulong j; for (j = 0; j < count; j++) InvTrunc (g, f1, info->n); t0 = get_cycle_counter (); for (j = 0; j < count; j++) InvTrunc (g, f1, info->n); t1 = get_cycle_counter (); } return cycle_diff (t0, t1); } // end of file **************************************************************** zn_poly-0.9.2/profile/prof_main.c000066400000000000000000000025161360464557000170100ustar00rootroot00000000000000/* prof_main.c: main() routine for profiling programs Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "support.h" #include "profiler.h" /* Profiling programs link against this file, and implement a function void prof_main(argc, argv) */ int main (int argc, char* argv[]) { calibrate_cycle_scale_factor (); #if !ZNP_HAVE_CYCLE_COUNTER printf ("Cannot run profiles; no cycle counter on this system!\n"); #else gmp_randinit_default (randstate); prof_main (argc, argv); gmp_randclear (randstate); #endif return 0; } // end of file **************************************************************** zn_poly-0.9.2/profile/profiler.c000066400000000000000000000102661360464557000166610ustar00rootroot00000000000000/* profiler.c: some profiling routines Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ /* Include system headers *before* 'ulong' gets (re)defined: */ #include #include #include #include "profiler.h" /* Includes zn_poly.h, which defines 'ulong'. */ double cycle_scale_factor; /* This function eats some CPU cycles. The number of cycles eaten is roughly proportional to the count parameter. This function exists only to ensure that the compiler is not smart enough to optimise away our cycle-eating. 
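   estimate_cycle_scale_factor below then reduces calibration to simple
   arithmetic: if a run of use_up_cycles takes t2 - t1 counter ticks and
   u2 - u1 microseconds of user time (from getrusage), the counter frequency
   is roughly

      (t2 - t1) / (u2 - u1) * 1000000  ticks per second;

   count is doubled until a run consumes at least 0.1 s (100000 us), so the
   quotient is taken over a long enough interval to be stable.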
*/ void use_up_cycles (unsigned long count) { for (; count; count--) { unsigned long x[3] = {0, 1, 2}; unsigned long y[3] = {0, 1, 2}; unsigned long z[5]; zn_mod_t mod; zn_mod_init (mod, 3); zn_array_mul (z, x, 3, y, 3, mod); zn_mod_clear (mod); } } double estimate_cycle_scale_factor () { unsigned long count; for (count = 1; count < ULONG_MAX / 4; ) { struct rusage usage1, usage2; cycle_count_t t1, t2; // try using up some cycles and time how long (in microseconds) it takes getrusage (RUSAGE_SELF, &usage1); t1 = get_cycle_counter (); use_up_cycles (count); t2 = get_cycle_counter (); getrusage (RUSAGE_SELF, &usage2); long long u1 = (long long) usage1.ru_utime.tv_usec + (long long) usage1.ru_utime.tv_sec * 1000000; long long u2 = (long long) usage2.ru_utime.tv_usec + (long long) usage2.ru_utime.tv_sec * 1000000; if (u2 < u1) continue; // if we've used up at least 0.1s, estimate number of cycles per second if ((u2 - u1) > 100000) return (t2 - t1) / (u2 - u1) * 1000000; count *= 2; } abort (); } void calibrate_cycle_scale_factor () { fprintf (stderr, "Calibrating cycle counter... "); fflush (stderr); do cycle_scale_factor = estimate_cycle_scale_factor (); while (cycle_scale_factor >= 1e11); fprintf (stderr, "ok (%.2le)\n", cycle_scale_factor); } // borrowed from gcc documentation: int compare_doubles (const void *a, const void *b) { const double *da = (const double*) a; const double *db = (const double*) b; return (*da > *db) - (*da < *db); } double profile (double* spread, unsigned* samples, double (*target)(void* arg, unsigned long count), void* arg, double limit) { const unsigned max_times = 50; double times[max_times]; // convert limit from seconds to cycles limit *= cycle_scale_factor; unsigned n; unsigned long count = 1; double elapsed = 0.0; for (n = 0; n < max_times && elapsed < limit; n++) { double time = target (arg, count); elapsed += time; times[n] = time / count; if (time < limit/100) count++; if (count > 10) count = 10; } // get median, lower quartile, upper quartile of measured times qsort (times, n, sizeof (double), compare_doubles); double median = 0.5 * (times[(n - 1) / 2] + times[n / 2]); double q1 = 0.25 * (times[(n - 1) / 4] + times[n / 4] + times[(n + 1) / 4] + times[(n + 2) / 4]); double q3 = 0.25 * (times[3 * n / 4] + times[(3 * n - 1) / 4] + times[(3 * n - 2) / 4] + times[(3 * n - 3) / 4]); if (samples) *samples = n; if (spread) *spread = (q3 - q1) / median; return median; } // end of file **************************************************************** zn_poly-0.9.2/src/000077500000000000000000000000001360464557000140155ustar00rootroot00000000000000zn_poly-0.9.2/src/array.c000066400000000000000000000204611360464557000153020ustar00rootroot00000000000000/* array.c: simple operations on arrays mod m Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
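   Note that zn_array_cmp below returns 0 if the two arrays are equal and 1
   otherwise; it does not define an ordering.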
*/ #include "zn_poly_internal.h" int zn_array_cmp (const ulong* op1, const ulong* op2, size_t n) { for (; n > 0; n--) if (*op1++ != *op2++) return 1; return 0; } void zn_array_copy (ulong* res, const ulong* op, size_t n) { for (; n > 0; n--) *res++ = *op++; } void zn_array_neg (ulong* res, const ulong* op, size_t n, const zn_mod_t mod) { for (; n > 0; n--) *res++ = zn_mod_neg (*op++, mod); } void zn_array_scalar_mul_or_copy (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { if (x != 1) zn_array_scalar_mul (res, op, n, x, mod); else if (res != op) zn_array_copy (res, op, n); } /* Same as zn_array_scalar_mul, but: * always uses REDC reduction (requires modulus is odd); * requires that residues fit into half a word. */ #define _zn_array_scalar_mul_redc_v1 \ ZNP__zn_array_scalar_mul_redc_v1 void _zn_array_scalar_mul_redc_v1 (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { ZNP_ASSERT (mod->bits <= ULONG_BITS/2); ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (x < mod->m); for (; n; n--, op++, res++) *res = zn_mod_reduce_redc ((*op) * x, mod); } /* Same as zn_array_scalar_mul, but: * always uses REDC reduction (requires modulus is odd); * requires that modulus is slim. */ #define _zn_array_scalar_mul_redc_v2 \ ZNP__zn_array_scalar_mul_redc_v2 void _zn_array_scalar_mul_redc_v2 (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { ZNP_ASSERT (zn_mod_is_slim (mod)); ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (x < mod->m); for (; n; n--, op++, res++) { ulong hi, lo; ZNP_MUL_WIDE (hi, lo, *op, x); *res = zn_mod_reduce_wide_redc_slim (hi, lo, mod); } } /* Same as zn_array_scalar_mul, but: * always uses REDC reduction (requires modulus is odd). */ #define _zn_array_scalar_mul_redc_v3 \ ZNP__zn_array_scalar_mul_redc_v3 void _zn_array_scalar_mul_redc_v3 (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (x < mod->m); for (; n; n--, op++, res++) { ulong hi, lo; ZNP_MUL_WIDE (hi, lo, *op, x); *res = zn_mod_reduce_wide_redc (hi, lo, mod); } } /* Same as zn_array_scalar_mul, but always uses REDC reduction (requires that modulus is odd). Dispatches to one of the three versions above, depending on modulus size. */ #define _zn_array_scalar_mul_redc \ ZNP__zn_array_scalar_mul_redc void _zn_array_scalar_mul_redc (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (x < mod->m); if (mod->bits <= ULONG_BITS/2) _zn_array_scalar_mul_redc_v1 (res, op, n, x, mod); else if (zn_mod_is_slim (mod)) _zn_array_scalar_mul_redc_v2 (res, op, n, x, mod); else _zn_array_scalar_mul_redc_v3 (res, op, n, x, mod); } /* Same as zn_array_scalar_mul, but: * always uses plain reduction; * requires that residues fit into half a word. */ #define _zn_array_scalar_mul_plain_v1 \ ZNP__zn_array_scalar_mul_plain_v1 void _zn_array_scalar_mul_plain_v1 (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { ZNP_ASSERT (mod->bits <= ULONG_BITS/2); ZNP_ASSERT (x < mod->m); for (; n; n--, op++, res++) *res = zn_mod_reduce ((*op) * x, mod); } /* Same as zn_array_scalar_mul, but: * always uses plain reduction. 
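   Unlike the REDC variants above, no pre-scaling of the multiplier is
   needed here: each wide product is reduced directly. The REDC path computes
   REDC(a * b) = a * b / B mod m (B being the word base), which is why
   zn_array_scalar_mul below first replaces x by x * B mod m via
   zn_mod_mul_redc (x, mod->B2, mod) (assuming B2 holds B^2 mod m), so that
   the final REDC of op * (x * B) yields exactly op * x mod m.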
*/ #define _zn_array_scalar_mul_plain_v2 \ ZNP__zn_array_scalar_mul_plain_v2 void _zn_array_scalar_mul_plain_v2 (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m); for (; n; n--, op++, res++) { ulong hi, lo; ZNP_MUL_WIDE (hi, lo, *op, x); *res = zn_mod_reduce_wide (hi, lo, mod); } } /* Same as zn_array_scalar_mul, but always uses plain reduction. Dispatches to one of the versions above, depending on modulus size. */ #define _zn_array_scalar_mul_plain \ ZNP__zn_array_scalar_mul_plain void _zn_array_scalar_mul_plain (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m); if (mod->bits <= ULONG_BITS/2) _zn_array_scalar_mul_plain_v1 (res, op, n, x, mod); else _zn_array_scalar_mul_plain_v2 (res, op, n, x, mod); } void _zn_array_scalar_mul (ulong* res, const ulong* op, size_t n, ulong x, int redc, const zn_mod_t mod) { if (redc) _zn_array_scalar_mul_redc (res, op, n, x, mod); else _zn_array_scalar_mul_plain (res, op, n, x, mod); } void zn_array_scalar_mul (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m); // Do plain reduction if the vector is really short, or if the modulus // is even (in which case REDC reduction is not available). if (n < 5 || !(mod->m & 1)) { _zn_array_scalar_mul_plain (res, op, n, x, mod); } else { // modulus is odd, and vector is not too short, so we can go faster // by adjusting the multiplier and using REDC reduction _zn_array_scalar_mul_redc (res, op, n, zn_mod_mul_redc (x, mod->B2, mod), mod); } } void zn_array_sub (ulong* res, const ulong* op1, const ulong* op2, size_t n, const zn_mod_t mod) { if (zn_mod_is_slim (mod)) for (; n; n--) *res++ = zn_mod_sub_slim (*op1++, *op2++, mod); else for (; n; n--) *res++ = zn_mod_sub (*op1++, *op2++, mod); } ulong* zn_skip_array_signed_add (ulong* res, ptrdiff_t s, size_t n, const ulong* op1, int neg1, const ulong* op2, int neg2, const zn_mod_t mod) { if (zn_mod_is_slim (mod)) { // slim version if (neg1) { if (neg2) // res = -(op1 + op2) for (; n > 0; n--, res += s, op1++, op2++) *res = zn_mod_neg (zn_mod_add_slim (*op1, *op2, mod), mod); else // res = op2 - op1 for (; n > 0; n--, res += s, op1++, op2++) *res = zn_mod_sub_slim (*op2, *op1, mod); } else { if (neg2) // res = op1 - op2 for (; n > 0; n--, res += s, op1++, op2++) *res = zn_mod_sub_slim (*op1, *op2, mod); else // res = op1 + op2 for (; n > 0; n--, res += s, op1++, op2++) *res = zn_mod_add_slim (*op1, *op2, mod); } } else { // non-slim version if (neg1) { if (neg2) // res = -(op1 + op2) for (; n > 0; n--, res += s, op1++, op2++) *res = zn_mod_neg (zn_mod_add (*op1, *op2, mod), mod); else // res = op2 - op1 for (; n > 0; n--, res += s, op1++, op2++) *res = zn_mod_sub (*op2, *op1, mod); } else { if (neg2) // res = op1 - op2 for (; n > 0; n--, res += s, op1++, op2++) *res = zn_mod_sub (*op1, *op2, mod); else // res = op1 + op2 for (; n > 0; n--, res += s, op1++, op2++) *res = zn_mod_add (*op1, *op2, mod); } } return res; } // end of file **************************************************************** zn_poly-0.9.2/src/invert.c000066400000000000000000000152671360464557000155030ustar00rootroot00000000000000/* invert.c: series inversion Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" /* Extends an approximation of a power series reciprocal from length n1 to length n1 + n2. Must have 1 <= n2 <= n1. op[0, n1 + n2) represents an input power series f. approx[0, n1) should be the first n1 coeffs of the reciprocal of f. This function computes the next n2 coefficients of the reciprocal of f, and stores them at res[0, n2). res may not overlap op or approx. */ #define zn_array_invert_extend \ ZNP_zn_array_invert_extend void zn_array_invert_extend (ulong* res, const ulong* approx, const ulong* op, size_t n1, size_t n2, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); // The algorithm is basically newton iteration, inspired partly by the // algorithm in [HZ04], as follows. // Let f be the input series, of length n1 + n2. // Let g be the current approximation to 1/f, of length n1. // By newton iteration, (2*g - g*g*f) is a length n1 + n2 approximation // to 1/f. Therefore the output of this function should be terms // [n1, n1 + n2) of -g*g*f. // We have g*f = 1 + h*x^n1 + O(x^(n1 + n2)), where h has length n2, // i.e. h consists of terms [n1, n1 + n2) of g*f. Therefore h may be // recovered as the middle product of f[1, n1 + n2) and g[0, n1). // Then g*g*f = g + g*h*x^n1 + O(x^(n1 + n2)). Since g has length // n1, the output is (the negative of) the first n2 coefficients of g*h. // Compute h, put it in res[0, n2). zn_array_mulmid (res, op + 1, n1 + n2 - 1, approx, n1, mod); // Compute g * h, put it into a scratch buffer. ZNP_FASTALLOC (temp, ulong, 6624, n1 + n2 - 1); zn_array_mul (temp, approx, n1, res, n2, mod); // Negate the first n2 coefficients of g * h into the output buffer. zn_array_neg (res, temp, n2, mod); ZNP_FASTFREE (temp); } /* Same as zn_array_invert_extend(), but uses Schonhage/Nussbaumer FFT for the middle product and product. Modulus must be odd. res may overlap op or approx. */ #define zn_array_invert_extend_fft \ ZNP_zn_array_invert_extend_fft void zn_array_invert_extend_fft (ulong* res, const ulong* approx, const ulong* op, size_t n1, size_t n2, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (mod->m & 1); // The algorithm here is the same as in zn_array_invert_extend(), except // that we work with the FFTs directly. This allows us to save one FFT, // since we use the FFT of g in both the middle product step and the // product step. // Determine FFT parameters for computing h = middle product of // f[1, n1 + n2) and g[0, n1). (These parameters will also work for the // subsequent product g * h.) unsigned lgK, lgM; ulong m1, m2, m3, p; mulmid_fft_params (&lgK, &lgM, &m3, &m1, &p, n1 + n2 - 1, n1); m2 = m3 - m1 + 1; // We now have // m1 = ceil(n1 / (M/2)) // = (n1 + p - 1) / (M/2). // Therefore // m3 = ceil((n1 + n2 - 1 + p) / (M/2)) // = ceil(n2 / (M/2)) + (n1 + p - 1) / (M/2) // and // m2 = ceil(n2 / (M/2)) + 1. 
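   // (Worked example: for n1 = 13, n2 = 5, the parameter search picks
   // M = 8, so M/2 = 4 and p = 4, since n1 + p - 1 = 16 is divisible
   // by 4.  Then m1 = 4, m3 = ceil(21/4) = 6 and m2 = 3, agreeing with
   // the formulas above.)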
ulong M = 1UL << lgM; ulong K = 1UL << lgK; ptrdiff_t skip = M + 1; pmfvec_t vec1, vec2; pmfvec_init (vec1, lgK, skip, lgM, mod); pmfvec_init (vec2, lgK, skip, lgM, mod); // Find scaling factor that needs to be applied to both of the products // below; takes into account the fudge from the pointwise multiplies, and // the division by 2^lgK coming from the FFTs. ulong x = pmfvec_mul_fudge (lgM, 0, mod); x = zn_mod_mul (x, zn_mod_pow2 (-lgK, mod), mod); // Split g[0, n1) into m1 coefficients, apply scaling factor, and compute // m3 fourier coefficients, written to vec2. fft_split (vec2, approx, n1, 0, x, 0); pmfvec_fft (vec2, m3, m1, 0); // Split f[1, n1 + n2) into m3 coefficients (in reversed order, with // appropriate zero-padding), and compute transposed IFFT of length m3, // written to vec1. pmfvec_reverse (vec1, m3); fft_split (vec1, op + 1, n1 + n2 - 1, p, 1, 0); pmfvec_reverse (vec1, m3); pmfvec_tpifft (vec1, m3, 0, m3, 0); // Pointwise multiply the above FFT and transposed IFFT, into vec1. pmfvec_mul (vec1, vec1, vec2, m3, 0); // Transposed FFT vec1, obtaining m2 coefficients, then reverse and combine. pmfvec_tpfft (vec1, m3, m2, 0); pmfvec_reverse (vec1, m2); fft_combine (res, n2, vec1, m2, 1); pmfvec_reverse (vec1, m2); // At this stage we have obtained the polynomial h in res[0, n2). // Now we must compute h * g. // Split h[0, n2) into m2 - 1 coefficients, and compute m3 - 1 fourier // coefficients in vec1. For the splitting step, we set the bias to M, // which effectively negates everything, so we're really computing the FFT // of -h. fft_split (vec1, res, n2, 0, 1, M); pmfvec_fft (vec1, m3 - 1, m2 - 1, 0); // Pointwise multiply that FFT with the first FFT of g into vec2. pmfvec_mul (vec2, vec2, vec1, m3 - 1, 1); pmfvec_clear (vec1); // IFFT and combine, to obtain the product -h * g. We only need the low n2 // terms of the product (we throw away the high n1 - 1 terms). pmfvec_ifft (vec2, m3 - 1, 0, m3 - 1, 0); fft_combine (res, n2, vec2, m3 - 1, 0); pmfvec_clear (vec2); } void zn_array_invert (ulong* res, const ulong* op, size_t n, const zn_mod_t mod) { ZNP_ASSERT (n >= 1); // for now assume input is monic ZNP_ASSERT (op[0] == 1); if (n == 1) { res[0] = 1; return; } size_t half = (n + 1) / 2; // ceil(n / 2) // recursively obtain the first half of the output zn_array_invert (res, op, half, mod); // extend to second half of the output if (mod->m & 1) zn_array_invert_extend_fft (res + half, res, op, half, n - half, mod); else zn_array_invert_extend (res + half, res, op, half, n - half, mod); } // end of file **************************************************************** zn_poly-0.9.2/src/ks_support.c000066400000000000000000000234031360464557000163740ustar00rootroot00000000000000/* ks_support.c: support routines for algorithms based on Kronecker substitution Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
*/ #include "zn_poly_internal.h" void array_reduce (ulong* res, ptrdiff_t s, const ulong* op, size_t n, unsigned w, int redc, const zn_mod_t mod) { ZNP_ASSERT (w >= 1 && w <= 3); ZNP_ASSERT ((mod->m & 1) || !redc); if (w == 1) { if (redc) { for (; n; n--, res += s, op++) *res = zn_mod_reduce_redc (*op, mod); } else { for (; n; n--, res += s, op++) *res = zn_mod_reduce (*op, mod); } } else if (w == 2) { if (redc) { for (; n; n--, res += s, op += 2) *res = zn_mod_reduce2_redc (op[1], op[0], mod); } else { for (; n; n--, res += s, op += 2) *res = zn_mod_reduce2 (op[1], op[0], mod); } } else // w == 3 { if (redc) { for (; n; n--, res += s, op += 3) *res = zn_mod_reduce3_redc (op[2], op[1], op[0], mod); } else { for (; n; n--, res += s, op += 3) *res = zn_mod_reduce3 (op[2], op[1], op[0], mod); } } } /* Same as zn_array_recover_reduce(), but requires 0 < 2 * b <= ULONG_BITS */ #define zn_array_recover_reduce1 \ ZNP_zn_array_recover_reduce1 void zn_array_recover_reduce1 (ulong* res, ptrdiff_t s, const ulong* op1, const ulong* op2, size_t n, unsigned b, int redc, const zn_mod_t mod) { ZNP_ASSERT (b >= 1 && 2 * b <= ULONG_BITS); ulong mask = (1UL << b) - 1; // (x0, x1) and (y0, y1) are two-digit windows into X and Y. ulong x1, x0 = *op1++; op2 += n; ulong y0, y1 = *op2--; ulong borrow = 0; if (redc) { // REDC version for (; n; n--) { y0 = *op2--; x1 = *op1++; if (y0 < x0) { ZNP_ASSERT (y1 != 0); y1--; } *res = zn_mod_reduce_redc (x0 + (y1 << b), mod); res += s; ZNP_ASSERT (y1 != mask); y1 += borrow; borrow = (x1 < y1); x1 -= y1; y1 = (y0 - x0) & mask; x0 = x1 & mask; } } else { // plain reduction version for (; n; n--) { y0 = *op2--; x1 = *op1++; if (y0 < x0) { ZNP_ASSERT (y1 != 0); y1--; } *res = zn_mod_reduce (x0 + (y1 << b), mod); res += s; ZNP_ASSERT (y1 != mask); y1 += borrow; borrow = (x1 < y1); x1 -= y1; y1 = (y0 - x0) & mask; x0 = x1 & mask; } } } /* Same as zn_array_recover_reduce(), but requires ULONG_BITS < 2 * b < 2*ULONG_BITS */ #define zn_array_recover_reduce2 \ ZNP_zn_array_recover_reduce2 void zn_array_recover_reduce2 (ulong* res, ptrdiff_t s, const ulong* op1, const ulong* op2, size_t n, unsigned b, int redc, const zn_mod_t mod) { ZNP_ASSERT (2 * b > ULONG_BITS && b < ULONG_BITS); // The main loop is the same as in zn_array_recover_reduce1(), but the // modular reduction step needs to handle two input words. ulong mask = (1UL << b) - 1; ulong x1, x0 = *op1++; op2 += n; ulong y0, y1 = *op2--; ulong borrow = 0; unsigned b2 = ULONG_BITS - b; if (redc) { // REDC version for (; n; n--) { y0 = *op2--; x1 = *op1++; if (y0 < x0) { ZNP_ASSERT (y1 != 0); y1--; } *res = zn_mod_reduce2_redc (y1 >> b2, x0 + (y1 << b), mod); res += s; ZNP_ASSERT (y1 != mask); y1 += borrow; borrow = (x1 < y1); x1 -= y1; y1 = (y0 - x0) & mask; x0 = x1 & mask; } } else { // plain reduction version for (; n; n--) { y0 = *op2--; x1 = *op1++; if (y0 < x0) { ZNP_ASSERT (y1 != 0); y1--; } *res = zn_mod_reduce2 (y1 >> b2, x0 + (y1 << b), mod); res += s; ZNP_ASSERT (y1 != mask); y1 += borrow; borrow = (x1 < y1); x1 -= y1; y1 = (y0 - x0) & mask; x0 = x1 & mask; } } } /* Same as zn_array_recover_reduce(), but requires b == ULONG_BITS */ #define zn_array_recover_reduce2b \ ZNP_zn_array_recover_reduce2b void zn_array_recover_reduce2b (ulong* res, ptrdiff_t s, const ulong* op1, const ulong* op2, size_t n, unsigned b, int redc, const zn_mod_t mod) { ZNP_ASSERT (b == ULONG_BITS); // Basically the same code as zn_array_recover_reduce2(), specialised // for b == ULONG_BITS. 
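   // (A separate version is needed because zn_array_recover_reduce2()
   // requires b < ULONG_BITS: with b == ULONG_BITS its shifts by b would
   // be full-word shifts, which C leaves undefined.  Here the shifts and
   // the mask disappear entirely.)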
ulong x1, x0 = *op1++; op2 += n; ulong y0, y1 = *op2--; ulong borrow = 0; if (redc) { // REDC version for (; n; n--) { y0 = *op2--; x1 = *op1++; if (y0 < x0) { ZNP_ASSERT (y1 != 0); y1--; } *res = zn_mod_reduce2_redc (y1, x0, mod); res += s; ZNP_ASSERT (y1 != -1UL); y1 += borrow; borrow = (x1 < y1); x1 -= y1; y1 = y0 - x0; x0 = x1; } } else { // plain reduction version for (; n; n--) { y0 = *op2--; x1 = *op1++; if (y0 < x0) { ZNP_ASSERT (y1 != 0); y1--; } *res = zn_mod_reduce2 (y1, x0, mod); res += s; ZNP_ASSERT (y1 != -1UL); y1 += borrow; borrow = (x1 < y1); x1 -= y1; y1 = y0 - x0; x0 = x1; } } } /* Same as zn_array_recover_reduce(), but requires 2 * ULONG_BITS < 2 * b <= 3 * ULONG_BITS. */ #define zn_array_recover_reduce3 \ ZNP_zn_array_recover_reduce3 void zn_array_recover_reduce3 (ulong* res, ptrdiff_t s, const ulong* op1, const ulong* op2, size_t n, unsigned b, int redc, const zn_mod_t mod) { ZNP_ASSERT (b > ULONG_BITS && 2 * b <= 3 * ULONG_BITS); // The main loop is the same as in zn_array_recover_reduce1(), but needs // to operate on double-word quantities everywhere, i.e. we simulate // double-word registers. The suffixes L and H stand for low and high words // of each. ulong maskL = -1UL; ulong maskH = (1UL << (b - ULONG_BITS)) - 1; ulong x1L, x0L = *op1++; ulong x1H, x0H = *op1++; op2 += 2 * n + 1; ulong y0H, y1H = *op2--; ulong y0L, y1L = *op2--; ulong borrow = 0; unsigned b1 = b - ULONG_BITS; unsigned b2 = 2 * ULONG_BITS - b; if (redc) { // REDC version for (; n; n--) { y0H = *op2--; y0L = *op2--; x1L = *op1++; x1H = *op1++; if ((y0H < x0H) || (y0H == x0H && y0L < x0L)) { ZNP_ASSERT (y1H != 0 || y1L != 0); y1H -= (y1L-- == 0); } *res = zn_mod_reduce3_redc ((y1H << b1) + (y1L >> b2), (y1L << b1) + x0H, x0L, mod); res += s; ZNP_ASSERT (y1L != maskL || y1H != maskH); if (borrow) y1H += (++y1L == 0); borrow = ((x1H < y1H) || (x1H == y1H && x1L < y1L)); ZNP_SUB_WIDE (x1H, x1L, x1H, x1L, y1H, y1L); ZNP_SUB_WIDE (y1H, y1L, y0H, y0L, x0H, x0L); y1H &= maskH; x0L = x1L; x0H = x1H & maskH; } } else { // plain reduction version for (; n; n--) { y0H = *op2--; y0L = *op2--; x1L = *op1++; x1H = *op1++; if ((y0H < x0H) || (y0H == x0H && y0L < x0L)) { ZNP_ASSERT (y1H != 0 || y1L != 0); y1H -= (y1L-- == 0); } *res = zn_mod_reduce3 ((y1H << b1) + (y1L >> b2), (y1L << b1) + x0H, x0L, mod); res += s; ZNP_ASSERT (y1L != maskL || y1H != maskH); if (borrow) y1H += (++y1L == 0); borrow = ((x1H < y1H) || (x1H == y1H && x1L < y1L)); ZNP_SUB_WIDE (x1H, x1L, x1H, x1L, y1H, y1L); ZNP_SUB_WIDE (y1H, y1L, y0H, y0L, x0H, x0L); y1H &= maskH; x0L = x1L; x0H = x1H & maskH; } } } /* Dispatches to one of the above routines depending on b. */ void zn_array_recover_reduce (ulong* res, ptrdiff_t s, const ulong* op1, const ulong* op2, size_t n, unsigned b, int redc, const zn_mod_t mod) { ZNP_ASSERT (b > 0 && 2 * b <= 3 * ULONG_BITS); if (2 * b <= ULONG_BITS) zn_array_recover_reduce1 (res, s, op1, op2, n, b, redc, mod); else if (b < ULONG_BITS) zn_array_recover_reduce2 (res, s, op1, op2, n, b, redc, mod); else if (b == ULONG_BITS) zn_array_recover_reduce2b (res, s, op1, op2, n, b, redc, mod); else zn_array_recover_reduce3 (res, s, op1, op2, n, b, redc, mod); } // end of file **************************************************************** zn_poly-0.9.2/src/misc.c000066400000000000000000000023641360464557000151210ustar00rootroot00000000000000/* misc.c: various random things that don't belong anywhere else Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" char* ZNP_version_string = "0.9"; const char* zn_poly_version_string () { return ZNP_version_string; } int floor_lg (ulong x) { int result = -1; while (x) { x >>= 1; result++; } return result; } int ceil_lg (ulong x) { ZNP_ASSERT (x >= 1); return floor_lg (x - 1) + 1; } // end of file **************************************************************** zn_poly-0.9.2/src/mpn_mulmid.c000066400000000000000000000364341360464557000163340ustar00rootroot00000000000000/* mpn_mulmid.c: middle products of integers Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" #include void ZNP_mpn_smp_basecase (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2) { ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n2 >= 1); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS mp_limb_t hi0, hi1, hi; size_t s, j; j = n2 - 1; s = n1 - j; op2 += j; hi0 = mpn_mul_1 (res, op1, s, *op2); hi1 = 0; for (op1++, op2--; j; j--, op1++, op2--) { hi = mpn_addmul_1 (res, op1, s, *op2); ZNP_ADD_WIDE (hi1, hi0, hi1, hi0, 0, hi); } res[s] = hi0; res[s + 1] = hi1; #else #error Not nails-safe yet #endif } /* Let x = op1[0, 2*n-1), y = op2[0, n), z = op3[0, n). If y >= z, this function computes y - z and the correction term SMP(x, y) - SMP(x, z) - SMP(x, y - z) and returns 0. If y < z, it computes z - y and the correction term SMP(x, z) - SMP(x, y) - SMP(x, z - y) and returns 1. In both cases abs(y - z) is stored at res[0, n). The correction term is v - u*B^n, where u is stored at hi[0, 2) and v is stored at lo[0, 2). None of the output buffers are allowed to overlap either each other or the input buffers. */ #define bilinear2_sub_fixup \ ZNP_bilinear2_sub_fixup int bilinear2_sub_fixup (mp_limb_t* hi, mp_limb_t* lo, mp_limb_t* res, const mp_limb_t* op1, const mp_limb_t* op2, const mp_limb_t* op3, size_t n) { ZNP_ASSERT (n >= 1); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS int sign = 0; if (mpn_cmp (op2, op3, n) < 0) { // swap y and z if necessary const mp_limb_t* temp = op2; op2 = op3; op3 = temp; sign = 1; } // now can assume y >= z // The correction term is computed as follows. Let // // y_0 - z_0 = u_0 - c_0 B, // y_1 - z_1 - c_0 = u_1 - c_1 B, // y_2 - z_2 - c_1 = u_2 - c_2 B, // ... // y_{n-1} - z_{n-1} - c_{n-2} = u_{n-1}, // // i.e. 
where c_j is the borrow (0 or 1) from the j-th limb of the // subtraction y - z, and where u_j is the j-th digit of y - z. Note // that c_{-1} = c_{n-1} = 0. By definition we want to compute // // \sum_{0 <= i < 2n-1, 0 <= j < n, n-1 <= i+j < 2n-1} // (c_{j-1} - c_j B) x_i B^{i+j-(n-1)} // // After some algebra this collapses down to // // \sum_{0 <= i < n-1} c_i (x_{n-2-i} - B^n x_{2n-2-i}). // First compute y - z using mpn_sub_n (fast) mpn_sub_n (res, op2, op3, n); // Now loop through and figure out where the borrows happened size_t i; mp_limb_t hi0 = 0, hi1 = 0; mp_limb_t lo0 = 0, lo1 = 0; for (i = n - 1; i; i--, op1++) { mp_limb_t borrow = res[i] - op2[i] + op3[i]; ZNP_ADD_WIDE (lo1, lo0, lo1, lo0, 0, borrow & op1[0]); ZNP_ADD_WIDE (hi1, hi0, hi1, hi0, 0, borrow & op1[n]); } hi[0] = hi0; hi[1] = hi1; lo[0] = lo0; lo[1] = lo1; return sign; #else #error Not nails-safe yet #endif } /* Let x = op1[0, 2*n-1), y = op2[0, 2*n-1), z = op3[0, n). This function computes x + y mod B^(2n-1) and the correction term SMP(x, z) + SMP(y, z) - SMP((x + y) mod B^(2n-1), z). The value x + y mod B^(2n-1) is stored at res[0, 2n-1). The correction term is u*B^n - v, where u is stored at hi[0, 2) and v is stored at lo[0, 2). None of the output buffers are allowed to overlap either each other or the input buffers. */ #define bilinear1_add_fixup \ ZNP_bilinear1_add_fixup void bilinear1_add_fixup (mp_limb_t* hi, mp_limb_t* lo, mp_limb_t* res, const mp_limb_t* op1, const mp_limb_t* op2, const mp_limb_t* op3, size_t n) { ZNP_ASSERT (n >= 1); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS // The correction term is computed as follows. Let // // x_0 + y_0 = u_0 + c_0 B, // x_1 + y_1 + c_0 = u_1 + c_1 B, // x_2 + y_2 + c_1 = u_2 + c_2 B, // ... // x_{2n-2} + y_{2n-2} + c_{2n-3} = u_{2n-2} + c_{2n-2} B, // // i.e. where c_j is the carry (0 or 1) from the j-th limb of the // addition x + y, and u_j is the j-th digit of x + y. Note that // c_{-1} = 0. By definition we want to compute // // \sum_{0 <= i < 2n-1, 0 <= j < n, n-1 <= i+j < 2n-1} // (c_i B - c_{i-1}) z_j B^{i+j-(n-1)} // // After some algebra this collapses down to // // -\sum_{0 <= j < n-1} c_j z_{n-2-j} + // B^n \sum_{n-1 <= j < 2n-1} c_j z_{2n-2-j}. // First compute x + y using mpn_add_n (fast) mp_limb_t last_carry = mpn_add_n (res, op1, op2, 2*n - 1); // Now loop through and figure out where the carries happened size_t j; mp_limb_t fix0 = 0, fix1 = 0; op3 += n - 2; for (j = 0; j < n - 1; j++, op3--) { // carry = -1 if there was a carry in the j-th limb addition mp_limb_t carry = op1[j+1] + op2[j+1] - res[j+1]; ZNP_ADD_WIDE (fix1, fix0, fix1, fix0, 0, carry & *op3); } lo[0] = fix0; lo[1] = fix1; fix0 = fix1 = 0; op3 += n; for (; j < 2*n - 2; j++, op3--) { // carry = -1 if there was a carry in the j-th limb addition mp_limb_t carry = op1[j+1] + op2[j+1] - res[j+1]; ZNP_ADD_WIDE (fix1, fix0, fix1, fix0, 0, carry & *op3); } ZNP_ADD_WIDE (fix1, fix0, fix1, fix0, 0, (-last_carry) & *op3); hi[0] = fix0; hi[1] = fix1; #else #error Not nails-safe yet #endif } void ZNP_mpn_smp_kara (mp_limb_t* res, const mp_limb_t* op1, const mp_limb_t* op2, size_t n) { ZNP_ASSERT (n >= 2); if (n & 1) { // If n is odd, we strip off the bottom row and last diagonal and // handle them separately at the end (stuff marked O in the diagram // below); the remainder gets handled via karatsuba (stuff marked E): // EEEEO.... // .EEEEO... // ..EEEEO.. // ...EEEEO.
// ....OOOOO op2++; } size_t k = n / 2; ZNP_FASTALLOC (temp, mp_limb_t, 6642, 2 * k + 2); mp_limb_t hi[2], lo[2]; // The following diagram shows the contributions from various regions // for k = 3: // AAABBB..... // .AAABBB.... // ..AAABBB... // ...CCCDDD.. // ....CCCDDD. // .....CCCDDD // ------------------------------------------------------------------------ // Step 1: compute contribution from A + contribution from B // Let x = op1[0, 2*k-1) // y = op1[k, 3*k-1) // z = op2[k, 2*k). // Need to compute SMP(x, z) + SMP(y, z). To do this, we will compute // SMP((x + y) mod B^(2k-1), z) and a correction term. // First compute x + y mod B^(2k-1) and the correction term. bilinear1_add_fixup (hi, lo, temp, op1, op1 + k, op2 + k, k); // Now compute SMP(x + y mod B^(2k-1), z). // Store result in first half of output. if (k < ZNP_mpn_smp_kara_thresh) ZNP_mpn_smp_basecase (res, temp, 2 * k - 1, op2 + k, k); else ZNP_mpn_smp_kara (res, temp, op2 + k, k); // Add in the correction term. mpn_sub (res, res, k + 2, lo, 2); mpn_add_n (res + k, res + k, hi, 2); // Save the last two limbs (they're about to get overwritten) mp_limb_t saved[2]; saved[0] = res[k]; saved[1] = res[k + 1]; // ------------------------------------------------------------------------ // Step 2: compute contribution from C + contribution from D // Let x = op1[k, 3*k-1) // y = op1[2*k, 4*k-1) // z = op2[0, k). // Need to compute SMP(x, z) + SMP(y, z). To do this, we will compute // SMP((x + y) mod B^(2k-1), z) and a correction term. // First compute x + y mod B^(2k-1) and the correction term. bilinear1_add_fixup (hi, lo, temp, op1 + k, op1 + 2 * k, op2, k); // Now compute SMP(x + y mod B^(2k-1), z). // Store result in second half of output. if (k < ZNP_mpn_smp_kara_thresh) ZNP_mpn_smp_basecase (res + k, temp, 2 * k - 1, op2, k); else ZNP_mpn_smp_kara (res + k, temp, op2, k); // Add in the correction term. mpn_sub (res + k, res + k, k + 2, lo, 2); mpn_add_n (res + 2 * k, res + 2 * k, hi, 2); // Add back the saved limbs. mpn_add (res + k, res + k, k + 2, saved, 2); // ------------------------------------------------------------------------ // Step 3: compute contribution from B - contribution from C // Let x = op1[k, 3*k-1) // y = op2[k, 2*k). // z = op2[0, k) // Need to compute SMP(x, y) - SMP(x, z). To do this, we will compute // SMP(x, abs(y - z)), and a correction term. // First compute abs(y - z) and the correction term. int sign = bilinear2_sub_fixup (hi, lo, temp, op1 + k, op2 + k, op2, k); // Now compute SMP(x, abs(y - z)). // Store it in second half of temp space, in two's complement (mod B^(k+2)) if (k < ZNP_mpn_smp_kara_thresh) ZNP_mpn_smp_basecase (temp + k, op1 + k, 2 * k - 1, temp, k); else ZNP_mpn_smp_kara (temp + k, op1 + k, temp, k); // Add in the correction term. 
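   // (Note the signs: bilinear2_sub_fixup() returns its correction as
   // v - u*B^n, whereas bilinear1_add_fixup(), used in steps 1 and 2,
   // returns u*B^n - v; hence lo is added and hi subtracted here, the
   // opposite of the earlier steps.)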
mpn_add (temp + k, temp + k, k + 2, lo, 2); mp_limb_t borrow = mpn_sub_n (temp + 2 * k, temp + 2 * k, hi, 2); // ------------------------------------------------------------------------ // Step 4: put the pieces together // First half of output is A + C = t4 - t2 // Second half of output is B + D = t6 + t2 if (sign) { mpn_add (res, res, 2 * k + 2, temp + k, k + 2); mpn_sub_1 (res + k + 2, res + k + 2, k, borrow); mpn_sub (res + k, res + k, k + 2, temp + k, k + 2); } else { mpn_sub (res, res, 2 * k + 2, temp + k, k + 2); mpn_add_1 (res + k + 2, res + k + 2, k, borrow); mpn_add (res + k, res + k, k + 2, temp + k, k + 2); } // ------------------------------------------------------------------------ // Step 5: add in correction if the length was odd #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS if (n & 1) { op2--; mp_limb_t hi0 = mpn_addmul_1 (res, op1 + n - 1, n, *op2); mp_limb_t hi1 = 0, lo0 = 0, lo1 = 0; size_t i; for (i = n - 1; i; i--) { mp_limb_t y0, y1; ZNP_MUL_WIDE (y1, y0, op1[2 * n - i - 2], op2[i]); ZNP_ADD_WIDE (hi1, hi0, hi1, hi0, 0, y1); ZNP_ADD_WIDE (lo1, lo0, lo1, lo0, 0, y0); } res[n + 1] = hi1; mpn_add_1 (res + n, res + n, 2, hi0); mpn_add_1 (res + n, res + n, 2, lo1); mpn_add_1 (res + n - 1, res + n - 1, 3, lo0); } ZNP_FASTFREE (temp); #else #error Not nails-safe yet #endif } void ZNP_mpn_smp_n (mp_limb_t* res, const mp_limb_t* op1, const mp_limb_t* op2, size_t n) { if (n < ZNP_mpn_smp_kara_thresh) ZNP_mpn_smp_basecase (res, op1, 2*n - 1, op2, n); else ZNP_mpn_smp_kara (res, op1, op2, n); } void ZNP_mpn_smp (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2) { ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n2 >= 1); size_t n3 = n1 - n2 + 1; if (n3 < ZNP_mpn_smp_kara_thresh) { // region is too narrow to make karatsuba worthwhile for any portion ZNP_mpn_smp_basecase (res, op1, n1, op2, n2); return; } if (n2 > n3) { // slice region into chunks horizontally, i.e. like this: // AA..... // .AA.... // ..BB... // ...BB.. // ....CC. // .....CC // first chunk (marked A in the above diagram) op2 += n2 - n3; ZNP_mpn_smp_kara (res, op1, op2, n3); // remaining chunks (B, C, etc) ZNP_FASTALLOC (temp, mp_limb_t, 6642, n3 + 2); n1 -= n3; n2 -= n3; while (n2 >= n3) { op1 += n3; op2 -= n3; ZNP_mpn_smp_kara (temp, op1, op2, n3); mpn_add_n (res, res, temp, n3 + 2); n1 -= n3; n2 -= n3; } if (n2) { // last remaining chunk op1 += n3; op2 -= n2; ZNP_mpn_smp (temp, op1, n1, op2, n2); mpn_add_n (res, res, temp, n3 + 2); } ZNP_FASTFREE (temp); } else { mp_limb_t save[2]; // slice region into chunks diagonally, i.e. like this: // AAABBBCC.. // .AAABBBCC. 
// ..AAABBBCC // first chunk (marked A in the above diagram) ZNP_mpn_smp_n (res, op1, op2, n2); n1 -= n2; n3 -= n2; // remaining chunks (B, C, etc) while (n3 >= n2) { op1 += n2; res += n2; // save two limbs which are going to be overwritten save[0] = res[0]; save[1] = res[1]; ZNP_mpn_smp_n (res, op1, op2, n2); // add back saved limbs mpn_add (res, res, n2 + 2, save, 2); n1 -= n2; n3 -= n2; } if (n3) { // last remaining chunk op1 += n2; res += n2; save[0] = res[0]; save[1] = res[1]; ZNP_mpn_smp (res, op1, n1, op2, n2); mpn_add (res, res, n3 + 2, save, 2); } } } void ZNP_mpn_mulmid_fallback (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2) { if (n1 < n2 + 1) return; ZNP_FASTALLOC (temp, mp_limb_t, 6642, n1 + n2); ZNP_mpn_mul (temp, op1, n1, op2, n2); memcpy (res + 2, temp + n2 + 1, sizeof(mp_limb_t) * (n1 - n2 - 1)); ZNP_FASTFREE (temp); } void ZNP_mpn_mulmid (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2) { ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n2 >= 1); if (n2 >= ZNP_mpn_mulmid_fallback_thresh) { ZNP_mpn_mulmid_fallback (res, op1, n1, op2, n2); return; } // try using the simple middle product ZNP_mpn_smp (res, op1, n1, op2, n2); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS // If there's a possibility of overflow from lower diagonals, we just give // up and do the whole product. (Note: this should happen extremely rarely // on uniform random input. However, on data generated by mpn_random2, it // seems to happen with non-negligible probability.) if (res[1] >= -(mp_limb_t)(n2)) ZNP_mpn_mulmid_fallback (res, op1, n1, op2, n2); #else #error Not nails-safe yet #endif } // end of file **************************************************************** zn_poly-0.9.2/src/mul.c000066400000000000000000000070111360464557000147550ustar00rootroot00000000000000/* mul.c: polynomial multiplication Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" ulong _zn_array_mul_fudge (size_t n1, size_t n2, int sqr, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); if (!(mod->m & 1)) // no fudge if the modulus is even. 
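      // (REDC scaling requires an odd modulus, so in this case the
      // multiplication routines always use plain reduction and the
      // result carries no extra factor.)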
return 1; tuning_info_t* i = &tuning_info[mod->bits]; if (!sqr) { if (n2 < i->mul_KS2_thresh || n2 < i->mul_KS4_thresh || n2 < i->mul_fft_thresh) // fudge is -B return mod->m - mod->B; } else { if (n2 < i->sqr_KS2_thresh || n2 < i->sqr_KS4_thresh || n2 < i->sqr_fft_thresh) // fudge is -B return mod->m - mod->B; } // return whatever fudge is used by the fft multiplication code return zn_array_mul_fft_fudge (n1, n2, sqr, mod); } void _zn_array_mul (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int fastred, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); // we can use REDC reduction if the modulus is odd and the caller is happy // to receive the result with a fudge factor int odd = (mod->m & 1); int redc = fastred && odd; if (n2 == 1) { // special case for 1xN multiplication _zn_array_scalar_mul (res, op1, n1, op2[0], redc, mod); return; } tuning_info_t* i = &tuning_info[mod->bits]; if (op1 != op2 || n1 != n2) { // multiplying two distinct inputs if (n2 < i->mul_KS2_thresh) zn_array_mul_KS1 (res, op1, n1, op2, n2, redc, mod); else if (n2 < i->mul_KS4_thresh) zn_array_mul_KS2 (res, op1, n1, op2, n2, redc, mod); else if (!odd || n2 < i->mul_fft_thresh) zn_array_mul_KS4 (res, op1, n1, op2, n2, redc, mod); else { ulong x = fastred ? 1 : zn_array_mul_fft_fudge (n1, n2, 0, mod); zn_array_mul_fft (res, op1, n1, op2, n2, x, mod); } } else { // squaring a single input if (n2 < i->sqr_KS2_thresh) zn_array_mul_KS1 (res, op1, n1, op1, n1, redc, mod); else if (n2 < i->sqr_KS4_thresh) zn_array_mul_KS2 (res, op1, n1, op1, n1, redc, mod); else if (!odd || n2 < i->sqr_fft_thresh) zn_array_mul_KS4 (res, op1, n1, op1, n1, redc, mod); else { ulong x = fastred ? 1 : zn_array_mul_fft_fudge (n1, n1, 1, mod); zn_array_mul_fft (res, op1, n1, op1, n1, x, mod); } } } void zn_array_mul (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, const zn_mod_t mod) { _zn_array_mul (res, op1, n1, op2, n2, 0, mod); } // end of file **************************************************************** zn_poly-0.9.2/src/mul_fft.c000066400000000000000000000435771360464557000156310ustar00rootroot00000000000000/* mul_fft.c: polynomial multiplication and middle product via Schonhage/Nussbaumer FFT Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ /* The multiplication algorithm is essentially that of [Sch77]. We map the problem to S[Z]/(Z^K - 1), where S = R[Y]/(Y^M + 1), M and K are powers of two, and K <= 2M (this ensures we have enough roots of unity in S for the FFTs). The inputs are split into pieces of size M/2. We need K to be large enough that the product can be resolved unambiguously in S[Z]/(Z^K - 1), and we want M minimal subject to these conditions (we end up with M and K around sqrt(n1 + n2)).
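
   (Worked example: for n1 = n2 = 1000, mul_fft_params() below selects
   M = 64, i.e. pieces of size M/2 = 32; each input splits into
   m1 = m2 = 32 pieces, the output needs m3 = 63 pieces, and K = 64,
   so that m3 <= K <= 2M as required.)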
Our middle product algorithm conceptually has two pieces: (1) reducing the problem from a middle product in R[X] to a middle product in S[Z], and (2) computing the middle product in S[Z] via transposed truncated Fourier transforms, according to the transposition principle (see [BLS03]). (1) Reduction from R[X] to S[Z]. Suppose we want to compute the middle product of op1[0, n1) and op2[0, n2). We first pad op1 by p zeroes on the left, where p satisfies: * n2 + p - 1 is divisible by M/2, * 1 <= p <= M/2. Call this op1', which has length n1' = n1 + p. Now split op1' into m1 chunks of length M/2 and op2 into m2 chunks of length M/2 (both zero-padded on the right to get up to a full chunk of length M/2), i.e. m1 = ceil(n1' / (M/2)) m2 = ceil(n2 / (M/2)). Compute the middle product of the resulting polynomials of length m1 and m2 over S. (For this we require that m1 <= K, so that the FFTs work.) The result has length m1 - m2 + 1; we must show that the result has enough information to reconstruct the middle product of the original op1 and op2 in R[X]. The first coefficient of the middle product of op1 and op2 would usually be at index n2 - 1 into the full product op1 * op2. Therefore it appears at index n2 + p - 1 into the full product op1' * op2. This appears at index (n2 + p - 1) / (M/2) = m2 of the middle product we performed over S, or in other words, the *second* coefficient (since the first coefficient would be the one at index m2 - 1). This is good since we also need the overlapping data from the previous coefficient. The last coefficient of the middle product of op1 and op2 would usually be at index n1 - 1 into the full product op1 * op2. That's at index n1 + p - 1 into the full product of op1' * op2. This appears at index floor((n1 + p - 1) / (M/2)) <= m1 - 1 of the middle product over S, which is good since that's the last one we computed. (2) Middle product in S[Z]. Fix a polynomial F in S[Z] of length n, and let m >= n. Consider the linear map that sends the length m - n + 1 polynomial G to G * F (which has length m). The transpose of this map sends a length m polynomial H to the reversal of the middle product of F and H' (of length m - n + 1), where H' is the reversal of H. 
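
   (A small case makes this concrete: take n = 2, m = 3, F = f0 + f1*Z.
   The map G = g0 + g1*Z |--> G*F has matrix rows (f0, 0), (f1, f0),
   (0, f1); its transpose sends H = h0 + h1*Z + h2*Z^2 to
   (f0*h0 + f1*h1) + (f0*h1 + f1*h2)*Z, which is exactly the reversal
   of the middle product of F and the reversal of H.)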
Therefore our algorithm is: * compute usual FFT of F * compute transposed IFFT of the reversal of H * multiply coefficients pointwise in S * compute transposed FFT of product, and reverse the output */ #include <string.h> #include "zn_poly_internal.h" /* ============================================================================ splitting and combining routines ============================================================================ */ void fft_split (pmfvec_t res, const ulong* op, size_t n, size_t k, ulong x, ulong b) { const zn_mod_struct* mod = res->mod; ulong M = res->M; pmf_t dest = res->data; // handle completely zero blocks from leading zeroes for (; k >= M/2; k -= M/2, dest += res->skip) { dest[0] = b; zn_array_zero (dest + 1, M); } // handle block with partially leading zeroes if (k) { dest[0] = b; zn_array_zero (dest + 1, k); size_t left = M/2 - k; if (n < left) { zn_array_scalar_mul_or_copy (dest + 1 + k, op, n, x, mod); zn_array_zero (dest + 1 + k + n, M - n - k); return; } zn_array_scalar_mul_or_copy (dest + 1 + k, op, left, x, mod); zn_array_zero (dest + 1 + M/2, M/2); n -= left; op += left; dest += res->skip; } // handle complete blocks of length M/2 for (; n >= M/2; n -= M/2, op += M/2, dest += res->skip) { dest[0] = b; zn_array_scalar_mul_or_copy (dest + 1, op, M/2, x, mod); zn_array_zero (dest + 1 + M/2, M/2); } // last block of fractional length if (n) { dest[0] = b; zn_array_scalar_mul_or_copy (dest + 1, op, n, x, mod); zn_array_zero (dest + 1 + n, M - n); } } /* If neg == 0, copies op[0, n) into res[0, n). If neg == 1, copies the negative of op[0, n) into res[0, n). */ #define zn_array_signed_copy \ ZNP_zn_array_signed_copy ZNP_INLINE void zn_array_signed_copy (ulong* res, const ulong* op, ulong n, int neg, const zn_mod_t mod) { if (neg) zn_array_neg (res, op, n, mod); else zn_array_copy (res, op, n); } /* This routine adds the last M/2 coefficients of op1 to the first M/2 coefficients of op2, and writes them to res[0, M/2). If n < M/2, it only writes the first n coefficients, and ignores the rest. If op1 is NULL, it is treated as being zero. Ditto for op2. The main complication in this routine is dealing with the bias fields of op1 and op2, so some segments need to be added and some subtracted. We still do everything in a single pass. */ #define fft_combine_chunk \ ZNP_fft_combine_chunk void fft_combine_chunk (ulong* res, size_t n, pmf_const_t op1, pmf_const_t op2, ulong M, const zn_mod_t mod) { n = ZNP_MIN (n, M/2); if (op1 == NULL && op2 == NULL) { // both inputs are zero; just write zeroes to the output zn_array_zero (res, n); return; } // We want to start reading from the Y^(M/2) coefficient of op1, which // is located at index M/2 - bias(op1) mod 2M. But really there are only // M coefficients, so we reduce the index mod M, and put neg1 = 1 if we // have to negate the coefficients (i.e. they wrapped around // negacyclically). If op1 is zero, just put s1 = ULONG_MAX. ulong s1 = ULONG_MAX; int neg1; if (op1) { s1 = (M/2 - op1[0]) & (2*M - 1); neg1 = (s1 >= M); if (neg1) s1 -= M; } // Similarly for op2, but we want to start reading from the Y^0 // coefficient. ulong s2 = ULONG_MAX; int neg2; if (op2) { s2 = (-op2[0]) & (2*M - 1); neg2 = (s2 >= M); if (neg2) s2 -= M; } // Swap the inputs so that s1 <= s2.
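   // (After this normalisation the segment diagrams below only need to
   // cover the case where op1's wraparound point comes no later than
   // op2's.)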
if (s1 > s2) { pmf_const_t op_temp = op1; op1 = op2; op2 = op_temp; ulong s_temp = s1; s1 = s2; s2 = s_temp; int neg_temp = neg1; neg1 = neg2; neg2 = neg_temp; } // advance beyond bias fields op1++; op2++; if (s2 == ULONG_MAX) { // One of the inputs is zero; may assume it's op2. We only need to // work with op1. // op1 looks like this: // // 0 s1 M // op1: BBBBBBBBAAAAAAAAAAAAAAAA // // The A parts need to be copied with the same sign; the B parts need // to have the sign flipped. if (n <= M - s1) // Only need part of AAAA up to n. zn_array_signed_copy (res, op1 + s1, n, neg1, mod); else { // Copy AAAAA zn_array_signed_copy (res, op1 + s1, M - s1, neg1, mod); // Negate BBBBB zn_array_signed_copy (res + M - s1, op1, n - M + s1, !neg1, mod); } return; } // Neither op1 nor op2 are zero. // The picture looks like this: // // 0 s1 M // op1: CCCCCCCCAAAAAAAAAAAAABBBBBBBBBBBBBBBBBB // // s2 M // op2: BBBBBBBBBBBBBBBBBBCCCCCCCCAAAAAAAAAAAAA // Combine the portions marked AAAA // (bail out if we reach n) if (n <= M - s2) { zn_skip_array_signed_add (res, 1, n, op2 + s2, neg2, op1 + s1, neg1, mod); return; } res = zn_skip_array_signed_add (res, 1, M - s2, op2 + s2, neg2, op1 + s1, neg1, mod); n -= (M - s2); // Combine the portions marked BBBB // (bail out if we reach n) if (n <= s2 - s1) { zn_skip_array_signed_add (res, 1, n, op2, !neg2, op1 + s1 + M - s2, neg1, mod); return; } res = zn_skip_array_signed_add (res, 1, s2 - s1, op2, !neg2, op1 + s1 + M - s2, neg1, mod); n -= (s2 - s1); // Combine the portions marked CCCC zn_skip_array_signed_add (res, 1, (n >= s1) ? s1 : n, op2 + s2 - s1, !neg2, op1, !neg1, mod); } void fft_combine (ulong* res, size_t n, const pmfvec_t op, ulong z, int skip_first) { if (z == 0) { // zero it out zn_array_zero (res, n); return; } if (!skip_first) { // Copy the relevant part of the first coefficient size_t k = ZNP_MIN (n, op->M/2); fft_combine_chunk (res, k, NULL, op->data, op->M, op->mod); res += k; n -= k; } // In the loop below, ptr1 = (i-1)-th coefficient, ptr2 = i-th coefficient pmf_const_t ptr1 = op->data; pmf_const_t ptr2 = op->data + op->skip; ulong i; for (i = 1; i < z && n >= op->M/2; i++, n -= op->M/2, res += op->M/2, ptr1 += op->skip, ptr2 += op->skip) { // Add first half of i-th coefficient to second half of (i-1)-th // coefficient fft_combine_chunk (res, n, ptr1, ptr2, op->M, op->mod); } if (i < z) { // Ran out of output space before getting to last pmf_t. // Do the same add operation as above, but stop when the buffer is full. fft_combine_chunk (res, n, ptr1, ptr2, op->M, op->mod); return; } // Arrived at last coefficient, still haven't exhausted output buffer. // Copy second half of last coefficient, and zero-pad to the end. fft_combine_chunk (res, n, ptr1, NULL, op->M, op->mod); if (n > op->M/2) zn_array_zero (res + op->M/2, n - op->M/2); } /* ============================================================================ multiplication routine ============================================================================ */ void mul_fft_params (unsigned* lgK, unsigned* lgM, ulong* m1, ulong* m2, size_t n1, size_t n2) { unsigned _lgM; size_t _m1, _m2, _m3; ulong M; // increase lgM until all the conditions are satisfied for (_lgM = 1; ; _lgM++) { _m1 = CEIL_DIV_2EXP (n1, _lgM - 1); // = ceil(n1 / (M/2)) _m2 = CEIL_DIV_2EXP (n2, _lgM - 1); // = ceil(n2 / (M/2)) _m3 = _m1 + _m2 - 1; M = 1UL << _lgM; if (_m3 <= 2 * M) break; } *lgM = _lgM; *lgK = (_m3 > M) ? 
(_lgM + 1) : _lgM; *m1 = _m1; *m2 = _m2; } ulong zn_array_mul_fft_fudge (size_t n1, size_t n2, int sqr, const zn_mod_t mod) { unsigned lgK, lgM; ulong m1, m2; mul_fft_params (&lgK, &lgM, &m1, &m2, n1, n2); // need to divide by 2^lgK coming from FFT ulong fudge1 = zn_mod_pow2 (-lgK, mod); // and take into account fudge from pointwise multiplies ulong fudge2 = pmfvec_mul_fudge (lgM, sqr, mod); return zn_mod_mul (fudge1, fudge2, mod); } void zn_array_mul_fft (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, ulong x, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); unsigned lgK, lgM; // number of pmf_t coefficients for each input poly ulong m1, m2; // figure out how big the transform needs to be mul_fft_params (&lgK, &lgM, &m1, &m2, n1, n2); // number of pmf_t coefficients for output poly ulong m3 = m1 + m2 - 1; ulong M = 1UL << lgM; ulong K = 1UL << lgK; ptrdiff_t skip = M + 1; pmfvec_t vec1, vec2; int sqr = (op1 == op2 && n1 == n2); if (!sqr) { // multiplying two distinct inputs // split inputs into pmf_t's and perform FFTs pmfvec_init (vec1, lgK, skip, lgM, mod); fft_split (vec1, op1, n1, 0, 1, 0); pmfvec_fft (vec1, m3, m1, 0); // note: we apply the fudge factor here, because the second input is // shorter than both the first input and the output :-) pmfvec_init (vec2, lgK, skip, lgM, mod); fft_split (vec2, op2, n2, 0, x, 0); pmfvec_fft (vec2, m3, m2, 0); // pointwise multiplication pmfvec_mul (vec1, vec1, vec2, m3, 1); pmfvec_clear (vec2); } else { // squaring a single input // split input into pmf_t's and perform FFTs pmfvec_init (vec1, lgK, skip, lgM, mod); fft_split (vec1, op1, n1, 0, 1, 0); pmfvec_fft (vec1, m3, m1, 0); // pointwise multiplication pmfvec_mul (vec1, vec1, vec1, m3, 1); } // inverse FFT, and write output pmfvec_ifft (vec1, m3, 0, m3, 0); size_t n3 = n1 + n2 - 1; fft_combine (res, n3, vec1, m3, 0); pmfvec_clear (vec1); // if we're squaring, then we haven't applied the fudge factor yet, // so do it now if (sqr) zn_array_scalar_mul_or_copy (res, res, n3, x, mod); } /* ============================================================================ middle product routines ============================================================================ */ void mulmid_fft_params (unsigned* lgK, unsigned* lgM, ulong* m1, ulong* m2, ulong* p, size_t n1, size_t n2) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); unsigned _lgM; size_t _m1; ulong M, _p; // increase lgM until all the conditions are satisfied for (_lgM = 1; ; _lgM++) { M = 1UL << _lgM; _p = ((-n2) & (M/2 - 1)) + 1; _m1 = CEIL_DIV_2EXP (n1 + _p, _lgM - 1); if (_m1 <= 2 * M) break; } *lgM = _lgM; *lgK = (_m1 > M) ? 
(_lgM + 1) : _lgM; *p = _p; *m1 = _m1; *m2 = CEIL_DIV_2EXP (n2, _lgM - 1); } ulong zn_array_mulmid_fft_precomp1_fudge (size_t n1, size_t n2, const zn_mod_t mod) { unsigned lgK, lgM; ulong m1, m2, p; mulmid_fft_params (&lgK, &lgM, &m1, &m2, &p, n1, n2); // need to divide by 2^lgK coming from FFT ulong fudge1 = zn_mod_pow2 (-lgK, mod); // and take into account fudge from pointwise multiplies ulong fudge2 = pmfvec_mul_fudge (lgM, 0, mod); return zn_mod_mul (fudge1, fudge2, mod); } ulong zn_array_mulmid_fft_fudge (size_t n1, size_t n2, const zn_mod_t mod) { return zn_array_mulmid_fft_precomp1_fudge (n1, n2, mod); } void zn_array_mulmid_fft_precomp1_init (zn_array_mulmid_fft_precomp1_t res, const ulong* op1, size_t n1, size_t n2, ulong x, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); res->n1 = n1; res->n2 = n2; unsigned lgK, lgM; mulmid_fft_params (&lgK, &lgM, &res->m1, &res->m2, &res->p, n1, n2); ulong M = 1UL << lgM; ptrdiff_t skip = M + 1; // allocate space for transposed IFFT pmfvec_init (res->vec1, lgK, skip, lgM, mod); // split input, with padding, in reversed order, and apply requested // scaling factor pmfvec_reverse (res->vec1, res->m1); fft_split (res->vec1, op1, n1, res->p, x, 0); pmfvec_reverse (res->vec1, res->m1); // transposed IFFT first input pmfvec_tpifft (res->vec1, res->m1, 0, res->m1, 0); } void zn_array_mulmid_fft_precomp1_execute (ulong* res, const ulong* op2, ulong x, const zn_array_mulmid_fft_precomp1_t precomp) { const pmfvec_struct* vec1 = precomp->vec1; size_t n1 = precomp->n1; size_t n2 = precomp->n2; ulong m1 = precomp->m1; ulong m2 = precomp->m2; pmfvec_t vec2; pmfvec_init (vec2, vec1->lgK, vec1->skip, vec1->lgM, vec1->mod); // split and compute FFT of second input (with requested scaling factor) fft_split (vec2, op2, n2, 0, x, 0); pmfvec_fft (vec2, m1, m2, 0); // pointwise multiply against precomputed transposed IFFT of first input pmfvec_mul (vec2, vec1, vec2, m1, 0); // transposed FFT ulong m3 = m1 - m2 + 1; pmfvec_tpfft (vec2, m1, m3, 0); // reverse output and combine pmfvec_reverse (vec2, m3); fft_combine (res, n1 - n2 + 1, vec2, m3, 1); pmfvec_reverse (vec2, m3); pmfvec_clear (vec2); } void zn_array_mulmid_fft_precomp1_clear (zn_array_mulmid_fft_precomp1_t op) { pmfvec_clear (op->vec1); } void zn_array_mulmid_fft (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, ulong x, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); // re-use the precomp1 code zn_array_mulmid_fft_precomp1_t precomp; zn_array_mulmid_fft_precomp1_init (precomp, op1, n1, n2, x, mod); zn_array_mulmid_fft_precomp1_execute (res, op2, 1, precomp); zn_array_mulmid_fft_precomp1_clear (precomp); } // end of file **************************************************************** zn_poly-0.9.2/src/mul_fft_dft.c000066400000000000000000000571151360464557000164630ustar00rootroot00000000000000/* mul_fft_dft.c: multiplication by Schonhage/Nussbaumer FFT, with a few layers of naive DFT to save memory Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" /* Returns length n bit reversal of x. */ #define bit_reverse \ ZNP_bit_reverse ulong bit_reverse (ulong x, unsigned n) { ulong y = 0; unsigned i; for (i = 0; i < n; i++) { y <<= 1; y += x & 1; x >>= 1; } return y; } /* Let [a, b) = intersection of [0, n) and [k, k + M/2). This functions adds op[a, b) to res. */ #define merge_chunk_to_pmf \ ZNP_merge_chunk_to_pmf void merge_chunk_to_pmf (pmf_t res, const ulong* op, size_t n, size_t k, ulong M, const zn_mod_t mod) { ZNP_ASSERT ((M & 1) == 0); ulong r = (-res[0]) & (2*M - 1); size_t end = k + M/2; if (end > n) end = n; if (k >= end) // nothing to do return; op += k; ulong size = end - k; // now we need to handle op[0, size), and we are guaranteed size <= M/2. if (r < M) { if (size <= M - r) zn_array_add_inplace (res + 1 + r, op, size, mod); else { zn_array_add_inplace (res + 1 + r, op, M - r, mod); // negacyclic wraparound: zn_array_sub_inplace (res + 1, op + M - r, size - M + r, mod); } } else { r -= M; if (size <= M - r) zn_array_sub_inplace (res + 1 + r, op, size, mod); else { zn_array_sub_inplace (res + 1 + r, op, M - r, mod); // negacyclic wraparound: zn_array_add_inplace (res + 1, op + M - r, size - M + r, mod); } } } /* Adds op into res, starting at index k, and not writing beyond res + n. If op == NULL, does no operation. */ #define merge_chunk_from_pmf \ ZNP_merge_chunk_from_pmf void merge_chunk_from_pmf (ulong* res, size_t n, const pmf_t op, size_t k, ulong M, const zn_mod_t mod) { if (op == NULL) return; size_t end = k + M; if (end > n) end = n; if (k >= end) // nothing to do return; res += k; ulong size = end - k; // now we need to write to res[0, size), and we are guaranteed size <= M. ulong r = op[0] & (2*M - 1); if (r < M) { if (size <= r) zn_array_sub_inplace (res, op + 1 + M - r, size, mod); else { zn_array_sub_inplace (res, op + 1 + M - r, r, mod); // negacyclic wraparound: zn_array_add_inplace (res + r, op + 1, size - r, mod); } } else { r -= M; if (size <= r) zn_array_add_inplace (res, op + 1 + M - r, size, mod); else { zn_array_add_inplace (res, op + 1 + M - r, r, mod); // negacyclic wraparound: zn_array_sub_inplace (res + r, op + 1, size - r, mod); } } } /* ============================================================================ "virtual" pmf_t's ============================================================================ */ /* The virtual_pmf_t and virtual_pmfvec_t are similar to their non-virtual counterparts (pmf_t and pmfvec_t), but the underlying representation is optimised for the case where: * many of the coefficients in the vector are zero, and * many of the coefficients differ only by multiplication by a root of unity. They are used in the mul_fft_dft routine (below) to perform truncated inverse FFTs on vectors containing only a single nonzero entry. This is achieved in pretty much a constant (logarithmic?) number of coefficient operations, instead of K*lg(K) coefficient operations. (In the *forward* FFT case, it's easy to write down a "formula" for the FFT, so we don't even need a vector in which to do the computation; but in the inverse case I couldn't see a simple formula, so these structs in effect let us figure out a formula on the fly.) 
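
   Schematically, in terms of the functions defined in the rest of this
   file, an IFFT of a vector with a single nonzero entry might look like
   this (a sketch only; the real call sequence is in
   zn_array_mul_fft_dft() below):

      virtual_pmfvec_t vec;
      virtual_pmfvec_init (vec, lgK, lgM, mod);
      virtual_pmf_import (vec->data[i], op);        // the nonzero entry
      virtual_pmfvec_ifft (vec, n, fwd, t);         // cheap truncated IFFT
      pmf_t c = virtual_pmf_export (vec->data[j]);  // NULL if entry is zero
      virtual_pmfvec_clear (vec);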
*/ #define virtual_pmfvec_struct \ ZNP_virtual_pmfvec_struct struct virtual_pmfvec_struct; // forward declaration /* Virtual version of a pmf_t. Each virtual_pmf_t belongs to a "parent" virtual_pmfvec_t. The index field is: * -1 if the value is zero * otherwise, an index into parent's buf, which is where the actual coefficient data is stored. The bias field overrides the bias word in the underlying pmf_t. (This lets different virtual_pmf's share memory but still have different bias values.) */ #define virtual_pmf_struct \ ZNP_virtual_pmf_struct struct virtual_pmf_struct { struct virtual_pmfvec_struct* parent; int index; ulong bias; }; #define virtual_pmf_t \ ZNP_virtual_pmf_t typedef struct virtual_pmf_struct virtual_pmf_t[1]; /* Virtual version of pmfvec_t. M, lgM, K, lgK, mod are as for pmfvec_t. data is an array of K virtual_pmf_t's (the coefficients in the vector). The underlying data is managed by three arrays of length max_buffers: * buf[i] points to a pmf_t, or NULL if the i-th slot is not yet associated to any actual memory. * count[i] is a reference count for slot #i, i.e. the number of virtual_pmf_t's pointing to this slot. * external[i] is a flag indicating whether the memory belongs to someone else (i.e. it's not the responsibility of the virtual_pmfvec to free it). */ struct virtual_pmfvec_struct { ulong M; unsigned lgM; ulong K; unsigned lgK; const zn_mod_struct* mod; virtual_pmf_t* data; unsigned max_buffers; ulong** buf; unsigned* count; int* external; }; #define virtual_pmfvec_t \ ZNP_virtual_pmfvec_t typedef struct virtual_pmfvec_struct virtual_pmfvec_t[1]; /* ---------------------------------------------------------------------------- virtual_pmf_t infrastructure ---------------------------------------------------------------------------- */ /* Initialises a virtual_pmf_t to zero, with a given parent vector. */ #define virtual_pmf_init \ ZNP_virtual_pmf_init void virtual_pmf_init (virtual_pmf_t res, virtual_pmfvec_t parent) { res->index = -1; res->parent = parent; } /* Initialises a virtual_pmfvec_t to length K, with all zero values. All slots are initially marked as empty. */ #define virtual_pmfvec_init \ ZNP_virtual_pmfvec_init void virtual_pmfvec_init (virtual_pmfvec_t vec, unsigned lgK, unsigned lgM, const zn_mod_t mod) { vec->mod = mod; vec->lgM = lgM; vec->M = 1UL << lgM; vec->lgK = lgK; vec->K = 1UL << lgK; vec->data = (virtual_pmf_t*) malloc (vec->K * sizeof (virtual_pmf_t)); ulong i; for (i = 0; i < vec->K; i++) virtual_pmf_init (vec->data[i], vec); vec->max_buffers = 2 * vec->K; // should be safe vec->buf = (ulong**) malloc (sizeof (ulong*) * vec->max_buffers); vec->count = (unsigned*) malloc (sizeof (unsigned) * vec->max_buffers); vec->external = (int*) malloc (sizeof (int) * vec->max_buffers); for (i = 0; i < vec->max_buffers; i++) { vec->buf[i] = NULL; vec->count[i] = 0; vec->external[i] = 0; } } /* Sets all values to zero, and detaches externally-allocated memory from all slots. This does *not* free any memory owned by this vector; buffers already allocated will get re-used by subsequent operations. */ #define virtual_pmfvec_reset \ ZNP_virtual_pmfvec_reset void virtual_pmfvec_reset (virtual_pmfvec_t vec) { ulong i; for (i = 0; i < vec->K; i++) vec->data[i]->index = -1; for (i = 0; i < vec->max_buffers; i++) { vec->count[i] = 0; if (vec->external[i]) { vec->buf[i] = NULL; vec->external[i] = 0; } } } /* Destroys the vector, and frees all memory owned by it.
*/ #define virtual_pmfvec_clear \ ZNP_virtual_pmfvec_clear void virtual_pmfvec_clear (virtual_pmfvec_t vec) { virtual_pmfvec_reset (vec); ulong i; for (i = 0; i < vec->max_buffers; i++) if (vec->buf[i] && !vec->external[i]) free (vec->buf[i]); free (vec->external); free (vec->buf); free (vec->count); free (vec->data); } /* Finds a free slot (one not attached to any underlying pmf_t yet), and returns its index. */ #define virtual_pmfvec_find_slot \ ZNP_virtual_pmfvec_find_slot unsigned virtual_pmfvec_find_slot (virtual_pmfvec_t vec) { unsigned i; for (i = 0; i < vec->max_buffers; i++) if (!vec->buf[i]) return i; // this should never happen; we always should have enough slots ZNP_ASSERT (0); } /* Finds a slot attached to an underlying pmf_t which is not currently used by any other virtual_pmf_t's, and returns its index. If there are no such slots, it allocated more space, attaches a slot to it, and returns its index. In both cases, the reference count of the returned slot will be 1. */ #define virtual_pmfvec_new_buf \ ZNP_virtual_pmfvec_new_buf unsigned virtual_pmfvec_new_buf (virtual_pmfvec_t vec) { // first search for an already-allocated buffer that no-one else is using unsigned i; for (i = 0; i < vec->max_buffers; i++) if (vec->buf[i] && !vec->count[i]) break; if (i == vec->max_buffers) { // not found; need to allocate more space i = virtual_pmfvec_find_slot (vec); vec->buf[i] = (ulong*) malloc (sizeof (ulong) * (vec->M + 1)); vec->external[i] = 0; } vec->count[i] = 1; return i; } /* res := 0 */ #define virtual_pmf_zero \ ZNP_virtual_pmf_zero void virtual_pmf_zero (virtual_pmf_t res) { // already zero, nothing to do if (res->index == -1) return; // detach from buffer, update refcount res->parent->count[res->index]--; res->index = -1; } /* Sets res := op, by attaching a slot in the parent vector to the memory occupied by op (no data movement is involved). Note: this means that subsequent changes to op affect the value of res! */ #define virtual_pmf_import \ ZNP_virtual_pmf_import void virtual_pmf_import (virtual_pmf_t res, pmf_t op) { virtual_pmf_zero (res); res->index = virtual_pmfvec_find_slot (res->parent); res->parent->count[res->index] = 1; res->parent->external[res->index] = 1; res->parent->buf[res->index] = op; res->bias = op[0]; } /* Returns a pmf_t with the value of op. All this does is overwrite the bias field of the underlying pmf_t and return a pointer it. It doesn't copy any data. (This is fragile; the returned value should be used immediately, before doing anything else with the parent vector.) If op is zero, the return value is NULL. */ #define virtual_pmf_export \ ZNP_virtual_pmf_export pmf_t virtual_pmf_export (virtual_pmf_t op) { if (op->index == -1) return NULL; pmf_t res = op->parent->buf[op->index]; res[0] = op->bias; return res; } /* Ensures that op has a reference count of 1, by possibly copying the data to a new buffer if necessary. Then it's safe to mutate without affecting the value of other virtual_pmf_t's. If op is zero, this is a no-op. 
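   (This is the copy-on-write step: the mutating operations below, such
   as virtual_pmf_add(), isolate a coefficient immediately before
   writing to it, so shared buffers are copied only on demand.)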
*/ #define virtual_pmf_isolate \ ZNP_virtual_pmf_isolate void virtual_pmf_isolate (virtual_pmf_t op) { if (op->index == -1) return; struct virtual_pmfvec_struct* parent = op->parent; if (parent->count[op->index] == 1) // already has reference count 1 return; // detach parent->count[op->index]--; // find new buffer and copy the data unsigned index = virtual_pmfvec_new_buf (parent); pmf_set (parent->buf[index], parent->buf[op->index], parent->M); op->index = index; } /* ---------------------------------------------------------------------------- virtual_pmf_t coefficient operations These functions all handle reference counting automatically ---------------------------------------------------------------------------- */ /* res := op */ #define virtual_pmf_set \ ZNP_virtual_pmf_set void virtual_pmf_set (virtual_pmf_t res, virtual_pmf_t op) { if (op == res) return; virtual_pmf_zero (res); if (op->index == -1) return; res->bias = op->bias; res->index = op->index; res->parent->count[op->index]++; } /* op := Y^r * op */ #define virtual_pmf_rotate \ ZNP_virtual_pmf_rotate void virtual_pmf_rotate (virtual_pmf_t op, ulong r) { if (op->index != -1) op->bias += r; } /* res += op */ #define virtual_pmf_add \ ZNP_virtual_pmf_add void virtual_pmf_add (virtual_pmf_t res, virtual_pmf_t op) { ZNP_ASSERT (res->parent == op->parent); struct virtual_pmfvec_struct* parent = res->parent; // op == 0 if (op->index == -1) return; // res == 0 if (res->index == -1) { virtual_pmf_set (res, op); return; } virtual_pmf_isolate (res); pmf_t p2 = parent->buf[res->index]; pmf_t p1 = parent->buf[op->index]; p2[0] = res->bias; p1[0] = op->bias; pmf_add (p2, p1, parent->M, parent->mod); } /* res -= op */ #define virtual_pmf_sub \ ZNP_virtual_pmf_sub void virtual_pmf_sub (virtual_pmf_t res, virtual_pmf_t op) { ZNP_ASSERT (res->parent == op->parent); struct virtual_pmfvec_struct* parent = res->parent; // op == 0 if (op->index == -1) return; // res == 0 if (res->index == -1) { virtual_pmf_set (res, op); virtual_pmf_rotate (res, parent->M); return; } virtual_pmf_isolate (res); pmf_t p2 = parent->buf[res->index]; pmf_t p1 = parent->buf[op->index]; p2[0] = res->bias; p1[0] = op->bias; pmf_sub (p2, p1, parent->M, parent->mod); } /* op1 := op1 + op2 op2 := op2 - op1 */ #define virtual_pmf_bfly \ ZNP_virtual_pmf_bfly void virtual_pmf_bfly (virtual_pmf_t op1, virtual_pmf_t op2) { ZNP_ASSERT (op1->parent == op2->parent); struct virtual_pmfvec_struct* parent = op1->parent; // op1 == 0 if (op1->index == -1) { virtual_pmf_set (op1, op2); return; } // op2 == 0 if (op2->index == -1) { virtual_pmf_set (op2, op1); virtual_pmf_rotate (op2, parent->M); return; } virtual_pmf_isolate (op1); virtual_pmf_isolate (op2); pmf_t p1 = parent->buf[op1->index]; pmf_t p2 = parent->buf[op2->index]; p1[0] = op1->bias; p2[0] = op2->bias; pmf_bfly (p1, p2, parent->M, parent->mod); } /* op := op / 2 */ #define virtual_pmf_divby2 \ ZNP_virtual_pmf_divby2 void virtual_pmf_divby2 (virtual_pmf_t op) { struct virtual_pmfvec_struct* parent = op->parent; if (op->index == -1) return; virtual_pmf_isolate (op); pmf_divby2 (parent->buf[op->index], parent->M, parent->mod); } /* ---------------------------------------------------------------------------- virtual IFFT routine ---------------------------------------------------------------------------- */ /* Performs truncated IFFT on vec. The meanings of n, fwd and t are the same as for pmfvec_ifft (see pmfvec_fft.c). 
The algorithm is essentially the same as pmfvec_ifft_dc(), except that we don't worry about the z parameter (zero entries are handled automatically by the underlying optimised representation). */ #define virtual_pmfvec_ifft \ ZNP_virtual_pmfvec_ifft void virtual_pmfvec_ifft (virtual_pmfvec_t vec, ulong n, int fwd, ulong t) { ZNP_ASSERT (vec->lgK <= vec->lgM + 1); ZNP_ASSERT (t * vec->K < 2 * vec->M); ZNP_ASSERT (n + fwd <= vec->K); if (vec->lgK == 0) return; vec->lgK--; vec->K >>= 1; const zn_mod_struct* mod = vec->mod; virtual_pmf_t* data = vec->data; ulong M = vec->M; ulong K = vec->K; ulong s, r = M >> vec->lgK; long i; if (n + fwd <= K) { for (i = K - 1; i >= (long) n; i--) { virtual_pmf_add (data[i], data[i + K]); virtual_pmf_divby2 (data[i]); } virtual_pmfvec_ifft (vec, n, fwd, t << 1); for (; i >= 0; i--) { virtual_pmf_add (data[i], data[i]); virtual_pmf_sub (data[i], data[i + K]); } } else { virtual_pmfvec_ifft (vec, K, 0, t << 1); for (i = K - 1, s = t + r * i; i >= (long)(n - K); i--, s -= r) { virtual_pmf_sub (data[i + K], data[i]); virtual_pmf_sub (data[i], data[i + K]); virtual_pmf_rotate (data[i + K], M + s); } vec->data += K; virtual_pmfvec_ifft (vec, n - K, fwd, t << 1); vec->data -= K; for (; i >= 0; i--, s -= r) { virtual_pmf_rotate (data[i + K], M - s); virtual_pmf_bfly (data[i + K], data[i]); } } vec->K <<= 1; vec->lgK++; } /* ============================================================================ main array multiplication routine ============================================================================ */ /* The idea of this routine is as follows. We simulate the algorithm used in zn_array_mul_fft, in the case that the FFTs are performed via pmfvec_fft_huge() and pmfvec_ifft_huge(), with K factored into T = 2^lgT rows and U = 2^lgU columns. Here we assume that T is quite small and U is possibly very large. However, instead of storing the whole fourier transform, we only work on a single row at a time. This means we have to store at most three rows simultaneously: the two rows whose transforms are being multiplied together, and the "partial" row, which enters the computation right at the beginning and is needed until the end. (In the ordinary mul_fft routine, we need space for 2T rows simultaneously.) This also means we cannot use *fast* fourier transforms for the columns, since we don't have all the data available. They are done by a naive DFT instead. The total number of coefficient operations (adds/subs) is O(T * U * log(U) + T^2 * U). */ void zn_array_mul_fft_dft (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, unsigned lgT, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); if (lgT == 0) { // no layers of DFT; just call usual FFT routine int sqr = (op1 == op2) && (n1 == n2); ulong x = zn_array_mul_fft_fudge (n1, n2, sqr, mod); zn_array_mul_fft (res, op1, n1, op2, n2, x, mod); return; } unsigned lgM, lgK; // number of pmf_t coefficients for each input poly ulong m1, m2; // figure out how big the transform needs to be mul_fft_params (&lgK, &lgM, &m1, &m2, n1, n2); // number of pmf_t coefficients for output poly ulong m = m1 + m2 - 1; ulong M = 1UL << lgM; ulong K = 1UL << lgK; ptrdiff_t skip = M + 1; size_t n3 = n1 + n2 - 1; // Split up transform into length K = U * T, i.e. U columns and T rows. 
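// For example, with lgK = 5 and lgT = 2 we would have a T x U = 4 x 8 grid of fourier coefficients, coefficient k * U + j sitting in row k and column j (the same indexing used in the merge_chunk_to_pmf / merge_chunk_from_pmf calls below); each length-U row gets a true FFT, while the T entries of each column are combined by a naive DFT.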
if (lgT >= lgK) lgT = lgK; unsigned lgU = lgK - lgT; ulong U = 1UL << lgU; ulong T = 1UL << lgT; // space for two input rows, and one partial row pmfvec_t in1, in2, part; pmfvec_init (in1, lgU, skip, lgM, mod); pmfvec_init (in2, lgU, skip, lgM, mod); pmfvec_init (part, lgU, skip, lgM, mod); // the virtual pmfvec_t that we use for the column DFTs virtual_pmfvec_t col; virtual_pmfvec_init (col, lgT, lgM, mod); // zero the output zn_array_zero (res, n3); long i, j, k; int which; // Write m = U * mT + mU, where 0 <= mU < U ulong mU = m & (U - 1); ulong mT = m >> lgU; // for each row (beginning with the last partial row if it exists).... for (i = mT - (mU == 0); i >= 0; i--) { ulong i_rev = bit_reverse (i, lgT); // for each input array.... for (which = 0; which < 2; which++) { pmfvec_struct* in = which ? in2 : in1; const ulong* op = which ? op2 : op1; size_t n = which ? n2 : n1; pmf_t p = in->data; for (j = 0; j < U; j++, p += in->skip) { // compute the i-th row of the j-th column as it would look after // the column FFTs, using naive DFT pmf_zero (p, M); ulong r = i_rev << (lgM - lgT + 1); for (k = 0; k < T; k++) { merge_chunk_to_pmf (p, op, n, (k * U + j) << (lgM - 1), M, mod); pmf_rotate (p, -r); } pmf_rotate (p, (i_rev * j) << (lgM - lgK + 1)); } // Now we've got the whole row; run FFT on the row pmfvec_fft (in, (i == mT) ? mU : U, U, 0); } if (i == mT) { // pointwise multiply the two partial rows pmfvec_mul (part, in1, in2, mU, i == 0); // remove fudge factor pmfvec_scalar_mul (part, mU, pmfvec_mul_fudge (lgM, 0, mod)); // zero remainder of the partial row; we will subsequently add // in contributions from the vertical IFFTs when we process the other // rows. for (j = mU; j < U; j++) pmf_zero (part->data + part->skip * j, M); } else { // pointwise multiply the two rows pmfvec_mul (in1, in1, in2, U, i == 0); // remove fudge factor pmfvec_scalar_mul (in1, U, pmfvec_mul_fudge (lgM, 0, mod)); // horizontal IFFT this row pmfvec_ifft (in1, U, 0, U, 0); // simulate vertical IFFTs with DFTs for (j = 0; j < U; j++) { virtual_pmfvec_reset (col); virtual_pmf_import (col->data[i], in1->data + in1->skip * j); virtual_pmfvec_ifft (col, mT + (j < mU), (j >= mU) && mU, j << (lgM + 1 - lgK)); if ((j >= mU) && mU) { // add contribution to partial row (only for rightmost columns) pmf_t src = virtual_pmf_export (col->data[mT]); if (src) pmf_add (part->data + part->skip * j, src, M, mod); } // add contributions to output for (k = 0; k < mT + (j < mU); k++) merge_chunk_from_pmf (res, n3, virtual_pmf_export (col->data[k]), (k * U + j) * M/2, M, mod); } } } // now finish off the partial row if (mU) { // horizontal IFFT partial row pmfvec_ifft (part, mU, 0, U, 0); // simulate leftmost vertical IFFTs for (j = 0; j < mU; j++) { virtual_pmfvec_reset (col); virtual_pmf_import (col->data[mT], part->data + part->skip * j); virtual_pmfvec_ifft (col, mT + 1, 0, j << (lgM + 1 - lgK)); // add contributions to output for (k = 0; k <= mT; k++) merge_chunk_from_pmf (res, n3, virtual_pmf_export (col->data[k]), (k * U + j) * M/2, M, mod); } } // normalise result zn_array_scalar_mul (res, res, n3, zn_mod_pow2 (-lgK, mod), mod); virtual_pmfvec_clear (col); pmfvec_clear (part); pmfvec_clear (in2); pmfvec_clear (in1); } // end of file **************************************************************** zn_poly-0.9.2/src/mul_ks.c000066400000000000000000000516771360464557000154730ustar00rootroot00000000000000/* mul_ks.c: polynomial multiplication by Kronecker substitution Copyright (C) 2007, 2008, David Harvey This file is part of the 
zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" /* In the routines below, we denote by f1(x) and f2(x) the input polynomials op1[0, n1) and op2[0, n2), and by h(x) their product in Z[x]. */ /* Multiplication/squaring using Kronecker substitution at 2^b. */ void zn_array_mul_KS1 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n1 <= ULONG_MAX); ZNP_ASSERT ((mod->m & 1) || !redc); int sqr = (op1 == op2 && n1 == n2); // length of h size_t n3 = n1 + n2 - 1; // bits in each output coefficient unsigned b = 2 * mod->bits + ceil_lg (n2); // number of ulongs required to store each output coefficient unsigned w = CEIL_DIV (b, ULONG_BITS); ZNP_ASSERT (w <= 3); // number of limbs needed to store f1(2^b) and f2(2^b) size_t k1 = CEIL_DIV (n1 * b, GMP_NUMB_BITS); size_t k2 = CEIL_DIV (n2 * b, GMP_NUMB_BITS); // allocate space ZNP_FASTALLOC (limbs, mp_limb_t, 6624, 2 * (k1 + k2)); mp_limb_t* v1 = limbs; // k1 limbs mp_limb_t* v2 = v1 + k1; // k2 limbs mp_limb_t* v3 = v2 + k2; // k1 + k2 limbs if (!sqr) { // multiplication version // evaluate f1(2^b) and f2(2^b) zn_array_pack (v1, op1, n1, 1, b, 0, 0); zn_array_pack (v2, op2, n2, 1, b, 0, 0); // compute h(2^b) = f1(2^b) * f2(2^b) ZNP_mpn_mul (v3, v1, k1, v2, k2); } else { // squaring version // evaluate f1(2^b) zn_array_pack (v1, op1, n1, 1, b, 0, 0); // compute h(2^b) = f1(2^b)^2 ZNP_mpn_mul (v3, v1, k1, v1, k1); } // unpack coefficients of h, and reduce mod m ZNP_FASTALLOC (z, ulong, 6624, n3 * w); zn_array_unpack_SAFE (z, v3, n3, b, 0, k1 + k2); array_reduce (res, 1, z, n3, w, redc, mod); ZNP_FASTFREE (z); ZNP_FASTFREE (limbs); } /* Multiplication/squaring using Kronecker substitution at 2^b and -2^b. */ void zn_array_mul_KS2 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n1 <= ULONG_MAX); ZNP_ASSERT ((mod->m & 1) || !redc); if (n2 == 1) { // code below needs n2 > 1, so fall back on scalar multiplication _zn_array_scalar_mul (res, op1, n1, op2[0], redc, mod); return; } int sqr = (op1 == op2 && n1 == n2); // bits in each output coefficient unsigned bits = 2 * mod->bits + ceil_lg (n2); // we're evaluating at x = B and -B, where B = 2^b, and b = ceil(bits / 2) unsigned b = (bits + 1) / 2; // number of ulongs required to store each output coefficient unsigned w = CEIL_DIV (2 * b, ULONG_BITS); ZNP_ASSERT (w <= 3); // Write f1(x) = f1e(x^2) + x * f1o(x^2) // f2(x) = f2e(x^2) + x * f2o(x^2) // h(x) = he(x^2) + x * ho(x^2) // "e" = even, "o" = odd size_t n1o = n1 / 2; size_t n1e = n1 - n1o; size_t n2o = n2 / 2; size_t n2e = n2 - n2o; size_t n3 = n1 + n2 - 1; // length of h size_t n3o = n3 / 2; size_t n3e = n3 - n3o; // f1(B) and |f1(-B)| are at most ((n1 - 1) * b + mod->bits) bits long. 
// However, when evaluating f1e(B^2) and B * f1o(B^2) the bit-packing // routine needs room for the last chunk of 2b bits. Therefore we need to // allow room for (n1 + 1) * b bits. Ditto for f2. size_t k1 = CEIL_DIV ((n1 + 1) * b, GMP_NUMB_BITS); size_t k2 = CEIL_DIV ((n2 + 1) * b, GMP_NUMB_BITS); size_t k3 = k1 + k2; // allocate space ZNP_FASTALLOC (limbs, mp_limb_t, 6624, 3 * k3); mp_limb_t* v1_buf0 = limbs; // k1 limbs mp_limb_t* v2_buf0 = v1_buf0 + k1; // k2 limbs mp_limb_t* v1_buf1 = v2_buf0 + k2; // k1 limbs mp_limb_t* v2_buf1 = v1_buf1 + k1; // k2 limbs mp_limb_t* v1_buf2 = v2_buf1 + k2; // k1 limbs mp_limb_t* v2_buf2 = v1_buf2 + k1; // k2 limbs // arrange overlapping buffers to minimise memory use // "p" = plus, "m" = minus mp_limb_t* v1e = v1_buf0; mp_limb_t* v2e = v2_buf0; mp_limb_t* v1o = v1_buf1; mp_limb_t* v2o = v2_buf1; mp_limb_t* v1p = v1_buf2; mp_limb_t* v2p = v2_buf2; mp_limb_t* v1m = v1_buf0; mp_limb_t* v2m = v2_buf0; mp_limb_t* v3m = v1_buf1; mp_limb_t* v3p = v1_buf0; mp_limb_t* v3e = v1_buf2; mp_limb_t* v3o = v1_buf0; ZNP_FASTALLOC (z, ulong, 6624, w * n3e); int v3m_neg; if (!sqr) { // multiplication version // evaluate f1e(B^2) and B * f1o(B^2) zn_array_pack (v1e, op1, n1e, 2, 2 * b, 0, k1); zn_array_pack (v1o, op1 + 1, n1o, 2, 2 * b, b, k1); // evaluate f2e(B^2) and B * f2o(B^2) zn_array_pack (v2e, op2, n2e, 2, 2 * b, 0, k2); zn_array_pack (v2o, op2 + 1, n2o, 2, 2 * b, b, k2); // compute f1(B) = f1e(B^2) + B * f1o(B^2) // and f2(B) = f2e(B^2) + B * f2o(B^2) ZNP_ASSERT_NOCARRY (mpn_add_n (v1p, v1e, v1o, k1)); ZNP_ASSERT_NOCARRY (mpn_add_n (v2p, v2e, v2o, k2)); // compute |f1(-B)| = |f1e(B^2) - B * f1o(B^2)| // and |f2(-B)| = |f2e(B^2) - B * f2o(B^2)| v3m_neg = signed_mpn_sub_n (v1m, v1e, v1o, k1); v3m_neg ^= signed_mpn_sub_n (v2m, v2e, v2o, k2); // compute h(B) = f1(B) * f2(B) // compute |h(-B)| = |f1(-B)| * |f2(-B)| // v3m_neg is set if h(-B) is negative ZNP_mpn_mul (v3m, v1m, k1, v2m, k2); ZNP_mpn_mul (v3p, v1p, k1, v2p, k2); } else { // squaring version // evaluate f1e(B^2) and B * f1o(B^2) zn_array_pack (v1e, op1, n1e, 2, 2 * b, 0, k1); zn_array_pack (v1o, op1 + 1, n1o, 2, 2 * b, b, k1); // compute f1(B) = f1e(B^2) + B * f1o(B^2) ZNP_ASSERT_NOCARRY (mpn_add_n (v1p, v1e, v1o, k1)); // compute |f1(-B)| = |f1e(B^2) - B * f1o(B^2)| signed_mpn_sub_n (v1m, v1e, v1o, k1); // compute h(B) = f1(B)^2 // compute h(-B) = f1(-B)^2 // v3m_neg is cleared (since f1(-B)^2 is never negative) ZNP_mpn_mul (v3m, v1m, k1, v1m, k1); ZNP_mpn_mul (v3p, v1p, k1, v1p, k1); v3m_neg = 0; } // he(B^2) and B * ho(B^2) are both at most b * (n3 + 1) bits long (since // the coefficients don't overlap). The buffers used below are at least // b * (n1 + n2 + 2) = b * (n3 + 3) bits long. So we definitely have // enough room for 2 * he(B^2) and 2 * B * ho(B^2). // compute 2 * he(B^2) = h(B) + h(-B) ZNP_ASSERT_NOCARRY (v3m_neg ? mpn_sub_n (v3e, v3p, v3m, k3) : mpn_add_n (v3e, v3p, v3m, k3)); // unpack coefficients of he, and reduce mod m zn_array_unpack_SAFE (z, v3e, n3e, 2 * b, 1, k3); array_reduce (res, 2, z, n3e, w, redc, mod); // compute 2 * B * ho(B^2) = h(B) - h(-B) ZNP_ASSERT_NOCARRY (v3m_neg ? mpn_add_n (v3o, v3p, v3m, k3) : mpn_sub_n (v3o, v3p, v3m, k3)); // unpack coefficients of ho, and reduce mod m zn_array_unpack_SAFE (z, v3o, n3o, 2 * b, b + 1, k3); array_reduce (res + 1, 2, z, n3o, w, redc, mod); ZNP_FASTFREE (z); ZNP_FASTFREE (limbs); } /* Multiplication/squaring using Kronecker substitution at 2^b and 2^(-b).
Note: this routine does not appear to be competitive in practice with the other KS routines. It's here just for fun. */ void zn_array_mul_KS3 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n1 <= ULONG_MAX); ZNP_ASSERT ((mod->m & 1) || !redc); int sqr = (op1 == op2 && n1 == n2); // length of h size_t n3 = n1 + n2 - 1; // bits in each output coefficient unsigned bits = 2 * mod->bits + ceil_lg (n2); // we're evaluating at x = B and 1/B, where B = 2^b, and b = ceil(bits / 2) unsigned b = (bits + 1) / 2; // number of ulongs required to store each base-B digit unsigned w = CEIL_DIV (b, ULONG_BITS); ZNP_ASSERT (w <= 2); // limbs needed to store f1(B) and B^(n1-1) * f1(1/B), ditto for f2 size_t k1 = CEIL_DIV (n1 * b, GMP_NUMB_BITS); size_t k2 = CEIL_DIV (n2 * b, GMP_NUMB_BITS); // allocate space ZNP_FASTALLOC (limbs, mp_limb_t, 6624, 2 * (k1 + k2)); mp_limb_t* v1 = limbs; // k1 limbs mp_limb_t* v2 = v1 + k1; // k2 limbs mp_limb_t* v3 = v2 + k2; // k1 + k2 limbs ZNP_FASTALLOC (z, ulong, 6624, 2 * w * (n3 + 1)); // "n" = normal order, "r" = reciprocal order ulong* zn = z; ulong* zr = z + w * (n3 + 1); if (!sqr) { // multiplication version // evaluate f1(B) and f2(B) zn_array_pack (v1, op1, n1, 1, b, 0, k1); zn_array_pack (v2, op2, n2, 1, b, 0, k2); // compute h(B) = f1(B) * f2(B) ZNP_mpn_mul (v3, v1, k1, v2, k2); } else { // squaring version // evaluate f1(B) zn_array_pack (v1, op1, n1, 1, b, 0, k1); // compute h(B) = f1(B)^2 ZNP_mpn_mul (v3, v1, k1, v1, k1); } // decompose h(B) into base-B digits zn_array_unpack_SAFE (zn, v3, n3 + 1, b, 0, k1 + k2); if (!sqr) { // multiplication version // evaluate B^(n1-1) * f1(1/B) and B^(n2-1) * f2(1/B) zn_array_pack (v1, op1 + n1 - 1, n1, -1, b, 0, k1); zn_array_pack (v2, op2 + n2 - 1, n2, -1, b, 0, k2); // compute B^(n1+n2-2) * h(1/B) = // (B^(n1-1) * f1(1/B)) * (B^(n2-1) * f2(1/B)) ZNP_mpn_mul (v3, v1, k1, v2, k2); } else { // squaring version // evaluate B^(n1-1) * f1(1/B) zn_array_pack (v1, op1 + n1 - 1, n1, -1, b, 0, k1); // compute B^(2*n1-2) * h(1/B) = (B^(n1-1) * f1(1/B))^2 ZNP_mpn_mul (v3, v1, k1, v1, k1); } // decompose h(1/B) into base-B digits zn_array_unpack_SAFE (zr, v3, n3 + 1, b, 0, k1 + k2); // recover h(x) from h(B) and h(1/B) // (note: need to check that the high digit of each output coefficient // is < B - 1; this follows from an estimate in section 3.2 of [Har07].) zn_array_recover_reduce (res, 1, zn, zr, n3, b, redc, mod); ZNP_FASTFREE(z); ZNP_FASTFREE(limbs); } /* Multiplication/squaring using Kronecker substitution at 2^b, -2^b, 2^(-b) and -2^(-b). 
*/ void zn_array_mul_KS4 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n1 <= ULONG_MAX); ZNP_ASSERT ((mod->m & 1) || !redc); if (n2 == 1) { // code below needs n2 > 1, so fall back on scalar multiplication _zn_array_scalar_mul (res, op1, n1, op2[0], redc, mod); return; } int sqr = (op1 == op2 && n1 == n2); // bits in each output coefficient unsigned bits = 2 * mod->bits + ceil_lg (n2); // we're evaluating at x = B, -B, 1/B, -1/B, // where B = 2^b, and b = ceil(bits / 4) unsigned b = (bits + 3) / 4; // number of ulongs required to store each base-B^2 digit unsigned w = CEIL_DIV (2 * b, ULONG_BITS); ZNP_ASSERT (w <= 2); // Write f1(x) = f1e(x^2) + x * f1o(x^2) // f2(x) = f2e(x^2) + x * f2o(x^2) // h(x) = he(x^2) + x * ho(x^2) // "e" = even, "o" = odd size_t n1o = n1 / 2; size_t n1e = n1 - n1o; size_t n2o = n2 / 2; size_t n2e = n2 - n2o; size_t n3 = n1 + n2 - 1; // length of h size_t n3o = n3 / 2; size_t n3e = n3 - n3o; // Put k1 = number of limbs needed to store f1(B) and |f1(-B)|. // In f1(B), the leading coefficient starts at bit position b * (n1 - 1) // and has length 2b, and the coefficients overlap so we need an extra bit // for the carry: this gives (n1 + 1) * b + 1 bits. Ditto for f2. size_t k1 = CEIL_DIV ((n1 + 1) * b + 1, GMP_NUMB_BITS); size_t k2 = CEIL_DIV ((n2 + 1) * b + 1, GMP_NUMB_BITS); size_t k3 = k1 + k2; // allocate space ZNP_FASTALLOC (limbs, mp_limb_t, 6624, 5 * k3); mp_limb_t* v1_buf0 = limbs; // k1 limbs mp_limb_t* v2_buf0 = v1_buf0 + k1; // k2 limbs mp_limb_t* v1_buf1 = v2_buf0 + k2; // k1 limbs mp_limb_t* v2_buf1 = v1_buf1 + k1; // k2 limbs mp_limb_t* v1_buf2 = v2_buf1 + k2; // k1 limbs mp_limb_t* v2_buf2 = v1_buf2 + k1; // k2 limbs mp_limb_t* v1_buf3 = v2_buf2 + k2; // k1 limbs mp_limb_t* v2_buf3 = v1_buf3 + k1; // k2 limbs mp_limb_t* v1_buf4 = v2_buf3 + k2; // k1 limbs mp_limb_t* v2_buf4 = v1_buf4 + k1; // k2 limbs // arrange overlapping buffers to minimise memory use // "p" = plus, "m" = minus // "n" = normal order, "r" = reciprocal order mp_limb_t* v1en = v1_buf0; mp_limb_t* v1on = v1_buf1; mp_limb_t* v1pn = v1_buf2; mp_limb_t* v1mn = v1_buf0; mp_limb_t* v2en = v2_buf0; mp_limb_t* v2on = v2_buf1; mp_limb_t* v2pn = v2_buf2; mp_limb_t* v2mn = v2_buf0; mp_limb_t* v3pn = v1_buf1; mp_limb_t* v3mn = v1_buf2; mp_limb_t* v3en = v1_buf0; mp_limb_t* v3on = v1_buf1; mp_limb_t* v1er = v1_buf2; mp_limb_t* v1or = v1_buf3; mp_limb_t* v1pr = v1_buf4; mp_limb_t* v1mr = v1_buf2; mp_limb_t* v2er = v2_buf2; mp_limb_t* v2or = v2_buf3; mp_limb_t* v2pr = v2_buf4; mp_limb_t* v2mr = v2_buf2; mp_limb_t* v3pr = v1_buf3; mp_limb_t* v3mr = v1_buf4; mp_limb_t* v3er = v1_buf2; mp_limb_t* v3or = v1_buf3; ZNP_FASTALLOC (z, ulong, 6624, 2 * w * (n3e + 1)); ulong* zn = z; ulong* zr = z + w * (n3e + 1); int v3m_neg; // ------------------------------------------------------------------------- // "normal" evaluation points if (!sqr) { // multiplication version // evaluate f1e(B^2) and B * f1o(B^2) // We need max(2 * b*n1e, 2 * b*n1o + b) bits for this packing step, // which is safe since (n1 + 1) * b + 1 >= max(2 * b*n1e, 2 * b*n1o + b). // Ditto for f2 below. 
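// (For example, with n1 = 7 we have n1e = 4 and n1o = 3, so the two packing steps below need 2 * b * 4 = 8b and 2 * b * 3 + b = 7b bits respectively, both within the (n1 + 1) * b + 1 = 8b + 1 bits available.)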
zn_array_pack (v1en, op1, n1e, 2, 2 * b, 0, k1); zn_array_pack (v1on, op1 + 1, n1o, 2, 2 * b, b, k1); // compute f1(B) = f1e(B^2) + B * f1o(B^2) // and |f1(-B)| = |f1e(B^2) - B * f1o(B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v1pn, v1en, v1on, k1)); v3m_neg = signed_mpn_sub_n (v1mn, v1en, v1on, k1); // evaluate f2e(B^2) and B * f2o(B^2) zn_array_pack (v2en, op2, n2e, 2, 2 * b, 0, k2); zn_array_pack (v2on, op2 + 1, n2o, 2, 2 * b, b, k2); // compute f2(B) = f2e(B^2) + B * f2o(B^2) // and |f2(-B)| = |f2e(B^2) - B * f2o(B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v2pn, v2en, v2on, k2)); v3m_neg ^= signed_mpn_sub_n (v2mn, v2en, v2on, k2); // compute h(B) = f1(B) * f2(B) // and |h(-B)| = |f1(-B)| * |f2(-B)| // v3m_neg is set if h(-B) is negative ZNP_mpn_mul (v3pn, v1pn, k1, v2pn, k2); ZNP_mpn_mul (v3mn, v1mn, k1, v2mn, k2); } else { // squaring version // evaluate f1e(B^2) and B * f1o(B^2) zn_array_pack (v1en, op1, n1e, 2, 2 * b, 0, k1); zn_array_pack (v1on, op1 + 1, n1o, 2, 2 * b, b, k1); // compute f1(B) = f1e(B^2) + B * f1o(B^2) // and |f1(-B)| = |f1e(B^2) - B * f1o(B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v1pn, v1en, v1on, k1)); signed_mpn_sub_n (v1mn, v1en, v1on, k1); // compute h(B) = f1(B)^2 // and h(-B) = |f1(-B)|^2 // v3m_neg is cleared since h(-B) is never negative ZNP_mpn_mul (v3pn, v1pn, k1, v1pn, k1); ZNP_mpn_mul (v3mn, v1mn, k1, v1mn, k1); v3m_neg = 0; } // Each coefficient of h(B) is up to 4b bits long, so h(B) needs at most // ((n1 + n2 + 2) * b + 1) bits. (The extra +1 is to accommodate carries // generated by overlapping coefficients.) The buffer has at least // ((n1 + n2 + 2) * b + 2) bits. Therefore we can safely store 2*h(B) etc. // compute 2 * he(B^2) = h(B) + h(-B) // and B * 2 * ho(B^2) = h(B) - h(-B) if (v3m_neg) { ZNP_ASSERT_NOCARRY (mpn_sub_n (v3en, v3pn, v3mn, k3)); ZNP_ASSERT_NOCARRY (mpn_add_n (v3on, v3pn, v3mn, k3)); } else { ZNP_ASSERT_NOCARRY (mpn_add_n (v3en, v3pn, v3mn, k3)); ZNP_ASSERT_NOCARRY (mpn_sub_n (v3on, v3pn, v3mn, k3)); } // ------------------------------------------------------------------------- // "reciprocal" evaluation points // correction factors to take into account that if a polynomial has even // length, its even and odd coefficients are swapped when the polynomial // is reversed unsigned a1 = (n1 & 1) ? 0 : b; unsigned a2 = (n2 & 1) ? 0 : b; unsigned a3 = (n3 & 1) ?
0 : b; if (!sqr) { // multiplication version // evaluate B^(n1-1) * f1e(1/B^2) and B^(n1-2) * f1o(1/B^2) zn_array_pack (v1er, op1 + 2*(n1e - 1), n1e, -2, 2 * b, a1, k1); zn_array_pack (v1or, op1 + 1 + 2*(n1o - 1), n1o, -2, 2 * b, b - a1, k1); // compute B^(n1-1) * f1(1/B) = // B^(n1-1) * f1e(1/B^2) + B^(n1-2) * f1o(1/B^2) // and |B^(n1-1) * f1(-1/B)| = // |B^(n1-1) * f1e(1/B^2) - B^(n1-2) * f1o(1/B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v1pr, v1er, v1or, k1)); v3m_neg = signed_mpn_sub_n (v1mr, v1er, v1or, k1); // evaluate B^(n2-1) * f2e(1/B^2) and B^(n2-2) * f2o(1/B^2) zn_array_pack (v2er, op2 + 2*(n2e - 1), n2e, -2, 2 * b, a2, k2); zn_array_pack (v2or, op2 + 1 + 2*(n2o - 1), n2o, -2, 2 * b, b - a2, k2); // compute B^(n2-1) * f2(1/B) = // B^(n2-1) * f2e(1/B^2) + B^(n2-2) * f2o(1/B^2) // and |B^(n2-1) * f2(-1/B)| = // |B^(n2-1) * f2e(1/B^2) - B^(n2-2) * f2o(1/B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v2pr, v2er, v2or, k2)); v3m_neg ^= signed_mpn_sub_n (v2mr, v2er, v2or, k2); // compute B^(n3-1) * h(1/B) = // (B^(n1-1) * f1(1/B)) * (B^(n2-1) * f2(1/B)) // and |B^(n3-1) * h(-1/B)| = // |B^(n1-1) * f1(-1/B)| * |B^(n2-1) * f2(-1/B)| // v3m_neg is set if h(-1/B) is negative ZNP_mpn_mul (v3pr, v1pr, k1, v2pr, k2); ZNP_mpn_mul (v3mr, v1mr, k1, v2mr, k2); } else { // squaring version // evaluate B^(n1-1) * f1e(1/B^2) and B^(n1-2) * f1o(1/B^2) zn_array_pack (v1er, op1 + 2*(n1e - 1), n1e, -2, 2 * b, a1, k1); zn_array_pack (v1or, op1 + 1 + 2*(n1o - 1), n1o, -2, 2 * b, b - a1, k1); // compute B^(n1-1) * f1(1/B) = // B^(n1-1) * f1e(1/B^2) + B^(n1-2) * f1o(1/B^2) // and |B^(n1-1) * f1(-1/B)| = // |B^(n1-1) * f1e(1/B^2) - B^(n1-2) * f1o(1/B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v1pr, v1er, v1or, k1)); signed_mpn_sub_n (v1mr, v1er, v1or, k1); // compute B^(n3-1) * h(1/B) = (B^(n1-1) * f1(1/B))^2 // and B^(n3-1) * h(-1/B) = |B^(n1-1) * f1(-1/B)|^2 // v3m_neg is cleared since h(-1/B) is never negative ZNP_mpn_mul (v3pr, v1pr, k1, v1pr, k1); ZNP_mpn_mul (v3mr, v1mr, k1, v1mr, k1); v3m_neg = 0; } // compute 2 * B^(n3-1) * he(1/B^2) // = B^(n3-1) * h(1/B) + B^(n3-1) * h(-1/B) // and 2 * B^(n3-2) * ho(1/B^2) // = B^(n3-1) * h(1/B) - B^(n3-1) * h(-1/B) if (v3m_neg) { ZNP_ASSERT_NOCARRY (mpn_sub_n (v3er, v3pr, v3mr, k3)); ZNP_ASSERT_NOCARRY (mpn_add_n (v3or, v3pr, v3mr, k3)); } else { ZNP_ASSERT_NOCARRY (mpn_add_n (v3er, v3pr, v3mr, k3)); ZNP_ASSERT_NOCARRY (mpn_sub_n (v3or, v3pr, v3mr, k3)); } // ------------------------------------------------------------------------- // combine "normal" and "reciprocal" information // decompose he(B^2) and B^(2*(n3e-1)) * he(1/B^2) into base-B^2 digits zn_array_unpack_SAFE (zn, v3en, n3e + 1, 2 * b, 1, k3); zn_array_unpack_SAFE (zr, v3er, n3e + 1, 2 * b, a3 + 1, k3); // combine he(B^2) and he(1/B^2) information to get even coefficients of h zn_array_recover_reduce (res, 2, zn, zr, n3e, 2 * b, redc, mod); // decompose ho(B^2) and B^(2*(n3o-1)) * ho(1/B^2) into base-B^2 digits zn_array_unpack_SAFE (zn, v3on, n3o + 1, 2 * b, b + 1, k3); zn_array_unpack_SAFE (zr, v3or, n3o + 1, 2 * b, b - a3 + 1, k3); // combine ho(B^2) and ho(1/B^2) information to get odd coefficients of h zn_array_recover_reduce (res + 1, 2, zn, zr, n3o, 2 * b, redc, mod); ZNP_FASTFREE (z); ZNP_FASTFREE (limbs); } // end of file **************************************************************** zn_poly-0.9.2/src/mulmid.c000066400000000000000000000136561360464557000154610ustar00rootroot00000000000000/* mulmid.c: middle products Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9).
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" ulong zn_array_mulmid_fallback_fudge (size_t n1, size_t n2, const zn_mod_t mod) { return _zn_array_mul_fudge (n1, n2, 0, mod); } void zn_array_mulmid_fallback (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int fastred, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_FASTALLOC (temp, ulong, 6624, n1 + n2 - 1); // just do full product and extract relevant segment _zn_array_mul (temp, op1, n1, op2, n2, fastred, mod); zn_array_copy (res, temp + n2 - 1, n1 - n2 + 1); ZNP_FASTFREE (temp); } ulong _zn_array_mulmid_fudge (size_t n1, size_t n2, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); if (!(mod->m & 1)) // no fudge if the modulus is even. return 1; tuning_info_t* i = &tuning_info[mod->bits]; if (n2 < i->mulmid_KS2_thresh || n2 < i->mulmid_KS4_thresh || n2 < i->mulmid_fft_thresh) // fudge is -B return mod->m - mod->B; // return whatever fudge is used by the fft middle product code return zn_array_mulmid_fft_fudge (n1, n2, mod); } void _zn_array_mulmid (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int fastred, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); // we can use REDC reduction if the modulus is odd and the caller is happy // to receive the result with a fudge factor int odd = (mod->m & 1); int redc = fastred && odd; tuning_info_t* i = &tuning_info[mod->bits]; if (n2 < i->mulmid_KS2_thresh) zn_array_mulmid_KS1 (res, op1, n1, op2, n2, redc, mod); else if (n2 < i->mulmid_KS4_thresh) zn_array_mulmid_KS2 (res, op1, n1, op2, n2, redc, mod); else if (!odd || n2 < i->mulmid_fft_thresh) zn_array_mulmid_KS4 (res, op1, n1, op2, n2, redc, mod); else { ulong x = fastred ? 1 : zn_array_mulmid_fft_fudge (n1, n2, mod); zn_array_mulmid_fft (res, op1, n1, op2, n2, x, mod); } } void zn_array_mulmid (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, const zn_mod_t mod) { _zn_array_mulmid (res, op1, n1, op2, n2, 0, mod); } void zn_array_mulmid_precomp1_init (zn_array_mulmid_precomp1_t res, const ulong* op1, size_t n1, size_t n2, const zn_mod_t mod) { res->n1 = n1; res->n2 = n2; res->mod = mod; // figure out which algorithm to use int odd = (mod->m & 1); if (!odd) // can't use FFT algorithm when modulus is even res->algo = ZNP_MULMID_ALGO_KS; else { tuning_info_t* i = &tuning_info[mod->bits]; res->algo = (n2 < i->mulmid_fft_thresh) ? ZNP_MULMID_ALGO_KS : ZNP_MULMID_ALGO_FFT; } // now perform initialisation for chosen algorithm switch (res->algo) { case ZNP_MULMID_ALGO_KS: { // Make a copy of op1[0, n1). // If modulus is odd, multiply it by the appropriate fudge factor // so that we can use faster REDC reduction in the execute() routine. 
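// (Concretely: for these KS paths _zn_array_mulmid_fudge() above reports a fudge factor of -B, so the scaling by mod->m - mod->B here is chosen to cancel the scaling introduced by REDC reduction, allowing execute() to pass fastred = 1 and still return the true middle product.)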
res->op1 = (ulong*) malloc (n1 * sizeof (ulong)); if (odd) zn_array_scalar_mul (res->op1, op1, n1, mod->m - mod->B, mod); else zn_array_copy (res->op1, op1, n1); } break; case ZNP_MULMID_ALGO_FFT: { res->precomp_fft = (struct zn_array_mulmid_fft_precomp1_struct*) malloc (sizeof (zn_array_mulmid_fft_precomp1_t)); // we do scaling in this init() routine, to avoid doing it during // each call to execute() ulong x = zn_array_mulmid_fft_precomp1_fudge (n1, n2, mod); zn_array_mulmid_fft_precomp1_init (res->precomp_fft, op1, n1, n2, x, mod); } break; default: ZNP_ASSERT (0); } } void zn_array_mulmid_precomp1_clear (zn_array_mulmid_precomp1_t op) { // dispatch to appropriate cleanup code switch (op->algo) { case ZNP_MULMID_ALGO_KS: free (op->op1); break; case ZNP_MULMID_ALGO_FFT: zn_array_mulmid_fft_precomp1_clear (op->precomp_fft); free (op->precomp_fft); break; default: ZNP_ASSERT (0); } } void zn_array_mulmid_precomp1_execute (ulong* res, const ulong* op2, const zn_array_mulmid_precomp1_t precomp) { // dispatch to appropriate middle product code switch (precomp->algo) { case ZNP_MULMID_ALGO_KS: _zn_array_mulmid (res, precomp->op1, precomp->n1, op2, precomp->n2, precomp->mod->m & 1, precomp->mod); break; case ZNP_MULMID_ALGO_FFT: zn_array_mulmid_fft_precomp1_execute (res, op2, 1, precomp->precomp_fft); break; default: ZNP_ASSERT (0); } } // end of file **************************************************************** zn_poly-0.9.2/src/mulmid_ks.c000066400000000000000000000724771360464557000161660ustar00rootroot00000000000000/* mulmid_ks.c: polynomial middle products by Kronecker substitution Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" /* In the routines below, we denote by f1(x) and f2(x) the input polynomials op1[0, n1) and op2[0, n2), and by h(x) their product in Z[x]. We write h(x) = LO(x) + x^(n2-1) * g(x) + x^n1 * HI(x), where len(LO) = len(HI) = n2 - 1 and len(g) = n1 - n2 + 1. Our goal is to compute the middle segment g. The basic strategy is: if X is an evaluation point (i.e. X = 2^b, -2^b, 2^(-b) or -2^(-b)) then g(X) corresponds roughly to the integer middle product of f1(X) and f2(X), and we will use mpn_mulmid() to compute the latter. Unfortunately there are some complications. First, mpn_mulmid() works in terms of whole limb counts, not bit counts, and moreover the first two and last two limbs of the output of mpn_mulmid() are always garbage. We handle this issue using zero-padding as follows. Suppose that we need s bits of g(X) starting at bit index r. We compute f2(X) as usual. Let k2 = number of limbs used to store f2(X). Instead of evaluating f1(X), we evaluate 2^p * f1(X), i.e. zero-pad by p bits, where p = (k2 + 1) * GMP_NUMB_BITS - r. (We will verify in each case that p >= 0.) This shifts g(X) left by p bits, and ensures that bit #r of g(X) starts exactly at the first bit of the third limb of the output of mpn_mulmid(). 
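(For illustration, suppose GMP_NUMB_BITS = 64, k2 = 3 and r = 150. Then p = (3 + 1) * 64 - 150 = 106, and the shifted bit #r lands at bit p + r = 256, i.e. exactly at the start of a limb, as required.)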
Let k1 = number of limbs used to store f1(X). To guarantee obtaining s correct bits of g(X), we need to have (k1 - k2 - 1) * GMP_NUMB_BITS >= s, or equivalently k1 * GMP_NUMB_BITS >= p + r + s. (*) We zero-pad 2^p * f1(X) on the right to ensure that (*) holds. In every case, it turns out that the total amount of zero-padding is O(1) bits. Second, in the "reciprocal" variants (KS3 and KS4) there is the problem of overlapping coefficients, e.g. when we compute the integer middle product, the low bits of g(X) are polluted by the high bits of LO(X). To deal with this we need to compute the low coefficient of g(X) separately, and remove its effect from the overlapping portion. Similarly at the other end. The diagonal_sum() function accomplishes this. */ /* Middle product using Kronecker substitution at 2^b. */ void zn_array_mulmid_KS1 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n1 <= ULONG_MAX); ZNP_ASSERT ((mod->m & 1) || !redc); // length of g size_t n3 = n1 - n2 + 1; // bits in each output coefficient unsigned b = 2 * mod->bits + ceil_lg (n2); // number of ulongs required to store each output coefficient unsigned w = CEIL_DIV (b, ULONG_BITS); ZNP_ASSERT (w <= 3); // number of limbs needed to store f2(2^b) size_t k2 = CEIL_DIV (n2 * b, GMP_NUMB_BITS); // We need r = (n2 - 1) * b and s = (n1 - n2 + 1) * b. Note that p is // non-negative since k2 * GMP_NUMB_BITS >= n2 * b. unsigned p = GMP_NUMB_BITS * (k2 + 1) - (n2 - 1) * b; // For (*) to hold we need k1 * GMP_NUMB_BITS >= p + n1 * b. size_t k1 = CEIL_DIV (p + n1 * b, GMP_NUMB_BITS); // allocate space ZNP_FASTALLOC (limbs, mp_limb_t, 6624, 2 * k1 + 3); mp_limb_t* v1 = limbs; // k1 limbs mp_limb_t* v2 = v1 + k1; // k2 limbs mp_limb_t* v3 = v2 + k2; // k1 - k2 + 3 limbs // evaluate 2^p * f1(2^b) and f2(2^b) zn_array_pack (v1, op1, n1, 1, b, p, 0); zn_array_pack (v2, op2, n2, 1, b, 0, 0); // compute segment of f1(2^b) * f2(2^b) starting at bit index r ZNP_mpn_mulmid (v3, v1, k1, v2, k2); // unpack coefficients of g, and reduce mod m ZNP_FASTALLOC (z, ulong, 6624, n3 * w); zn_array_unpack_SAFE (z, v3 + 2, n3, b, 0, k1 - k2 - 1); array_reduce (res, 1, z, n3, w, redc, mod); ZNP_FASTFREE (z); ZNP_FASTFREE (limbs); } /* Middle product using Kronecker substitution at 2^b and -2^b. */ void zn_array_mulmid_KS2 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n1 <= ULONG_MAX); ZNP_ASSERT ((mod->m & 1) || !redc); if (n2 == 1) { // code below needs n2 > 1, so fall back on scalar multiplication _zn_array_scalar_mul (res, op1, n1, op2[0], redc, mod); return; } // bits in each output coefficient unsigned bits = 2 * mod->bits + ceil_lg (n2); // we're evaluating at x = B and -B, where B = 2^b, and b = ceil(bits / 2) unsigned b = (bits + 1) / 2; // number of ulongs required to store each output coefficient unsigned w = CEIL_DIV (2 * b, ULONG_BITS); ZNP_ASSERT (w <= 3); // Write f1(x) = f1e(x^2) + x * f1o(x^2) // f2(x) = f2e(x^2) + x * f2o(x^2) // h(x) = he(x^2) + x * ho(x^2) // g(x) = ge(x^2) + x * go(x^2) // "e" = even, "o" = odd // When evaluating f2e(B^2) and B * f2o(B^2) the bit-packing routine needs // room for the last chunk of 2b bits, so we need to allow room for // (n2 + 1) * b bits. size_t k2 = CEIL_DIV ((n2 + 1) * b, GMP_NUMB_BITS); // We need r = (n2 - 2) * b + 1 and s = (n1 - n2 + 3) * b.
// Note that p is non-negative (since k2 * GMP_NUMB_BITS >= (n2 + 1) * b // >= (n2 - 2) * b - 1). unsigned p = GMP_NUMB_BITS * (k2 + 1) - (n2 - 2) * b - 1; // For (*) to hold we need k1 * GMP_NUMB_BITS >= p + (n1 + 1) * b + 1. // Also, to ensure that there is enough room for bit-packing (as for k2 // above), we need k1 * GMP_NUMB_BITS >= p + (n1 + 1) * b; this condition // is subsumed by the first one. size_t k1 = CEIL_DIV (p + (n1 + 1) * b + 1, GMP_NUMB_BITS); size_t k3 = k1 - k2 + 3; ZNP_ASSERT (k3 >= 5); // allocate space ZNP_FASTALLOC (limbs, mp_limb_t, 6624, 3 * k3 + 5 * k2); mp_limb_t* v2_buf0 = limbs; // k2 limbs mp_limb_t* v3_buf0 = v2_buf0 + k2; // k3 limbs mp_limb_t* v2_buf1 = v3_buf0 + k3; // k2 limbs mp_limb_t* v3_buf1 = v2_buf1 + k2; // k3 limbs mp_limb_t* v2_buf2 = v3_buf1 + k3; // k2 limbs mp_limb_t* v3_buf2 = v2_buf2 + k2; // k3 limbs mp_limb_t* v2_buf3 = v3_buf2 + k3; // k2 limbs mp_limb_t* v2_buf4 = v2_buf3 + k2; // k2 limbs // arrange overlapping buffers to minimise memory use // "p" = plus, "m" = minus mp_limb_t* v1e = v2_buf0; mp_limb_t* v1o = v2_buf2; mp_limb_t* v1p = v2_buf1; mp_limb_t* v1m = v2_buf0; mp_limb_t* v2e = v2_buf2; mp_limb_t* v2o = v2_buf4; mp_limb_t* v2p = v2_buf3; mp_limb_t* v2m = v2_buf2; mp_limb_t* v3m = v3_buf2; mp_limb_t* v3p = v3_buf0; mp_limb_t* v3e = v3_buf1; mp_limb_t* v3o = v3_buf1; // length of g size_t n3 = n1 - n2 + 1; ZNP_FASTALLOC (z, ulong, 6624, w * ((n3 + 1) / 2)); // evaluate 2^p * f1e(B^2) and 2^p * B * f1o(B^2) zn_array_pack (v1e, op1, (n1 + 1) / 2, 2, 2 * b, p, k1); zn_array_pack (v1o, op1 + 1, n1 / 2, 2, 2 * b, p + b, k1); // compute 2^p * f1(B) = 2^p * (f1e(B^2) + B * f1o(B^2)) // and |2^p * f1(-B)| = |2^p * (f1e(B^2) - B * f1o(B^2))| // v3m_neg is set if f1(-B) is negative ZNP_ASSERT_NOCARRY (mpn_add_n (v1p, v1e, v1o, k1)); int v3m_neg = signed_mpn_sub_n (v1m, v1e, v1o, k1); // evaluate f2e(B^2) and B * f2o(B^2) zn_array_pack (v2e, op2, (n2 + 1) / 2, 2, 2 * b, 0, k2); zn_array_pack (v2o, op2 + 1, n2 / 2, 2, 2 * b, b, k2); // compute f2(B) = f2e(B^2) + B * f2o(B^2) // and |f2(-B)| = |f2e(B^2) - B * f2o(B^2)| // v3m_neg is set if f1(-B) and f2(-B) have opposite signs ZNP_ASSERT_NOCARRY (mpn_add_n (v2p, v2e, v2o, k2)); v3m_neg ^= signed_mpn_sub_n (v2m, v2e, v2o, k2); // compute segment starting at bit index r of // h(B) = f1(B) * f2(B) // and |h(-B)| = |f1(-B)| * |f2(-B)| // v3m_neg is set if h(-B) is negative ZNP_mpn_mulmid (v3m, v1m, k1, v2m, k2); ZNP_mpn_mulmid (v3p, v1p, k1, v2p, k2); // compute segment starting at bit index r of // 2 * he(B^2) = h(B) + h(-B) (if n2 is odd) // or 2 * B * ho(B^2) = h(B) - h(-B) (if n2 is even) // i.e. the segment of he(B^2) or B * ho(B^2) starting at bit index // r - 1 = (n2 - 2) * b. This encodes the coefficients of ge(x). // Note that when we do the addition (resp. subtraction) below, we might // miss a carry (resp. borrow) from the unknown previous limbs. We arrange // so that the answers are either correct or one too big, by adding 1 // appropriately. if (v3m_neg ^ (n2 & 1)) { mpn_add_n (v3e, v3p + 2, v3m + 2, k3 - 4); // miss carry? mpn_add_1 (v3e, v3e, k3 - 4, 1); } else mpn_sub_n (v3e, v3p + 2, v3m + 2, k3 - 4); // miss borrow? // Now we extract ge(x). The first coefficient we want is the coefficient // of x^(n2 - 1) in h(x); this starts at bit index b in v3e. We want // ceil(n3 / 2) coefficients altogether, with 2b bits each. This accounts // for the definition of s. // Claim: if we committed a "one-too-big" error above, this does not affect // the coefficients we extract.
Proof: the first b bits of v3e are the top // half of the coefficient of x^(n2 - 2) in h(x). The base-B digit in those // b bits has value at most B - 2. Therefore adding 1 to it will never // overflow those b bits. zn_array_unpack_SAFE (z, v3e, (n3 + 1) / 2, 2 * b, b, k3 - 4); array_reduce (res, 2, z, (n3 + 1) / 2, w, redc, mod); // Now repeat all the above for go(x). if (v3m_neg ^ (n2 & 1)) mpn_sub_n (v3o, v3p + 2, v3m + 2, k3 - 4); else { mpn_add_n (v3o, v3p + 2, v3m + 2, k3 - 4); mpn_add_1 (v3o, v3o, k3 - 4, 1); } zn_array_unpack_SAFE (z, v3o, n3 / 2, 2 * b, 2 * b, k3 - 4); array_reduce (res + 1, 2, z, n3 / 2, w, redc, mod); ZNP_FASTFREE (z); ZNP_FASTFREE (limbs); } /* Computes the sum op1[0] * op2[n-1] + ... + op1[n-1] * op2[0] as an *integer*. The result is assumed to fit into w ulongs (where 1 <= w <= 3), and is written to res[0, w). The return value is the result reduced modulo mod->m (using redc if requested). */ #define diagonal_sum \ ZNP_diagonal_sum ulong diagonal_sum (ulong* res, const ulong* op1, const ulong* op2, size_t n, unsigned w, int redc, const zn_mod_t mod) { ZNP_ASSERT (n >= 1); ZNP_ASSERT (w >= 1); ZNP_ASSERT (w <= 3); size_t i; if (w == 1) { ulong sum = op1[0] * op2[n - 1]; for (i = 1; i < n; i++) sum += op1[i] * op2[n - 1 - i]; res[0] = sum; return redc ? zn_mod_reduce_redc (sum, mod) : zn_mod_reduce (sum, mod); } else if (w == 2) { ulong lo, hi, sum0, sum1; ZNP_MUL_WIDE (sum1, sum0, op1[0], op2[n - 1]); for (i = 1; i < n; i++) { ZNP_MUL_WIDE (hi, lo, op1[i], op2[n - 1 - i]); ZNP_ADD_WIDE (sum1, sum0, sum1, sum0, hi, lo); } res[0] = sum0; res[1] = sum1; return redc ? zn_mod_reduce2_redc (sum1, sum0, mod) : zn_mod_reduce2 (sum1, sum0, mod); } else // w == 3 { ulong lo, hi, sum0, sum1, sum2 = 0; ZNP_MUL_WIDE (sum1, sum0, op1[0], op2[n - 1]); for (i = 1; i < n; i++) { ZNP_MUL_WIDE (hi, lo, op1[i], op2[n - 1 - i]); ZNP_ADD_WIDE (sum1, sum0, sum1, sum0, hi, lo); // carry into third limb: if (sum1 <= hi) sum2 += (sum1 < hi || sum0 < lo); } res[0] = sum0; res[1] = sum1; res[2] = sum2; return redc ? zn_mod_reduce3_redc (sum2, sum1, sum0, mod) : zn_mod_reduce3 (sum2, sum1, sum0, mod); } } /* Inplace subtract 2^i*x from res[0, n). x is an array of w ulongs, where 1 <= w <= 3. i may be any non-negative integer. */ #define subtract_ulongs \ ZNP_subtract_ulongs void subtract_ulongs (mp_limb_t* res, size_t n, size_t i, ulong* x, unsigned w) { ZNP_ASSERT (w >= 1); ZNP_ASSERT (w <= 3); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS size_t k = i / GMP_NUMB_BITS; if (k >= n) return; unsigned j = i % GMP_NUMB_BITS; if (j == 0) mpn_sub (res + k, res + k, n - k, (mp_srcptr) x, ZNP_MIN (n - k, w)); else { mp_limb_t y[4]; y[w] = mpn_lshift (y, (mp_srcptr) x, w, j); mpn_sub (res + k, res + k, n - k, y, ZNP_MIN (n - k, w + 1)); } #else #error Not nails-safe yet #endif } /* Middle product using Kronecker substitution at 2^b and 2^(-b). Note: this routine does not appear to be competitive in practice with the other KS routines. It's here just for fun. 
*/ void zn_array_mulmid_KS3 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n1 <= ULONG_MAX); ZNP_ASSERT ((mod->m & 1) || !redc); // length of g size_t n3 = n1 - n2 + 1; // bits in each output coefficient unsigned bits = 2 * mod->bits + ceil_lg (n2); // we're evaluating at x = B and 1/B, where B = 2^b, and b = ceil(bits / 2) unsigned b = (bits + 1) / 2; // number of ulongs required to store each base-B digit unsigned w = CEIL_DIV (b, ULONG_BITS); ZNP_ASSERT (w <= 2); // number of ulongs needed to store each output coefficient unsigned ww = CEIL_DIV (2 * b, ULONG_BITS); ZNP_ASSERT (ww <= 3); // directly compute coefficient of x^0 in g(x) ulong dlo[3]; res[0] = diagonal_sum (dlo, op1, op2, n2, ww, redc, mod); if (n3 == 1) return; // only need one coefficient of output // directly compute coefficient of x^(n3-1) in g(x) ulong dhi[3]; res[n3 - 1] = diagonal_sum (dhi, op1 + n3 - 1, op2, n2, ww, redc, mod); if (n3 == 2) return; // only need two coefficients of output // limbs needed to store f2(B) and B^(n2-1) * f2(1/B) size_t k2 = CEIL_DIV (n2 * b, GMP_NUMB_BITS); // we need r = (n2 - 1) * b and s = (n1 - n2 + 1) * b, thus p is: unsigned p = GMP_NUMB_BITS * (k2 + 1) - (n2 - 1) * b; // for (*) we need k1 * GMP_NUMB_BITS >= p + n1 * b size_t k1 = CEIL_DIV (p + n1 * b, GMP_NUMB_BITS); size_t k3 = k1 - k2 + 3; ZNP_ASSERT (k3 >= 5); // allocate space ZNP_FASTALLOC (limbs, mp_limb_t, 6624, 2 * k1 + 3); mp_limb_t* v1 = limbs; // k1 limbs mp_limb_t* v2 = v1 + k1; // k2 limbs mp_limb_t* v3 = v2 + k2; // k1 - k2 + 3 limbs ZNP_FASTALLOC (z, ulong, 6624, 2 * w * n3); // "n" = normal order, "r" = reciprocal order ulong* zn = z; ulong* zr = z + w * n3; // ------------------------------------------------------------------------- // "normal" evaluation point // evaluate 2^p * f1(B) and f2(B) zn_array_pack (v1, op1, n1, 1, b, p, k1); zn_array_pack (v2, op2, n2, 1, b, 0, k2); // compute segment starting at bit index r of h(B) = f1(B) * f2(B) ZNP_mpn_mulmid (v3, v1, k1, v2, k2); // remove x^0 and x^(n3 - 1) coefficients of g(x) subtract_ulongs (v3 + 2, k3 - 4, 0, dlo, ww); subtract_ulongs (v3 + 2, k3 - 4, (n3 - 1) * b, dhi, ww); // decompose relevant portion of h(B) into base-B digits zn_array_unpack_SAFE (zn, v3 + 2, n3 - 1, b, b, k3 - 4); // At this stage zn contains (n3 - 1) base-B digits, representing the // integer g[1] + g[2]*B + ... + g[n3-2]*B^(n3-3) // ------------------------------------------------------------------------- // "reciprocal" evaluation point // evaluate 2^p * B^(n1-1) * f1(1/B) and B^(n2-1) * f2(1/B) zn_array_pack (v1, op1 + n1 - 1, n1, -1, b, p, k1); zn_array_pack (v2, op2 + n2 - 1, n2, -1, b, 0, k2); // compute segment starting at bit index r of B^(n1+n2-2) * h(1/B) = // (B^(n1-1) * f1(1/B)) * (B^(n2-1) * f2(1/B)) ZNP_mpn_mulmid (v3, v1, k1, v2, k2); // remove x^0 and x^(n3 - 1) coefficients of g(x) subtract_ulongs (v3 + 2, k3 - 4, 0, dhi, ww); subtract_ulongs (v3 + 2, k3 - 4, (n3 - 1) * b, dlo, ww); // decompose relevant portion of B^(n1+n2-2) * h(1/B) into base-B digits zn_array_unpack_SAFE (zr, v3 + 2, n3 - 1, b, b, k3 - 4); // At this stage zr contains (n3 - 1) base-B digits, representing the // integer g[n3-2] + g[n3-3]*B + ...
+ g[1]*B^(n3-3) // ------------------------------------------------------------------------- // combine "normal" and "reciprocal" information zn_array_recover_reduce (res + 1, 1, zn, zr, n3 - 2, b, redc, mod); ZNP_FASTFREE (z); ZNP_FASTFREE (limbs); } /* Middle product using Kronecker substitution at 2^b, -2^b, 2^(-b) and -2^(-b). */ void zn_array_mulmid_KS4 (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, int redc, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ZNP_ASSERT (n1 <= ULONG_MAX); ZNP_ASSERT ((mod->m & 1) || !redc); if (n2 == 1) { // code below needs n2 > 1, so fall back on scalar multiplication _zn_array_scalar_mul (res, op1, n1, op2[0], redc, mod); return; } // bits in each output coefficient unsigned bits = 2 * mod->bits + ceil_lg (n2); // we're evaluating at x = B, -B, 1/B, -1/B, // where B = 2^b, and b = ceil(bits / 4) unsigned b = (bits + 3) / 4; // number of ulongs required to store each base-B^2 digit unsigned w = CEIL_DIV (2 * b, ULONG_BITS); ZNP_ASSERT (w <= 2); // number of ulongs needed to store each output coefficient unsigned ww = CEIL_DIV (4 * b, ULONG_BITS); ZNP_ASSERT (ww <= 3); // mask = 2^c - 1, where c = number of bits used in high ulong of each // base-B^2 digit ulong mask; if (w == 1) mask = ((2 * b) < ULONG_BITS) ? ((1UL << (2 * b)) - 1) : (-1UL); else // w == 2 mask = (1UL << ((2 * b) - ULONG_BITS)) - 1; // Write f1(x) = f1e(x^2) + x * f1o(x^2) // f2(x) = f2e(x^2) + x * f2o(x^2) // h(x) = he(x^2) + x * ho(x^2) // g(x) = ge(x^2) + x * go(x^2) // "e" = even, "o" = odd size_t n1o = n1 / 2; size_t n1e = n1 - n1o; size_t n2o = n2 / 2; size_t n2e = n2 - n2o; size_t n3 = n1 - n2 + 1; // length of g size_t n3o = n3 / 2; size_t n3e = n3 - n3o; // directly compute coefficient of x^0 in ge(x) ulong delo[3]; res[0] = diagonal_sum (delo, op1, op2, n2, ww, redc, mod); if (n3 == 1) return; // only need one coefficient of output // directly compute coefficient of x^0 in go(x) ulong dolo[3]; res[1] = diagonal_sum (dolo, op1 + 1, op2, n2, ww, redc, mod); if (n3 == 2) return; // only need two coefficients of output // directly compute coefficient of x^(n3e - 1) in ge(x) ulong dehi[3]; res[2*n3e - 2] = diagonal_sum (dehi, op1 + 2*n3e - 2, op2, n2, ww, redc, mod); if (n3 == 3) return; // only need three coefficients of output // directly compute coefficient of x^(n3o - 1) in go(x) ulong dohi[3]; res[2*n3o - 1] = diagonal_sum (dohi, op1 + 2*n3o - 1, op2, n2, ww, redc, mod); if (n3 == 4) return; // only need four coefficients of output // In f2(B), the leading coefficient starts at bit position b * (n2 - 1) // and has length 2*b, and the coefficients overlap so we need an extra bit // for the carry: this gives (n2 + 1) * b + 1 bits. size_t k2 = CEIL_DIV ((n2 + 1) * b + 1, GMP_NUMB_BITS); // We need r = (n2 - 1) * b + 1 and s = (n3 + 1) * b. // Note that p is non-negative (since k2 * GMP_NUMB_BITS >= (n2 + 1) * b // >= (n2 - 1) * b - 1). unsigned p = GMP_NUMB_BITS * (k2 + 1) - (n2 - 1) * b - 1; // For (*) we need k1 * GMP_NUMB_BITS >= p + (n1 + 1) * b + 1. 
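// (This holds automatically: k1 is defined just below via CEIL_DIV, and ceiling division guarantees k1 * GMP_NUMB_BITS >= p + (n1 + 1) * b + 1.)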
size_t k1 = CEIL_DIV (p + (n1 + 1) * b + 1, GMP_NUMB_BITS); size_t k3 = k1 - k2 + 3; ZNP_ASSERT (k3 >= 5); // allocate space ZNP_FASTALLOC (limbs, mp_limb_t, 6624, 5 * (k2 + k3)); mp_limb_t* v2_buf0 = limbs; // k2 limbs mp_limb_t* v3_buf0 = v2_buf0 + k2; // k3 limbs mp_limb_t* v2_buf1 = v3_buf0 + k3; // k2 limbs mp_limb_t* v3_buf1 = v2_buf1 + k2; // k3 limbs mp_limb_t* v2_buf2 = v3_buf1 + k3; // k2 limbs mp_limb_t* v3_buf2 = v2_buf2 + k2; // k3 limbs mp_limb_t* v2_buf3 = v3_buf2 + k3; // k2 limbs mp_limb_t* v3_buf3 = v2_buf3 + k2; // k3 limbs mp_limb_t* v2_buf4 = v3_buf3 + k3; // k2 limbs mp_limb_t* v3_buf4 = v2_buf4 + k2; // k3 limbs // arrange overlapping buffers to minimise memory use // "p" = plus, "m" = minus // "n" = normal order, "r" = reciprocal order mp_limb_t* v1en = v2_buf1; mp_limb_t* v1on = v2_buf2; mp_limb_t* v1pn = v2_buf0; mp_limb_t* v1mn = v2_buf1; mp_limb_t* v2en = v2_buf3; mp_limb_t* v2on = v2_buf4; mp_limb_t* v2pn = v2_buf2; mp_limb_t* v2mn = v2_buf3; mp_limb_t* v3mn = v3_buf2; mp_limb_t* v3pn = v3_buf3; mp_limb_t* v3en = v3_buf4; mp_limb_t* v3on = v3_buf3; mp_limb_t* v1er = v2_buf1; mp_limb_t* v1or = v2_buf2; mp_limb_t* v1pr = v2_buf0; mp_limb_t* v1mr = v2_buf1; mp_limb_t* v2er = v2_buf3; mp_limb_t* v2or = v2_buf4; mp_limb_t* v2pr = v2_buf2; mp_limb_t* v2mr = v2_buf3; mp_limb_t* v3mr = v3_buf2; mp_limb_t* v3pr = v3_buf1; mp_limb_t* v3er = v3_buf0; mp_limb_t* v3or = v3_buf1; ZNP_FASTALLOC (z, ulong, 6624, 2 * w * (n3e - 1)); ulong* zn = z; ulong* zr = z + w * (n3e - 1); int v3m_neg; // ------------------------------------------------------------------------- // "normal" evaluation point // evaluate 2^p * f1e(B^2) and 2^p * B * f1o(B^2) zn_array_pack (v1en, op1, n1e, 2, 2 * b, p, k1); zn_array_pack (v1on, op1 + 1, n1o, 2, 2 * b, p + b, k1); // compute 2^p * f1(B) = 2^p * f1e(B^2) + 2^p * B * f1o(B^2) // and 2^p * |f1(-B)| = |2^p * f1e(B^2) - 2^p * B * f1o(B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v1pn, v1en, v1on, k1)); v3m_neg = signed_mpn_sub_n (v1mn, v1en, v1on, k1); // evaluate f2e(B^2) and B * f2o(B^2) zn_array_pack (v2en, op2, n2e, 2, 2 * b, 0, k2); zn_array_pack (v2on, op2 + 1, n2o, 2, 2 * b, b, k2); // compute f2(B) = f2e(B^2) + B * f2o(B^2) // and |f2(-B)| = |f2e(B^2) - B * f2o(B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v2pn, v2en, v2on, k2)); v3m_neg ^= signed_mpn_sub_n (v2mn, v2en, v2on, k2); // compute segment starting at bit index r of // h(B) = f1(B) * f2(B) // and |h(-B)| = |f1(-B)| * |f2(-B)| // v3m_neg is set if h(-B) is negative ZNP_mpn_mulmid (v3mn, v1mn, k1, v2mn, k2); ZNP_mpn_mulmid (v3pn, v1pn, k1, v2pn, k2); // compute segments starting at bit index r of // 2 * he(B^2) = h(B) + h(-B) // and 2 * B * ho(B^2) = h(B) - h(-B) // ie. the segments of he(B^2) and B * ho(B^2) starting at bit index r - 1. // If n2 is odd, the former encodes ge(x) and the latter encodes go(x). // Otherwise the situation is reversed. We write the results to v3en/v3on // accordingly. // Note that when we do the addition (resp. subtraction) below, we might // miss a carry (resp. borrow) from the unknown previous limbs. We arrange // so that the answers are either correct or one too big, by adding 1 // appropriately. if (v3m_neg ^ (n2 & 1)) { mpn_add_n (v3en + 2, v3pn + 2, v3mn + 2, k3 - 4); // miss carry? mpn_add_1 (v3en + 2, v3en + 2, k3 - 4, 1); mpn_sub_n (v3on + 2, v3pn + 2, v3mn + 2, k3 - 4); // miss borrow?
} else { mpn_sub_n (v3en + 2, v3pn + 2, v3mn + 2, k3 - 4); mpn_add_n (v3on + 2, v3pn + 2, v3mn + 2, k3 - 4); mpn_add_1 (v3on + 2, v3on + 2, k3 - 4, 1); } // remove x^0 and x^(n3e - 1) coefficients of ge(x), // and x^0 and x^(n3o - 1) coefficients of go(x). subtract_ulongs (v3en + 2, k3 - 4, 0, delo, ww); subtract_ulongs (v3en + 2, k3 - 4, (2*n3e - 2) * b, dehi, ww); subtract_ulongs (v3on + 2, k3 - 4, b, dolo, ww); subtract_ulongs (v3on + 2, k3 - 4, (2*n3o - 1) * b, dohi, ww); // At this stage, the integer // g[2] + g[4]*B^2 + ... + g[2*n3e - 4]*B^(2*n3e - 6) // appears in v3en + 2, starting at bit index 2*b, occupying // (2 * n3e - 2) * b bits. The integer // g[3] + g[5]*B^2 + ... + g[2*n3o - 3]*B^(2*n3o - 6) // appears in v3on + 2, starting at bit index 3*b, occupying // (2 * n3o - 2) * b bits. // ------------------------------------------------------------------------- // "reciprocal" evaluation point // evaluate 2^p * B^(n1-1) * f1e(1/B^2) and 2^p * B^(n1-2) * f1o(1/B^2) zn_array_pack (v1er, op1 + 2*(n1e - 1), n1e, -2, 2 * b, p + ((n1 & 1) ? 0 : b), k1); zn_array_pack (v1or, op1 + 1 + 2*(n1o - 1), n1o, -2, 2 * b, p + ((n1 & 1) ? b : 0), k1); // compute 2^p * B^(n1-1) * f1(1/B) = // 2^p * B^(n1-1) * f1e(1/B^2) + 2^p * B^(n1-2) * f1o(1/B^2) // and |2^p * B^(n1-1) * f1(-1/B)| = // |2^p * B^(n1-1) * f1e(1/B^2) - 2^p * B^(n1-2) * f1o(1/B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v1pr, v1er, v1or, k1)); v3m_neg = signed_mpn_sub_n (v1mr, v1er, v1or, k1); // evaluate B^(n2-1) * f2e(1/B^2) and B^(n2-2) * f2o(1/B^2) zn_array_pack (v2er, op2 + 2*(n2e - 1), n2e, -2, 2 * b, (n2 & 1) ? 0 : b, k2); zn_array_pack (v2or, op2 + 1 + 2*(n2o - 1), n2o, -2, 2 * b, (n2 & 1) ? b : 0, k2); // compute B^(n2-1) * f2(1/B) = // B^(n2-1) * f2e(1/B^2) + B^(n2-2) * f2o(1/B^2) // and |B^(n2-1) * f2(-1/B)| = // |B^(n2-1) * f2e(1/B^2) - B^(n2-2) * f2o(1/B^2)| ZNP_ASSERT_NOCARRY (mpn_add_n (v2pr, v2er, v2or, k2)); v3m_neg ^= signed_mpn_sub_n (v2mr, v2er, v2or, k2); // compute segment starting at bit index r of // B^(n3-1) * h(1/B) = (B^(n1-1) * f1(1/B)) * (B^(n2-1) * f2(1/B)) // and |B^(n3-1) * h(-1/B)| = |B^(n1-1) * f1(-1/B)| * |B^(n2-1) * f2(-1/B)| // v3m_neg is set if h(-1/B) is negative ZNP_mpn_mulmid (v3mr, v1mr, k1, v2mr, k2); ZNP_mpn_mulmid (v3pr, v1pr, k1, v2pr, k2); // compute segments starting at bit index r of // 2 * B^(n3-1) * he(1/B^2) = B^(n3-1) * h(1/B) + B^(n3-1) * h(-1/B) // and 2 * B^(n3-2) * ho(1/B^2) = B^(n3-1) * h(1/B) - B^(n3-1) * h(-1/B) // ie. the segments of B^(n3-1) * he(1/B^2) and B^(n3-2) * ho(1/B^2) // starting at bit index r - 1. // If n2 is odd, the former encodes ge(x) and the latter encodes go(x). // Otherwise the situation is reversed. We write the results to v3er/v3or // accordingly. if (v3m_neg ^ (n2 & 1)) { mpn_add_n (v3er + 2, v3pr + 2, v3mr + 2, k3 - 4); // miss carry? mpn_add_1 (v3er + 2, v3er + 2, k3 - 4, 1); mpn_sub_n (v3or + 2, v3pr + 2, v3mr + 2, k3 - 4); // miss borrow? } else { mpn_sub_n (v3er + 2, v3pr + 2, v3mr + 2, k3 - 4); mpn_add_n (v3or + 2, v3pr + 2, v3mr + 2, k3 - 4); mpn_add_1 (v3or + 2, v3or + 2, k3 - 4, 1); } unsigned s = (n3 & 1) ? 0 : b; // remove x^0 and x^(n3e - 1) coefficients of ge(x), // and x^0 and x^(n3o - 1) coefficients of go(x). subtract_ulongs (v3er + 2, k3 - 4, s, dehi, ww); subtract_ulongs (v3er + 2, k3 - 4, (2*n3e - 2) * b + s, delo, ww); subtract_ulongs (v3or + 2, k3 - 4, b - s, dohi, ww); subtract_ulongs (v3or + 2, k3 - 4, (2*n3o - 2) * b + b - s, dolo, ww); // At this stage, the integer // g[2*n3e - 4] + g[2*n3e - 6]*B^2 + ...
+ g[2]*B^(2*n3e - 6) // appears in v3er + 2, starting at bit index 2*b if n3 is odd, or 3*b // if n3 is even, and occupying (2 * n3e - 2) * b bits. The integer // g[2*n3o - 3] + g[2*n3o - 5]*B^2 + ... + g[3]*B^(2*n3o - 6) // appears in v3or + 2, starting at bit index 3*b if n3 is odd, or 2*b // if n3 is even, and occupying (2 * n3o - 2) * b bits. // ------------------------------------------------------------------------- // combine "normal" and "reciprocal" information // decompose relevant portion of ge(B^2) and ge(1/B^2) into base-B^2 digits zn_array_unpack_SAFE (zn, v3en + 2, n3e - 1, 2 * b, 2 * b, k3 - 4); zn_array_unpack_SAFE (zr, v3er + 2, n3e - 1, 2 * b, 2 * b + s, k3 - 4); // combine ge(B^2) and ge(1/B^2) information to get even coefficients of g zn_array_recover_reduce (res + 2, 2, zn, zr, n3e - 2, 2 * b, redc, mod); // decompose relevant portion of go(B^2) and go(1/B^2) into base-B^2 digits zn_array_unpack_SAFE (zn, v3on + 2, n3o - 1, 2 * b, 3 * b, k3 - 4); zn_array_unpack_SAFE (zr, v3or + 2, n3o - 1, 2 * b, 3 * b - s, k3 - 4); // combine go(B^2) and go(1/B^2) information to get odd coefficients of g zn_array_recover_reduce (res + 3, 2, zn, zr, n3o - 2, 2 * b, redc, mod); ZNP_FASTFREE (z); ZNP_FASTFREE (limbs); } // end of file **************************************************************** zn_poly-0.9.2/src/nuss.c000066400000000000000000000332331360464557000151550ustar00rootroot00000000000000/* nuss.c: negacyclic multiplications via Nussbaumer's algorithm Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ /* The main routine exported from this module is nuss_mul(). This takes two arrays of length L = 2^lgL and computes their negacyclic convolution, using Nussbaumer's convolution algorithm [Nus80]. These are used primarily for the pointwise multiplications in the Schonhage FFT (see mul_fft.c). It is optimised for *small* problems; we don't worry much about locality. For example, on my development machine, for 63-bit coefficients it beats KS multiplication at length 256; for 8-bit coefficients it wins at length 2048. The algorithm is as follows. Let R = Z/mZ. The input polynomials live in R[X]/(X^L + 1). We map the convolution problem to S[Z]/(Z^K - 1), where S = R[Y]/(Y^M + 1), M and K are powers of two, MK = 2L, K <= 2M, and M is minimal subject to these conditions (in other words M and K are around sqrt(L), and S has K-th roots of unity). An input polynomial \sum_{j=0}^{M-1} \sum_{i=0}^{K/2 - 1} a_{i + j*K/2} X^{i + j*K/2} is mapped to \sum_i \sum_j (a_{i + j*K/2} Y^j) Z^i. Note that this map involves *transposing* the input data, unlike the splitting step in the main Schonhage convolution routine. The nice thing about this map is that the original negacyclic property of R[X]/(X^L + 1) gets reflected neatly in S. The inverse map sends Z -> X, Y -> X^(K/2). 
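   For example, with lgL = 10 (L = 1024), nuss_params() below chooses
   lgK = 6 and lgM = 5, i.e. K = 64 and M = 32; indeed MK = 2048 = 2L,
   K <= 2M, and M = 32 = sqrt(L) is minimal.

   A typical caller looks something like the following sketch (res, op1 and
   op2 are caller-supplied arrays of length L = 2^lgL, and mod is the
   caller's zn_mod_t; recall that the result comes out scaled by the fudge
   factor nuss_mul_fudge (lgL, 0, mod), which the caller compensates for
   separately):

      pmfvec_t vec1, vec2;
      pmfvec_init_nuss (vec1, lgL, mod);
      pmfvec_init_nuss (vec2, lgL, mod);
      nuss_mul (res, op1, op2, vec1, vec2);   // negacyclic product, scaled
      pmfvec_clear (vec2);
      pmfvec_clear (vec1);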
*/ #include "zn_poly_internal.h" /* The functions nuss_split() and nuss_fft() together perform the splitting and FFT stages of Nussbaumer's algorithm. The input array op has length M*K/2 (where M = res->M, K = res->K). In effect the two routines accomplish the following. The input is split into M chunks of length K/2, which are then transposed into the first K/2 coefficients of res. The last K/2 coefficients are set to zero. Then we compute the DFT: b_k = sum_{i=0}^{K-1} w^{ik} a_i, where w is the standard K-th root of unity (i.e. Y^(2M/K)). The FFT is inplace, and outputs are in bit-reversed order. The nuss_split() function actually incorporates the first two layers of the FFT (to avoid unnecessary operations on zero coefficients). The nuss_fft() function handles the remaining lgK - 2 layers. We require that 2M >= K >= 4. */ #define nuss_split \ ZNP_nuss_split void nuss_split (pmfvec_t res, const ulong* op) { ZNP_ASSERT (res->lgK >= 2); ZNP_ASSERT (res->lgM + 1 >= res->lgK); // Let b[0], ..., b[K/2 - 1] be the fourier coefficients obtained by // performing the split. (The coefficients b[K/2], ..., b[K - 1] are zero.) // After the first FFT pass, the coefficients would be // b[0], b[1], ..., b[K/2 - 1], // and then // b[0], w * b[1], ..., w^(K/2 - 1) * b[K/2 - 1]. // After the second FFT pass, the coefficients would be // (b[j] + b[j + K/4]) for 0 <= j < K/4, // w^(2j) (b[j] - b[j + K/4]) for 0 <= j < K/4, // w^j (b[j] + I * b[j + K/4]) for 0 <= j < K/4, // w^(3j) (b[j] - I * b[j + K/4]) for 0 <= j < K/4, // where I = w^(K/4) is the fourth root of unity. // We do all this in one pass, computing the four combinations // b[j] + b[j + K/4] // b[j] - b[j + K/4] // b[j] + I * b[j + K/4] // b[j] - I * b[j + K/4] // directly from the input, simultaneously with the transposition, and // then throw the w^j adjustments into the bias fields. 
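   // (Each pmf_t coefficient stores its bias in slot [-1]: a bias of s
   // means the stored residues carry an implicit factor of Y^s. Setting the
   // biases to s, 2*s, 3*s below therefore applies the w^j, w^(2j), w^(3j)
   // twists for free, without moving any data; the biases are consumed
   // later by the pmf_t butterflies in pmf.c and by nuss_combine.)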
ulong M = res->M, K = res->K; const zn_mod_struct* mod = res->mod; // dest points to j-th output coefficient ulong* dest = res->data + 1; // dest + half points to (j + K/4)-th output coefficient ptrdiff_t half = res->skip << (res->lgK - 2); // w = Y^r is the primitive K-th root of unity ulong r = M >> (res->lgK - 1); ulong i, j, s = 0; for (j = 0; j < K/4; j++, dest += res->skip, s += r) { const ulong* src = op + j; // apply twists dest[-1] = 0; dest[-1 + half] = 2 * s; dest[-1 + 2 * half] = s; dest[-1 + 3 * half] = 3 * s; // do quadruple butterfly and transposition if (zn_mod_is_slim (mod)) { // slim version for (i = 0; i < M/2; i++, src += K/2) { ulong x0 = src[0]; ulong x1 = src[K/4]; ulong x2 = src[M*K/4]; ulong x3 = src[M*K/4 + K/4]; dest[i] = zn_mod_add_slim (x0, x1, mod); dest[i + half] = zn_mod_sub_slim (x0, x1, mod); dest[i + 2 * half] = zn_mod_sub_slim (x0, x3, mod); dest[i + 3 * half] = zn_mod_add_slim (x0, x3, mod); dest[i + M/2] = zn_mod_add_slim (x2, x3, mod); dest[i + half + M/2] = zn_mod_sub_slim (x2, x3, mod); dest[i + 2 * half + M/2] = zn_mod_add_slim (x2, x1, mod); dest[i + 3 * half + M/2] = zn_mod_sub_slim (x2, x1, mod); } } else { // non-slim version for (i = 0; i < M/2; i++, src += K/2) { ulong x0 = src[0]; ulong x1 = src[K/4]; ulong x2 = src[M*K/4]; ulong x3 = src[M*K/4 + K/4]; dest[i] = zn_mod_add (x0, x1, mod); dest[i + half] = zn_mod_sub (x0, x1, mod); dest[i + 2 * half] = zn_mod_sub (x0, x3, mod); dest[i + 3 * half] = zn_mod_add (x0, x3, mod); dest[i + M/2] = zn_mod_add (x2, x3, mod); dest[i + half + M/2] = zn_mod_sub (x2, x3, mod); dest[i + 2 * half + M/2] = zn_mod_add (x2, x1, mod); dest[i + 3 * half + M/2] = zn_mod_sub (x2, x1, mod); } } } } #define nuss_fft \ ZNP_nuss_fft void nuss_fft (pmfvec_t op) { ZNP_ASSERT (op->lgK >= 2); ZNP_ASSERT (op->lgM + 1 >= op->lgK); if (op->lgK == 2) return; const zn_mod_struct* mod = op->mod; ulong M = op->M; ulong s, r = op->M >> (op->lgK - 3); ptrdiff_t half = op->skip << (op->lgK - 3); ulong* end = op->data + (op->skip << op->lgK); ulong* start; ulong* p; for (; r <= M; r <<= 1, half >>= 1) for (s = 0, start = op->data; s < M; s += r, start += op->skip) for (p = start; p < end; p += 2 * half) { pmf_bfly (p, p + half, M, mod); pmf_rotate (p + half, M + s); } } /* Inverse FFT, i.e. computes a_i = sum_{k=0}^{K-1} w^{-ik} b_k. The inputs are in bit-reversed order, and outputs in usual order. */ #define nuss_ifft \ ZNP_nuss_ifft void nuss_ifft (pmfvec_t op) { const zn_mod_struct* mod = op->mod; ulong M = op->M; ulong s, r = M; ulong r_last = op->M >> (op->lgK - 1); ptrdiff_t half = op->skip; ulong* end = op->data + (op->skip << op->lgK); ulong* p; ulong* start; for (; r >= r_last; r >>= 1, half <<= 1) for (s = 0, start = op->data; s < M; s += r, start += op->skip) for (p = start; p < end; p += 2 * half) { pmf_rotate (p + half, M - s); pmf_bfly (p + half, p, M, mod); } } /* This routine performs the reverse Nussbaumer substitution, i.e. maps Z -> X, Y -> X^(K/2), performing appropriate additions/subtractions to combine the overlapping coefficients, stores results in the res array, of length M*K/2. The main complication in this routine is the bias field in the fourier coefficients, i.e. the coefficients might be rotated by random angles (and the data needs to be transposed too). We still do everything in a single pass. 
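   Concretely, under the inverse map the output coefficient of X^(i + j*K/2),
   for 0 <= i < K/2, is the Y^j coefficient of the i-th fourier coefficient
   plus the Y^(j-1) coefficient of the (i + K/2)-th one, since
   Y^(j-1) * Z^(i + K/2) also maps to X^(i + j*K/2). This is why src2 below
   is read starting from its Y^(-1) coefficient.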
*/
#define nuss_combine \
    ZNP_nuss_combine
void
nuss_combine (ulong* res, const pmfvec_t op)
{
   ulong i, j;
   ulong M = op->M;
   const zn_mod_struct* mod = op->mod;

   // src1 points to i-th fourier coefficient
   // src2 points to (i + K/2)-th fourier coefficient
   // These get added/subtracted into res[i + j*K/2], 0 <= j < M.
   ulong* src1 = op->data + 1;
   ulong* src2 = op->data + op->skip * op->K/2 + 1;

   for (i = 0; i < op->K/2; i++, src1 += op->skip, src2 += op->skip)
   {
      // We'll be writing to dest[j*K/2], 0 <= j < M
      ulong* dest = res + i;

      // We want to start reading from the Y^0 coefficient of src1, which
      // is located at index -bias(src1) mod 2M. But really there are only
      // M coefficients, so we reduce the index mod M, and put neg1 = 1 if
      // we have to negate the coefficients (i.e. they wrapped around
      // negacyclically).
      ulong s1 = (-src1[-1]) & (2*M - 1);
      int neg1 = (s1 >= M);
      if (neg1)
         s1 -= M;

      // Ditto for s2 and neg2 with respect to src2, except that we want
      // to start reading from the coefficient of Y^(-1).
      ulong s2 = (-1 - src2[-1]) & (2*M - 1);
      int neg2 = (s2 >= M);
      if (neg2)
         s2 -= M;

      // Swap the inputs so that s1 <= s2. (Actually we don't want to disturb
      // src1 and src2, so put the pointers into x1 and x2 instead.)
      ulong* x1;
      ulong* x2;
      if (s1 < s2)
      {
         x1 = src1;
         x2 = src2;
      }
      else
      {
         x1 = src2;
         x2 = src1;
         ulong s_temp = s1; s1 = s2; s2 = s_temp;
         int neg_temp = neg1; neg1 = neg2; neg2 = neg_temp;
      }

      // Okay, now the picture looks like this:
      //
      //        0       s1                   M
      // x1:    CCCCCCCCAAAAAAAAAAAAABBBBBBBBBBBBBBBBBB
      //
      //                          s2         M
      // x2:    BBBBBBBBBBBBBBBBBBCCCCCCCCAAAAAAAAAAAAA

      // Combine the portions marked AAAA
      dest = zn_skip_array_signed_add (dest, op->K/2, M - s2,
                                       x2 + s2, neg2, x1 + s1, neg1, mod);

      // Combine the portions marked BBBB (x2 stuff gets negated)
      dest = zn_skip_array_signed_add (dest, op->K/2, s2 - s1,
                                       x2, !neg2,
                                       x1 + s1 + M - s2, neg1, mod);

      // Combine the portions marked CCCC (both inputs get negated)
      zn_skip_array_signed_add (dest, op->K/2, s1,
                                x2 + s2 - s1, !neg2, x1, !neg1, mod);
   }
}


#define nuss_pointwise_mul_fudge \
    ZNP_nuss_pointwise_mul_fudge
ulong
nuss_pointwise_mul_fudge (unsigned lgM, int sqr, const zn_mod_t mod)
{
   ulong M = 1UL << lgM;
   return _zn_array_mul_fudge (M, M, sqr, mod);
}


/*
   Multiplies fourier coefficients in op1 by those in op2, stores results
   in res. Inplace operation is okay. Automatically uses faster squaring
   version if inputs are the same pmfvec_t object.

   NOTE: for now this routine never recurses into Nussbaumer multiplication.
   Doing so would start to become relevant when the original multiplication
   problem has length around 10^9.

   The result comes out divided by a fudge factor, which can be recovered
   via nuss_pointwise_mul_fudge().
*/
#define nuss_pointwise_mul \
    ZNP_nuss_pointwise_mul
void
nuss_pointwise_mul (pmfvec_t res, const pmfvec_t op1, const pmfvec_t op2)
{
   ZNP_ASSERT (pmfvec_compatible (res, op1));
   ZNP_ASSERT (pmfvec_compatible (res, op2));

   ulong i, M = res->M;

   pmf_t dest = res->data;
   pmf_const_t src1 = op1->data;
   pmf_const_t src2 = op2->data;

   ZNP_FASTALLOC (temp, ulong, 6624, 2 * M);
   temp[2*M - 1] = 0;

   for (i = 0; i < res->K; i++, dest += res->skip,
        src1 += op1->skip, src2 += op2->skip)
   {
      // add biases
      dest[0] = src1[0] + src2[0];

      // plain multiplication...
      _zn_array_mul (temp, src1 + 1, M, src2 + 1, M, 1, res->mod);

      // ... negacyclic reduction.
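      // (temp holds the 2M - 1 coefficients of the plain product, with
      // temp[2M - 1] zeroed above; since Y^M = -1 in S, coefficient i of
      // the negacyclic product is temp[i] - temp[i + M], which is exactly
      // what the zn_array_sub below computes for all M coefficients.)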
zn_array_sub (dest + 1, temp, temp + M, M, res->mod); } ZNP_FASTFREE (temp); } void nuss_params (unsigned* lgK, unsigned* lgM, unsigned lgL) { *lgK = (lgL / 2) + 1; *lgM = lgL - *lgK + 1; } ulong nuss_mul_fudge (unsigned lgL, int sqr, const zn_mod_t mod) { unsigned lgK, lgM; nuss_params (&lgK, &lgM, lgL); // need to divide by 2^lgK coming from FFT ulong fudge1 = zn_mod_pow2 (-lgK, mod); // and take into account fudge from pointwise multiplies ulong fudge2 = nuss_pointwise_mul_fudge (lgM, sqr, mod); return zn_mod_mul (fudge1, fudge2, mod); } void nuss_mul (ulong* res, const ulong* op1, const ulong* op2, pmfvec_t vec1, pmfvec_t vec2) { ZNP_ASSERT (vec1->lgM + 1 >= vec1->lgK); if (op1 != op2) { ZNP_ASSERT (pmfvec_compatible (vec1, vec2)); // split inputs into fourier coefficients and perform FFTs nuss_split (vec1, op1); nuss_fft (vec1); nuss_split (vec2, op2); nuss_fft (vec2); // multiply fourier coefficients into vec1 nuss_pointwise_mul (vec1, vec1, vec2); } else { // split input into fourier coefficients and perform FFT nuss_split (vec1, op1); nuss_fft (vec1); // square fourier coefficients into vec1 nuss_pointwise_mul (vec1, vec1, vec1); } // inverse FFT nuss_ifft (vec1); // recombine into result nuss_combine (res, vec1); } // end of file **************************************************************** zn_poly-0.9.2/src/pack.c000066400000000000000000000242641360464557000151070ustar00rootroot00000000000000/* pack.c: bit-packing/unpacking for Kronecker substitution routines Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" /* Same as zn_array_pack(), but requires b <= ULONG_BITS. */ #define zn_array_pack1 \ ZNP_zn_array_pack1 void zn_array_pack1 (mp_limb_t* res, const ulong* op, size_t n, ptrdiff_t s, unsigned b, unsigned k, size_t r) { ZNP_ASSERT (b > 0 && b <= ULONG_BITS); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS // where to write the next limb mp_limb_t* dest = res; // write leading zero-padding while (k >= ULONG_BITS) { *dest++ = 0; k -= ULONG_BITS; } // limb currently being filled mp_limb_t buf = 0; // number of bits used in buf; always in [0, ULONG_BITS) unsigned buf_b = k; unsigned buf_b_old; for (; n > 0; n--, op += s) { ZNP_ASSERT (b >= ULONG_BITS || *op < (1UL << b)); // put low bits of current input into buffer buf += *op << buf_b; buf_b_old = buf_b; buf_b += b; if (buf_b >= ULONG_BITS) { // buffer is full; flush it *dest++ = buf; buf_b -= ULONG_BITS; // put remaining bits of current input into buffer buf = buf_b_old ? 
(*op >> (ULONG_BITS - buf_b_old)) : 0; } } // write last limb if it's non-empty if (buf_b) *dest++ = buf; // zero-pad up to requested length if (r) { size_t written = dest - res; ZNP_ASSERT (written <= r); for (; written < r; written++) *dest++ = 0; } #else #error Not nails-safe yet #endif } void zn_array_pack (mp_limb_t* res, const ulong* op, size_t n, ptrdiff_t s, unsigned b, unsigned k, size_t r) { ZNP_ASSERT (b > 0 && b < 3 * ULONG_BITS); if (b <= ULONG_BITS) { // use specialised version if b is small enough zn_array_pack1 (res, op, n, s, b, k, r); return; } #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS // where to write the next limb mp_limb_t* dest = res; // write leading zero-padding while (k >= ULONG_BITS) { *dest++ = 0; k -= ULONG_BITS; } // limb currently being filled mp_limb_t buf = 0; // number of bits used in buf; always in [0, ULONG_BITS) unsigned buf_b = k; unsigned buf_b_old; for (; n > 0; n--, op += s) { ZNP_ASSERT (b >= ULONG_BITS || *op < (1UL << b)); // put low bits of current input into buffer buf += *op << buf_b; buf_b_old = buf_b; buf_b += b; if (buf_b >= ULONG_BITS) { // buffer is full; flush it *dest++ = buf; buf_b -= ULONG_BITS; // put remaining bits of current input into buffer buf = buf_b_old ? (*op >> (ULONG_BITS - buf_b_old)) : 0; // write as many extra zeroes as necessary if (buf_b >= ULONG_BITS) { *dest++ = buf; buf = 0; buf_b -= ULONG_BITS; if (buf_b >= ULONG_BITS) { *dest++ = 0; buf_b -= ULONG_BITS; ZNP_ASSERT (buf_b < ULONG_BITS); } } } } // write last limb if it's non-empty if (buf_b) *dest++ = buf; // zero-pad up to requested length if (r) { size_t written = dest - res; ZNP_ASSERT (written <= r); for (; written < r; written++) *dest++ = 0; } #else #error Not nails-safe yet #endif } /* Same as zn_array_unpack(), but requires b <= ULONG_BITS (i.e. writes one word per coefficient) */ #define zn_array_unpack1 \ ZNP_zn_array_unpack1 void zn_array_unpack1 (ulong* res, const mp_limb_t* op, size_t n, unsigned b, unsigned k) { ZNP_ASSERT (b <= ULONG_BITS); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS // limb we're currently extracting bits from mp_limb_t buf = 0; // number of bits currently in buf; always in [0, ULONG_BITS) unsigned buf_b = 0; // skip over k leading bits while (k >= GMP_NUMB_BITS) { k -= GMP_NUMB_BITS; op++; } if (k) { buf = *op++; buf >>= k; buf_b = ULONG_BITS - k; } if (b == ULONG_BITS) { // various special cases if (buf_b) { for (; n > 0; n--) { // we need bits from both sides of a limb boundary ulong temp = buf; buf = *op++; *res++ = temp + (buf << buf_b); buf >>= (ULONG_BITS - buf_b); } } else { for (; n > 0; n--) *res++ = *op++; } } else { ulong mask = (1UL << b) - 1; for (; n > 0; n--) { if (b <= buf_b) { // buf contains all the bits we need *res++ = buf & mask; buf >>= b; buf_b -= b; } else { // we need bits from both sides of a limb boundary ulong temp = buf; buf = *op++; *res++ = temp + ((buf << buf_b) & mask); buf >>= (b - buf_b); buf_b = ULONG_BITS - (b - buf_b); } } } #else #error Not nails-safe yet #endif } /* Same as zn_array_unpack(), but requires ULONG_BITS < b <= 2 * ULONG_BITS (i.e. 
writes two words per coefficient) */ #define zn_array_unpack2 \ ZNP_zn_array_unpack2 void zn_array_unpack2 (ulong* res, const mp_limb_t* op, size_t n, unsigned b, unsigned k) { ZNP_ASSERT (b > ULONG_BITS && b <= 2 * ULONG_BITS); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS // limb we're currently extracting bits from mp_limb_t buf = 0; // number of bits currently in buf; always in [0, ULONG_BITS) unsigned buf_b = 0; // skip over k leading bits while (k >= GMP_NUMB_BITS) { k -= GMP_NUMB_BITS; op++; } if (k) { buf = *op++; buf >>= k; buf_b = ULONG_BITS - k; } if (b == 2 * ULONG_BITS) { n *= 2; // various special cases if (buf_b) { for (; n > 0; n--) { // we need bits from both sides of a limb boundary ulong temp = buf; buf = *op++; *res++ = temp + (buf << buf_b); buf >>= (ULONG_BITS - buf_b); } } else { for (; n > 0; n--) *res++ = *op++; } } else { b -= ULONG_BITS; ulong mask = (1UL << b) - 1; for (; n > 0; n--) { // shunt one whole limb through first if (buf_b) { ulong temp = buf; buf = *op++; *res++ = temp + (buf << buf_b); buf >>= (ULONG_BITS - buf_b); } else *res++ = *op++; // now handle the fractional limb if (b <= buf_b) { // buf contains all the bits we need *res++ = buf & mask; buf >>= b; buf_b -= b; } else { // we need bits from both sides of a limb boundary ulong temp = buf; buf = *op++; *res++ = temp + ((buf << buf_b) & mask); buf >>= (b - buf_b); buf_b = ULONG_BITS - (b - buf_b); } } } #else #error Not nails-safe yet #endif } /* Same as zn_array_unpack(), but requires 2 * ULONG_BITS < b < 3 * ULONG_BITS (i.e. writes three words per coefficient) */ #define zn_array_unpack3 \ ZNP_zn_array_unpack3 void zn_array_unpack3 (ulong* res, const mp_limb_t* op, size_t n, unsigned b, unsigned k) { ZNP_ASSERT (b > 2 * ULONG_BITS && b < 3 * ULONG_BITS); #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS // limb we're currently extracting bits from mp_limb_t buf = 0; // number of bits currently in buf; always in [0, ULONG_BITS) unsigned buf_b = 0; // skip over k leading bits while (k >= GMP_NUMB_BITS) { k -= GMP_NUMB_BITS; op++; } if (k) { buf = *op++; buf >>= k; buf_b = ULONG_BITS - k; } b -= 2 * ULONG_BITS; ulong mask = (1UL << b) - 1; for (; n > 0; n--) { // shunt two whole limbs through first if (buf_b) { ulong temp = buf; buf = *op++; *res++ = temp + (buf << buf_b); buf >>= (ULONG_BITS - buf_b); temp = buf; buf = *op++; *res++ = temp + (buf << buf_b); buf >>= (ULONG_BITS - buf_b); } else { *res++ = *op++; *res++ = *op++; } // now handle the fractional limb if (b <= buf_b) { // buf contains all the bits we need *res++ = buf & mask; buf >>= b; buf_b -= b; } else { // we need bits from both sides of a limb boundary ulong temp = buf; buf = *op++; *res++ = temp + ((buf << buf_b) & mask); buf >>= (b - buf_b); buf_b = ULONG_BITS - (b - buf_b); } } #else #error Not nails-safe yet #endif } void zn_array_unpack (ulong* res, const mp_limb_t* op, size_t n, unsigned b, unsigned k) { ZNP_ASSERT (b >= 1 && b <= 3 * ULONG_BITS); if (b <= ULONG_BITS) zn_array_unpack1 (res, op, n, b, k); else if (b <= 2 * ULONG_BITS) zn_array_unpack2 (res, op, n, b, k); else // b < 3 * ULONG_BITS zn_array_unpack3 (res, op, n, b, k); } // end of file **************************************************************** zn_poly-0.9.2/src/pmf.c000066400000000000000000000353651360464557000147570ustar00rootroot00000000000000/* pmf.c: polynomials modulo a fermat polynomial Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "zn_poly_internal.h" #if DEBUG /* ============================================================================ debugging stuff ============================================================================ */ #include #include "support.h" void pmf_rand (pmf_t res, ulong M, const zn_mod_t mod) { ulong i; res[0] = random_ulong (2 * M); for (i = 1; i <= M; i++) res[i] = random_ulong (mod->m); } /* res := op, with bias set to zero. Inplace operation not allowed. */ void pmf_normalise (pmf_t res, pmf_t op, ulong M, const zn_mod_t mod) { ZNP_ASSERT (res != op); res[0] = 0; ulong i, b = op[0] & (2*M - 1); op++; res++; if (b < M) { for (i = 0; i < M - b; i++) res[b + i] = op[i]; for (i = 0; i < b; i++) res[i] = zn_mod_neg (op[i + M - b], mod); } else { b -= M; for (i = 0; i < M - b; i++) res[b + i] = zn_mod_neg (op[i], mod); for (i = 0; i < b; i++) res[i] = op[i + M - b]; } } int pmf_cmp (const pmf_t op1, const pmf_t op2, ulong M, const zn_mod_t mod) { ulong i; ulong b = op2[0] - op1[0]; ulong* x1 = op1 + 1; ulong* x2 = op2 + 1; if (b & M) { b &= (M - 1); for (i = b; i < M; i++) if (x1[i] != zn_mod_neg (x2[i - b], mod)) return 1; for (i = 0; i < b; i++) if (x1[i] != x2[i + M - b]) return 1; } else { b &= (M - 1); for (i = b; i < M; i++) if (x1[i] != x2[i - b]) return 1; for (i = 0; i < b; i++) if (x1[i] != zn_mod_neg (x2[i + M - b], mod)) return 1; } return 0; } void pmf_print (const pmf_t op, ulong M, const zn_mod_t mod) { ZNP_FASTALLOC (buf, ulong, 6624, M + 1); pmf_normalise (buf, op, M, mod); printf ("[%lu", buf[1]); ulong i; for (i = 1; i < M; i++) printf (" %lu", buf[i+1]); printf ("]"); ZNP_FASTFREE (buf); } void pmfvec_print (const pmfvec_t op) { printf ("M = %lu, K = %lu\n", op->M, op->K); ulong i; for (i = 0; i < op->K; i++) { printf ("%3lu: ", i); pmf_print (op->data + i * op->skip, op->M, op->mod); printf ("\n"); } } void pmfvec_print_trunc (const pmfvec_t op, ulong n) { printf ("M = %lu, K = %lu\n", op->M, op->K); ulong i; for (i = 0; i < n; i++) { printf ("%3lu: ", i); pmf_print (op->data + i * op->skip, op->M, op->mod); printf ("\n"); } } #endif /* ============================================================================ inplace array butterflies ============================================================================ */ /* op1 := op2 + op1 op2 := op2 - op1 where op1 and op2 are arrays of length n. Inputs must be [0, m); outputs will be in [0, m). 
*/ void zn_array_bfly_inplace (ulong* op1, ulong* op2, ulong n, const zn_mod_t mod) { ulong x, y; if (zn_mod_is_slim (mod)) { // slim version // (unrolled) for (; n >= 4; n -= 4) { x = *op1; y = *op2; *op1++ = zn_mod_add_slim (y, x, mod); *op2++ = zn_mod_sub_slim (y, x, mod); x = *op1; y = *op2; *op1++ = zn_mod_add_slim (y, x, mod); *op2++ = zn_mod_sub_slim (y, x, mod); x = *op1; y = *op2; *op1++ = zn_mod_add_slim (y, x, mod); *op2++ = zn_mod_sub_slim (y, x, mod); x = *op1; y = *op2; *op1++ = zn_mod_add_slim (y, x, mod); *op2++ = zn_mod_sub_slim (y, x, mod); } for (; n; n--) { x = *op1; y = *op2; *op1++ = zn_mod_add_slim (y, x, mod); *op2++ = zn_mod_sub_slim (y, x, mod); } } else { // non-slim version // (unrolled) for (; n >= 4; n -= 4) { x = *op1; y = *op2; *op1++ = zn_mod_add (y, x, mod); *op2++ = zn_mod_sub (y, x, mod); x = *op1; y = *op2; *op1++ = zn_mod_add (y, x, mod); *op2++ = zn_mod_sub (y, x, mod); x = *op1; y = *op2; *op1++ = zn_mod_add (y, x, mod); *op2++ = zn_mod_sub (y, x, mod); x = *op1; y = *op2; *op1++ = zn_mod_add (y, x, mod); *op2++ = zn_mod_sub (y, x, mod); } for (; n; n--) { x = *op1; y = *op2; *op1++ = zn_mod_add (y, x, mod); *op2++ = zn_mod_sub (y, x, mod); } } } /* op1 := op1 + op2 where op1 and op2 are arrays of length n. Inputs must be [0, m); outputs will be in [0, m). */ void zn_array_add_inplace (ulong* op1, const ulong* op2, ulong n, const zn_mod_t mod) { if (zn_mod_is_slim (mod)) { // slim version // (unrolled) for (; n >= 4; n -= 4) { *op1 = zn_mod_add_slim (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_add_slim (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_add_slim (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_add_slim (*op1, *op2, mod); op1++; op2++; } for (; n; n--) { *op1 = zn_mod_add_slim (*op1, *op2, mod); op1++; op2++; } } else { // non-slim version // (unrolled) for (; n >= 4; n -= 4) { *op1 = zn_mod_add (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_add (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_add (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_add (*op1, *op2, mod); op1++; op2++; } for (; n; n--) { *op1 = zn_mod_add (*op1, *op2, mod); op1++; op2++; } } } /* op1 := op1 - op2 where op1 and op2 are arrays of length n. Inputs must be [0, m); outputs will be in [0, m). */ void zn_array_sub_inplace (ulong* op1, const ulong* op2, ulong n, const zn_mod_t mod) { if (zn_mod_is_slim (mod)) { // slim version // (unrolled) for (; n >= 4; n -= 4) { *op1 = zn_mod_sub_slim (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_sub_slim (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_sub_slim (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_sub_slim (*op1, *op2, mod); op1++; op2++; } for (; n; n--) { *op1 = zn_mod_sub_slim (*op1, *op2, mod); op1++; op2++; } } else { // non-slim version // (unrolled) for (; n >= 4; n -= 4) { *op1 = zn_mod_sub (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_sub (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_sub (*op1, *op2, mod); op1++; op2++; *op1 = zn_mod_sub (*op1, *op2, mod); op1++; op2++; } for (; n; n--) { *op1 = zn_mod_sub (*op1, *op2, mod); op1++; op2++; } } } /* ============================================================================ inplace pmf_t butterflies ============================================================================ */ /* In the following routines, we work with the "relative bias" between the two inputs, i.e. b = difference between bias of op1 and op2. This allows us to avoid unnecessary normalisation steps; we just add and subtract directly into the correct memory locations. 
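   For example, suppose b = bias(op2) - bias(op1) is in [0, M) mod 2M. Then
   the coefficient stored at index i of op2 lines up with index i + b of
   op1 for 0 <= i < M - b, while the top b coefficients of op2 wrap around
   past Y^M = -1 and so meet op1[0, b) with the opposite sign. The two
   branches in each routine below handle exactly these two alignments; the
   case b in [M, 2M) is the same picture with all the signs flipped.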
*/ void pmf_bfly (pmf_t op1, pmf_t op2, ulong M, const zn_mod_t mod) { ulong b = op2[0] - op1[0]; if (b & M) { // bias is in [M, 2M) mod 2M b &= (M - 1); // butterfly on op1[0, b) and op2[M - b, M) zn_array_bfly_inplace (op1 + 1, op2 + 1 + M - b, b, mod); // butterfly on op1[b, M) and op2[0, M - b) zn_array_bfly_inplace (op2 + 1, op1 + 1 + b, M - b, mod); } else { // bias is in [0, M) mod 2M b &= (M - 1); // butterfly on op1[0, b) and op2[M - b, M) zn_array_bfly_inplace (op2 + 1 + M - b, op1 + 1, b, mod); // butterfly on op1[b, M) and op2[0, M - b) zn_array_bfly_inplace (op1 + 1 + b, op2 + 1, M - b, mod); } } void pmf_add (pmf_t op1, const pmf_t op2, ulong M, const zn_mod_t mod) { ulong b = op2[0] - op1[0]; if (b & M) { // bias is in [M, 2M) mod 2M b &= (M - 1); // add op2[M - b, M) to op1[0, b) zn_array_add_inplace (op1 + 1, op2 + 1 + M - b, b, mod); // subtract op2[0, M - b) from op1[b, M) zn_array_sub_inplace (op1 + 1 + b, op2 + 1, M - b, mod); } else { // bias is in [0, M) mod 2M b &= (M - 1); // subtract op2[M - b, M) from op1[0, b) zn_array_sub_inplace (op1 + 1, op2 + 1 + M - b, b, mod); // add op2[0, M - b) to op1[b, M) zn_array_add_inplace (op1 + 1 + b, op2 + 1, M - b, mod); } } void pmf_sub (pmf_t op1, const pmf_t op2, ulong M, const zn_mod_t mod) { ulong b = op2[0] - op1[0]; if (b & M) { // bias is in [M, 2M) mod 2M b &= (M - 1); // subtract op2[M - b, M) from op1[0, b) zn_array_sub_inplace (op1 + 1, op2 + 1 + M - b, b, mod); // add op2[0, M - b) to op1[b, M) zn_array_add_inplace (op1 + 1 + b, op2 + 1, M - b, mod); } else { // bias is in [0, M) mod 2M b &= (M - 1); // add op2[M - b, M) to op1[0, b) zn_array_add_inplace (op1 + 1, op2 + 1 + M - b, b, mod); // subtract op2[0, M - b) from op1[b, M) zn_array_sub_inplace (op1 + 1 + b, op2 + 1, M - b, mod); } } /* ============================================================================ pmfvec stuff ============================================================================ */ void pmfvec_init (pmfvec_t res, unsigned lgK, ptrdiff_t skip, unsigned lgM, const zn_mod_t mod) { ZNP_ASSERT (skip >= (1UL << lgM) + 1); res->lgK = lgK; res->lgM = lgM; res->skip = skip; res->K = 1UL << lgK; res->M = 1UL << lgM; res->mod = mod; res->data = (ulong*) malloc (sizeof (ulong) * skip * res->K); } void pmfvec_init_nuss (pmfvec_t res, unsigned lgL, const zn_mod_t mod) { unsigned lgK, lgM; nuss_params (&lgK, &lgM, lgL); pmfvec_init (res, lgK, (1UL << lgM) + 1, lgM, mod); } void pmfvec_clear (pmfvec_t op) { free (op->data); } void pmfvec_set (pmfvec_t res, const pmfvec_t op) { ZNP_ASSERT (pmfvec_compatible (res, op)); ulong i; for (i = 0; i < op->K; i++) pmf_set (res->data + i * res->skip, op->data + i * op->skip, op->M); } void pmfvec_scalar_mul (pmfvec_t op, ulong n, ulong x) { ZNP_ASSERT (n <= op->K); ulong i; pmf_t ptr = op->data; for (i = 0; i < n; i++, ptr += op->skip) pmf_scalar_mul (ptr, op->M, x, op->mod); } ulong pmfvec_mul_fudge (unsigned lgM, int sqr, const zn_mod_t mod) { int use_nuss = (lgM >= (sqr ? 
tuning_info[mod->bits].nuss_sqr_thresh : tuning_info[mod->bits].nuss_mul_thresh)); if (use_nuss) return nuss_mul_fudge (lgM, sqr, mod); else return _zn_array_mul_fudge (1UL << lgM, 1UL << lgM, sqr, mod); } void pmfvec_mul (pmfvec_t res, const pmfvec_t op1, const pmfvec_t op2, ulong n, int special_first_two) { ZNP_ASSERT (res->mod->m & 1); ZNP_ASSERT (pmfvec_compatible (res, op1)); ZNP_ASSERT (pmfvec_compatible (res, op2)); ZNP_ASSERT (res->M >= 2 || !special_first_two); pmf_const_t p1 = op1->data; pmf_const_t p2 = op2->data; pmf_t p3 = res->data; const zn_mod_struct* mod = res->mod; ulong M = op1->M; unsigned lgM = op1->lgM; int sqr = (op1 == op2); // use nussbaumer algorithm if the pointwise mults are large enough int use_nuss = (lgM >= (sqr ? tuning_info[mod->bits].nuss_sqr_thresh : tuning_info[mod->bits].nuss_mul_thresh)); // scratch space for nussbaumer multiplications pmfvec_t vec1, vec2; unsigned nuss_lgK; if (use_nuss) { pmfvec_init_nuss (vec1, lgM, mod); pmfvec_init_nuss (vec2, lgM, mod); nuss_lgK = vec1->lgK; } ulong i = 0; if (special_first_two) { ZNP_FASTALLOC (temp, ulong, 6624, 2 * M); // need to adjust the fudge factors, so that the fudge factor for these // first two special products matches up with the fudge factors for the // remaining products ulong fudge, fudge1, fudge2; fudge2 = use_nuss ? nuss_mul_fudge (lgM, sqr, mod) : _zn_array_mul_fudge (M, M, sqr, mod); fudge1 = _zn_array_mul_fudge (M/2, M/2, sqr, mod); fudge = (fudge1 == fudge2) ? 1 : zn_mod_mul (fudge1, zn_mod_invert (fudge2, mod), mod); // length M/2 multiplications for (; i < 2 && i < n; i++, p3 += res->skip, p1 += op1->skip, p2 += op2->skip) { // add biases p3[0] = p1[0] + p2[0]; // do the actual multiplication _zn_array_mul (temp, p1 + 1, M/2, p2 + 1, M/2, 1, mod); // apply the fudge factor zn_array_scalar_mul_or_copy (p3 + 1, temp, M - 1, fudge, mod); p3[M] = 0; } ZNP_FASTFREE(temp); } if (use_nuss) { for (; i < n; i++, p3 += res->skip, p1 += op1->skip, p2 += op2->skip) { // add biases p3[0] = p1[0] + p2[0]; nuss_mul (p3 + 1, p1 + 1, p2 + 1, vec1, vec2); } pmfvec_clear (vec2); pmfvec_clear (vec1); } else { // scratch space for KS negacyclic multiplication ZNP_FASTALLOC (temp, ulong, 6624, 2*M); temp[2*M - 1] = 0; for (; i < n; i++, p3 += res->skip, p1 += op1->skip, p2 += op2->skip) { // add biases p3[0] = p1[0] + p2[0]; // ordinary multiplication... _zn_array_mul (temp, p1 + 1, M, p2 + 1, M, 1, mod); // ... negacyclic reduction zn_array_sub (p3 + 1, temp, temp + M, M, mod); } ZNP_FASTFREE (temp); } } void pmfvec_reverse (pmfvec_t op, ulong n) { op->data += op->skip * (n - 1); op->skip = -op->skip; } // end of file **************************************************************** zn_poly-0.9.2/src/pmfvec_fft.c000066400000000000000000000616661360464557000163170ustar00rootroot00000000000000/* pmfvec_fft.c: FFT/IFFT and transposed FFT/IFFT routines for pmfvec_t Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include #include "zn_poly_internal.h" /* ============================================================================ FFT routines ============================================================================ */ void pmfvec_fft_basecase (pmfvec_t op, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); if (op->lgK == 0) return; // just plain butterfly loop ulong M = op->M; const zn_mod_struct* mod = op->mod; ulong s, r = M >> (op->lgK - 1); ptrdiff_t half = op->skip << (op->lgK - 1); ulong* end = op->data + (op->skip << op->lgK); ulong* p; ulong* start; for (; r <= M; r <<= 1, half >>= 1, t <<= 1) for (start = op->data, s = t; s < M; s += r, start += op->skip) for (p = start; p < end; p += 2 * half) { pmf_bfly (p, p + half, M, mod); pmf_rotate (p + half, M + s); } } void pmfvec_fft_dc (pmfvec_t op, ulong n, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (n >= 1 && n <= op->K); ZNP_ASSERT (z >= 1 && z <= op->K); if (op->K == 1) return; if (n == op->K && z == op->K) { // No truncation requested; use iterative version pmfvec_fft_basecase (op, t); return; } const zn_mod_struct* mod = op->mod; // We treat the input as two rows and U = K/2 columns, in row-major order. // descend to first row (first half of op) op->lgK--; op->K >>= 1; long i; ulong M = op->M; ulong U = op->K; ulong* p = op->data; ptrdiff_t skip = op->skip; ptrdiff_t half = skip << op->lgK; ulong z2 = ZNP_MIN (z, U); if (n <= U) { // Only need the first output of the first layer of butterflies. for (i = 0; i < (long)(z - U); i++, p += skip) pmf_add (p, p + half, M, mod); // Recurse into top row pmfvec_fft_dc (op, n, z2, t << 1); } else { // Need both outputs from the first layer of butterflies. ulong s = t; ulong r = M >> op->lgK; for (i = 0; i < (long)(z - U); i++, p += skip, s += r) { pmf_bfly (p, p + half, M, mod); pmf_rotate (p + half, M + s); } // Butterflies where second input is zero for (; i < z2; i++, p += skip, s += r) { pmf_set (p + half, p, M); pmf_rotate (p + half, s); } // Recurse into top row... pmfvec_fft_dc (op, U, z2, t << 1); // ... and recurse into bottom row op->data += half; pmfvec_fft_dc (op, n - U, z2, t << 1); op->data -= half; } // pop back to whole transform op->K <<= 1; op->lgK++; } /* As described above, this splits the length K transform into T = 2^lgT rows by U = 2^lgU columns, where K = U * T. Must have 0 < lgT < lgK. */ void pmfvec_fft_huge (pmfvec_t op, unsigned lgT, ulong n, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (lgT > 0 && lgT < op->lgK); ZNP_ASSERT (n >= 1 && n <= op->K); ZNP_ASSERT (z >= 1 && z <= op->K); unsigned lgK = op->lgK; unsigned lgU = lgK - lgT; ulong K = op->K; ulong T = 1UL << lgT; ulong U = 1UL << lgU; ptrdiff_t skip = op->skip; ptrdiff_t skip_U = skip << lgU; ulong* data = op->data; // We need n output coefficients, starting from the top-left, in row-major // order. // Write n = U * nT + nU, where 0 <= nU < U ulong nU = n & (U - 1); ulong nT = n >> lgU; // nT_ceil = number of rows of output, including the last partial row ulong nT_ceil = nT + (nU > 0); // Write z = U * zT + zU, where 0 <= zU < U ulong zT = z >> lgU; ulong zU = z & (U - 1); ulong zU2 = zT ? 
U : zU; ulong r = op->M >> (lgK - 1); ulong s, i; // --------------- FFTs along columns op->K = T; op->lgK = lgT; op->skip = skip_U; // First handle the columns with zT + 1 input coefficients. for (i = 0, s = t; i < zU; i++, op->data += skip, s += r) pmfvec_fft (op, nT_ceil, zT + 1, s); // Handle the remaining columns, which only have zT input coefficients. for (; i < zU2; i++, op->data += skip, s += r) pmfvec_fft (op, nT_ceil, zT, s); // --------------- FFTs along rows op->data = data; op->K = U; op->lgK = lgU; op->skip = skip; t <<= lgT; // Handle the first nT rows. for (i = 0; i < nT; i++, op->data += skip_U) pmfvec_fft (op, U, zU2, t); // For the last row, we only need the first nU outputs: if (nU) pmfvec_fft (op, nU, zU2, t); // --------------- restore parameters op->data = data; op->K = K; op->lgK = lgK; } void pmfvec_fft (pmfvec_t op, ulong n, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (n >= 1 && n <= op->K); ZNP_ASSERT (z >= 1 && z <= op->K); if (op->K <= 2 || 2 * op->K * op->M * sizeof (ulong) <= ZNP_CACHE_SIZE) { // FFT is pretty small; use divide-and-conquer pmfvec_fft_dc (op, n, z, t); } else { // FFT is relatively big; use factoring algorithm instead pmfvec_fft_huge (op, op->lgK / 2, n, z, t); } } /* ============================================================================ inverse FFT routines ============================================================================ */ void pmfvec_ifft_basecase (pmfvec_t op, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); if (op->lgK == 0) return; // just plain butterfly loop ulong M = op->M; const zn_mod_struct* mod = op->mod; ulong s, r = M; ulong r_last = M >> (op->lgK - 1); t <<= (op->lgK - 1); ptrdiff_t half = op->skip; ulong* end = op->data + (op->skip << op->lgK); ulong* p; ulong* start; for (; r >= r_last; r >>= 1, half <<= 1, t >>= 1) for (start = op->data, s = t; s < M; s += r, start += op->skip) for (p = start; p < end; p += 2 * half) { pmf_rotate (p + half, M - s); pmf_bfly (p + half, p, M, mod); } } void pmfvec_ifft_dc (pmfvec_t op, ulong n, int fwd, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (z >= 1 && z <= op->K); ZNP_ASSERT (n + fwd >= 1 && n + fwd <= op->K); ZNP_ASSERT (n <= z); if (op->K == 1) return; if (n == op->K) { // No truncation requested; use iterative version pmfvec_ifft_basecase (op, t); return; } const zn_mod_struct* mod = op->mod; // We treat the input as two rows and U = K / 2 columns, in row-major order. // descend to first row (first half of op) op->K >>= 1; op->lgK--; ulong M = op->M; ulong U = op->K; ptrdiff_t skip = op->skip; ptrdiff_t half = skip << op->lgK; // symbols in the following diagrams: // A = fully untransformed coefficient (one of the a_i) // B = intermediate coefficient // C = fully transformed coefficient (one of the b_k) // a, b, c = same as three above, but implied zero // ? = garbage that we don't care about // * = the "forward" C coefficient, or "?" if no forward coefficient // requested // The horizontal transforms convert between B and C. // The vertical butterflies convert between A and B. 
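   // The two cases below split on whether all requested outputs (the n
   // untransformed ones, plus possibly one forward coefficient) fit in the
   // first row (n + fwd <= U), or spill over into the second row.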
if (n + fwd <= U) { // The input could look like one of the following: // CCCCAAAA CCCCAAAA CCCCAAaa CCCCaaaa // AAAAAAaa or AAaaaaaa or aaaaaaaa or aaaaaaaa long zU2 = ZNP_MIN (z, U); long last_zero_fwd_bfly = ZNP_MAX (z - zU2, n); long last_zero_cross_bfly = ZNP_MIN (z - zU2, n); long i = zU2 - 1; pmf_t p = op->data + skip * i; // First some forward butterflies ("Aa" => "B?") to make them look like: // CCCCAABB CCCCBBBB CCCCBBaa CCCCaaaa // AAAAAA?? or AAaa???? or aaaa??aa or aaaaaaaa for (; i >= last_zero_fwd_bfly; i--, p -= skip) { // (2*a0, ?) -> (a0, ?) = (b0, ?) pmf_divby2 (p, M, mod); } // Then some forward butterflies ("AA" => "B?") to make them look like: // CCCCBBBB CCCCBBBB CCCCBBaa CCCCaaaa // AAAA???? or AAaa???? or aaaa??aa or aaaaaaaa for (; i >= (long) n; i--, p -= skip) { // (2*a0, 2*a1) -> (a0 + a1, ?) = (b0, ?) pmf_add (p, p + half, M, mod); pmf_divby2 (p, M, mod); } // Transform the first row to make them look like: // BBBB*??? BBBB*??? BBBB*??? BBBB*??? // AAAA???? or AAaa???? or aaaa??aa or aaaaaaaa pmfvec_ifft_dc (op, n, fwd, zU2, t << 1); // Cross butterflies ("Ba" => "A?") to make them look like: // BBBB*??? BBAA*??? AAAA*??? AAAA*??? // AAAA???? or AA?????? or ??????aa or ????aaaa for (; i >= last_zero_cross_bfly; i--, p -= skip) { // (b0, ?) -> (2*b0, ?) = (2*a0, ?) pmf_add (p, p, M, mod); } // Cross butterflies ("BA" => "A?") to make them look like: // AAAA*??? AAAA*??? AAAA*??? AAAA*??? // ???????? or ???????? or ??????aa or ????aaaa for (; i >= 0; i--, p -= skip) { // (b0, 2*a1) -> (2*b0 - 2*a1, ?) = (2*a0, ?) pmf_add (p, p, M, mod); pmf_sub (p, p + half, M, mod); } } else { // The input looks like one of these: // CCCCCCCC CCCCCCCC // AAAAaaaa (fwd == 1) or CCCAAAaa // Transform first row (no truncation necessary) to make them look like: // BBBBBBBB BBBBBBBB // AAAAaaaa (fwd == 1) or CCCAAAaa pmfvec_ifft_basecase (op, t << 1); long i = U - 1; ulong r = M >> op->lgK; ulong s = t + r * i; pmf_t p = op->data + skip * i; long last_zero_cross_bfly = z - U; long last_cross_bfly = n - U; // Cross butterflies ("Ba" => "AB") to make them look like: // BBBBAAAA BBBBBBAA // AAAABBBB (fwd == 1) or CCCAAABB for (; i >= last_zero_cross_bfly; i--, s -= r, p -= skip) { // (b0, ?) -> (2*b0, w*b0) = (2*a0, b1) pmf_set (p + half, p, M); pmf_rotate (p + half, s); pmf_add (p, p, M, mod); } // Cross butterflies ("BA" => "AB") to make them look like: // AAAAAAAA BBBAAAAA // BBBBBBBB (fwd == 1) or CCCBBBBB for (; i >= last_cross_bfly; i--, s -= r, p -= skip) { // (b0, 2*a1) -> (2*(b0-a1), w*(b0-2*a1)) = (2*a0, b1) pmf_sub (p + half, p, M, mod); pmf_sub (p, p + half, M, mod); pmf_rotate (p + half, M + s); } // Transform second row to make them look like: // AAAAAAAA BBBAAAAA // *??????? (fwd == 1) or BBB*???? op->data += half; pmfvec_ifft_dc (op, n - U, fwd, U, t << 1); op->data -= half; // Inverse butterflies ("BB" => "AA") to make them look like: // AAAAAAAA AAAAAAAA // *??????? (fwd == 1) or AAA*???? 
for (; i >= 0; i--, s -= r, p -= skip) { // (b0, b1) -> (b0 + w*b1, b0 - w*b1) = (2*a0, 2*a1) pmf_rotate (p + half, M - s); pmf_bfly (p + half, p, M, mod); } } // pop back to full size op->K <<= 1; op->lgK++; } void pmfvec_ifft_huge (pmfvec_t op, unsigned lgT, ulong n, int fwd, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (z >= 1 && z <= op->K); ZNP_ASSERT (n + fwd >= 1 && n + fwd <= op->K); ZNP_ASSERT (n <= z); ZNP_ASSERT (lgT > 0 && lgT < op->lgK); unsigned lgK = op->lgK; unsigned lgU = lgK - lgT; ulong K = op->K; ulong T = 1UL << lgT; ulong U = 1UL << lgU; ptrdiff_t skip = op->skip; ptrdiff_t skip_U = skip << lgU; ulong* data = op->data; // Write n = U * nT + nU, where 0 <= nU < U ulong nU = n & (U - 1); ulong nT = n >> lgU; // Write z = U * zT + zU, where 0 <= zU < U ulong zU = z & (U - 1); ulong zT = z >> lgU; ulong zU2 = zT ? U : zU; ulong mU1 = ZNP_MIN (zU, nU); ulong mU2 = ZNP_MAX (zU, nU); int fwd2 = nU || fwd; ulong r = op->M >> (lgK - 1); ulong s, i; ulong tT = t << lgT; // where: // symbols in the following diagrams: // A = fully untransformed coefficient (one of the a_i) // B = intermediate coefficient // C = fully transformed coefficient (one of the b_k) // ? = garbage that we don't care about // * = the "forward" C coefficient, or "?" if no forward coefficient // requested // The input looks something like this: // // CCCCCCCC // CCCCCCCC // CCCAAAAA // AAAAAAAA // // (we won't bother marking in the locations of zeroes on the diagrams) // // The horizontal transforms convert between B and C. // The vertical transforms convert between A and B. // First do row transforms for complete rows, to make it look like: // BBBBBBBB // BBBBBBBB // CCCAAAAA // AAAAAAAA op->lgK = lgU; op->K = U; for (i = 0; i < nT; i++, op->data += skip_U) pmfvec_ifft (op, U, 0, U, tT); // Column transforms for the rightmost columns, to obtain // BBBAAAAA // BBBAAAAA // CCCBBBBB // AAA????? op->lgK = lgT; op->K = T; op->skip = skip_U; for (i = nU, op->data = data + (skip * nU), s = t + (r * nU); i < mU2; i++, op->data += skip, s += r) { pmfvec_ifft (op, nT, fwd2, zT + 1, s); } for (; i < zU2; i++, op->data += skip, s += r) pmfvec_ifft (op, nT, fwd2, zT, s); // If there is still a partial row to deal with.... if (fwd2) { // Transform the partial row to obtain // BBBAAAAA // BBBAAAAA // BBB*???? // AAA????? op->data = data + nT * skip_U; op->lgK = lgU; op->K = U; op->skip = skip; pmfvec_ifft (op, nU, fwd, zU2, tT); // Column transforms for the leftmost columns, to obtain // AAAAAAAA // AAAAAAAA // AAA*???? // ???????? 
op->lgK = lgT; op->K = T; op->skip = skip_U; for (i = 0, op->data = data, s = t; i < mU1; i++, op->data += skip, s += r) { pmfvec_ifft (op, nT + 1, 0, zT + 1, s); } for (; i < nU; i++, op->data += skip, s += r) pmfvec_ifft (op, nT + 1, 0, zT, s); } // restore parameters op->lgK = lgK; op->K = K; op->skip = skip; op->data = data; } void pmfvec_ifft (pmfvec_t op, ulong n, int fwd, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (z <= op->K); ZNP_ASSERT (n <= z); ZNP_ASSERT (n + fwd <= op->K); if (op->K <= 2 || 2 * op->K * op->M * sizeof(ulong) <= ZNP_CACHE_SIZE) { // IFFT is pretty small; use use divide-and-conquer pmfvec_ifft_dc (op, n, fwd, z, t); } else { // IFFT is relatively big; use factoring algorithm instead pmfvec_ifft_huge (op, op->lgK / 2, n, fwd, z, t); } } /* ============================================================================ transposed FFT routines ============================================================================ */ void pmfvec_tpfft_basecase (pmfvec_t op, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); if (op->lgK == 0) return; ulong M = op->M; const zn_mod_struct* mod = op->mod; ulong s, r = M; ulong r_last = M >> (op->lgK - 1); t <<= (op->lgK - 1); ptrdiff_t half = op->skip; ulong* end = op->data + (op->skip << op->lgK); ulong* p; ulong* start; for (; r >= r_last; r >>= 1, half <<= 1, t >>= 1) for (start = op->data, s = t; s < M; s += r, start += op->skip) for (p = start; p < end; p += 2 * half) { pmf_rotate (p + half, M + s); pmf_bfly (p + half, p, M, mod); } } void pmfvec_tpfft_dc (pmfvec_t op, ulong n, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (n >= 1 && n <= op->K); ZNP_ASSERT (z >= 1 && z <= op->K); if (op->K == 1) return; if (n == op->K && z == op->K) { pmfvec_tpfft_basecase (op, t); return; } const zn_mod_struct* mod = op->mod; op->lgK--; op->K >>= 1; long i; ulong M = op->M; ulong U = op->K; ulong* p = op->data; ptrdiff_t skip = op->skip; ptrdiff_t half = skip << op->lgK; ulong z2 = ZNP_MIN (z, U); if (n <= U) { pmfvec_tpfft_dc (op, n, z2, t << 1); for (i = 0; i < (long)(z - U); i++, p += skip) pmf_set (p + half, p, M); } else { op->data += half; pmfvec_tpfft_dc (op, n - U, z2, t << 1); op->data -= half; pmfvec_tpfft_dc (op, U, z2, t << 1); ulong s = t; ulong r = M >> op->lgK; for (i = 0; i < (long)(z - U); i++, p += skip, s += r) { pmf_rotate (p + half, M + s); pmf_bfly (p + half, p, M, mod); } for (; i < z2; i++, p += skip, s += r) { pmf_rotate (p + half, s); pmf_add (p, p + half, M, mod); } } op->K <<= 1; op->lgK++; } void pmfvec_tpfft_huge (pmfvec_t op, unsigned lgT, ulong n, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (lgT > 0 && lgT < op->lgK); ZNP_ASSERT (n >= 1 && n <= op->K); ZNP_ASSERT (z >= 1 && z <= op->K); unsigned lgK = op->lgK; unsigned lgU = lgK - lgT; ulong K = op->K; ulong T = 1UL << lgT; ulong U = 1UL << lgU; ptrdiff_t skip = op->skip; ptrdiff_t skip_U = skip << lgU; ulong* data = op->data; ulong nU = n & (U - 1); ulong nT = n >> lgU; ulong nT_ceil = nT + (nU > 0); ulong zT = z >> lgU; ulong zU = z & (U - 1); ulong zU2 = zT ? 
U : zU; ulong r = op->M >> (lgK - 1); ulong s, i; op->K = U; op->lgK = lgU; t <<= lgT; for (i = 0; i < nT; i++, op->data += skip_U) pmfvec_tpfft (op, U, zU2, t); if (nU) pmfvec_tpfft (op, nU, zU2, t); op->data = data; op->K = T; op->lgK = lgT; op->skip = skip_U; t >>= lgT; for (i = 0, s = t; i < zU; i++, op->data += skip, s += r) pmfvec_tpfft (op, nT_ceil, zT + 1, s); for (; i < zU2; i++, op->data += skip, s += r) pmfvec_tpfft (op, nT_ceil, zT, s); op->data = data; op->skip = skip; op->K = K; op->lgK = lgK; } void pmfvec_tpfft (pmfvec_t op, ulong n, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (n >= 1 && n <= op->K); ZNP_ASSERT (z >= 1 && z <= op->K); if (op->K <= 2 || 2 * op->K * op->M * sizeof (ulong) <= ZNP_CACHE_SIZE) { pmfvec_tpfft_dc (op, n, z, t); } else { pmfvec_tpfft_huge (op, op->lgK / 2, n, z, t); } } /* ============================================================================ transposed inverse IFFT routines ============================================================================ */ void pmfvec_tpifft_basecase (pmfvec_t op, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2*op->M); if (op->lgK == 0) return; ulong M = op->M; const zn_mod_struct* mod = op->mod; ulong s, r = M >> (op->lgK - 1); ptrdiff_t half = op->skip << (op->lgK - 1); ulong* end = op->data + (op->skip << op->lgK); ulong* p; ulong* start; for (; r <= M; r <<= 1, half >>= 1, t <<= 1) for (start = op->data, s = t; s < M; s += r, start += op->skip) for (p = start; p < end; p += 2 * half) { pmf_bfly (p, p + half, M, mod); pmf_rotate (p + half, M - s); } } void pmfvec_tpifft_dc (pmfvec_t op, ulong n, int fwd, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (z >= 1 && z <= op->K); ZNP_ASSERT (n + fwd >= 1 && n + fwd <= op->K); ZNP_ASSERT (n <= z); if (op->K == 1) return; if (n == op->K) { pmfvec_tpifft_basecase (op, t); return; } const zn_mod_struct* mod = op->mod; op->lgK--; op->K >>= 1; long i; ulong M = op->M; ulong U = op->K; pmf_t p = op->data; ptrdiff_t skip = op->skip; ptrdiff_t half = skip << op->lgK; if (n + fwd <= U) { long zU2 = ZNP_MIN (z, U); long last_zero_fwd_bfly = ZNP_MAX (z - zU2, n); long last_zero_cross_bfly = ZNP_MIN (z - zU2, n); for (i = 0; i < last_zero_cross_bfly; i++, p += skip) { pmf_set (p + half, p, M); pmf_rotate (p + half, M); pmf_add (p, p, M, mod); } for (; i < n; i++, p += skip) pmf_add (p, p, M, mod); pmfvec_tpifft_dc (op, n, fwd, zU2, t << 1); for (; i < last_zero_fwd_bfly; i++, p += skip) { pmf_divby2 (p, M, mod); pmf_set (p + half, p, M); } for (; i < zU2; i++, p += skip) pmf_divby2 (p, M, mod); } else { long last_zero_cross_bfly = z - U; long last_cross_bfly = n - U; ulong r = M >> op->lgK; ulong s = t; for (i = 0; i < last_cross_bfly; i++, s += r, p += skip) { pmf_bfly (p, p + half, M, mod); pmf_rotate (p + half, M - s); } op->data += half; pmfvec_tpifft_dc (op, n - U, fwd, U, t << 1); op->data -= half; for (; i < last_zero_cross_bfly; i++, s += r, p += skip) { pmf_rotate (p + half, M + s); pmf_sub (p + half, p, M, mod); pmf_sub (p, p + half, M, mod); } for (; i < U; i++, s += r, p += skip) { pmf_add (p, p, M, mod); pmf_rotate (p + half, s); pmf_add (p, p + half, M, mod); } pmfvec_tpifft_basecase (op, t << 1); } op->K <<= 1; op->lgK++; } void pmfvec_tpifft_huge (pmfvec_t op, unsigned lgT, ulong n, int fwd, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (z >= 1 && z <= op->K); 
ZNP_ASSERT (n + fwd >= 1 && n + fwd <= op->K); ZNP_ASSERT (n <= z); ZNP_ASSERT (lgT > 0 && lgT < op->lgK); unsigned lgK = op->lgK; unsigned lgU = lgK - lgT; ulong K = op->K; ulong T = 1UL << lgT; ulong U = 1UL << lgU; ptrdiff_t skip = op->skip; ptrdiff_t skip_U = skip << lgU; ulong* data = op->data; ulong nU = n & (U - 1); ulong nT = n >> lgU; ulong zU = z & (U - 1); ulong zT = z >> lgU; ulong zU2 = zT ? U : zU; ulong mU1 = ZNP_MIN (zU, nU); ulong mU2 = ZNP_MAX (zU, nU); int fwd2 = nU || fwd; ulong r = op->M >> (lgK - 1); ulong s, i; ulong tT = t << lgT; if (fwd2) { op->lgK = lgT; op->K = T; op->skip = skip_U; for (i = 0, op->data = data, s = t; i < mU1; i++, op->data += skip, s += r) { pmfvec_tpifft (op, nT + 1, 0, zT + 1, s); } for (; i < nU; i++, op->data += skip, s += r) pmfvec_tpifft (op, nT + 1, 0, zT, s); op->data = data + nT * skip_U; op->lgK = lgU; op->K = U; op->skip = skip; pmfvec_tpifft (op, nU, fwd, zU2, tT); } op->lgK = lgT; op->K = T; op->skip = skip_U; for (i = nU, op->data = data + (skip * nU), s = t + (r * nU); i < mU2; i++, op->data += skip, s += r) { pmfvec_tpifft (op, nT, fwd2, zT + 1, s); } for (; i < zU2; i++, op->data += skip, s += r) pmfvec_tpifft (op, nT, fwd2, zT, s); op->data = data; op->skip = skip; op->lgK = lgU; op->K = U; for (i = 0; i < nT; i++, op->data += skip_U) pmfvec_tpifft (op, U, 0, U, tT); op->data = data; op->lgK = lgK; op->K = K; } void pmfvec_tpifft (pmfvec_t op, ulong n, int fwd, ulong z, ulong t) { ZNP_ASSERT (op->lgK <= op->lgM + 1); ZNP_ASSERT (t * op->K < 2 * op->M); ZNP_ASSERT (z >= 1 && z <= op->K); ZNP_ASSERT (n + fwd >= 1 && n + fwd <= op->K); ZNP_ASSERT (n <= z); if (op->K <= 2 || 2 * op->K * op->M * sizeof (ulong) <= ZNP_CACHE_SIZE) { pmfvec_tpifft_dc (op, n, fwd, z, t); } else { pmfvec_tpifft_huge (op, op->lgK / 2, n, fwd, z, t); } } // end of file **************************************************************** zn_poly-0.9.2/src/zn_mod.c000066400000000000000000000063641360464557000154600ustar00rootroot00000000000000/* zn_mod.c: functions operating on zn_mod_t objects Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
*/ #include "zn_poly_internal.h" void zn_mod_init (zn_mod_t mod, ulong m) { ZNP_ASSERT (m >= 2); mod->m = m; mod->bits = ceil_lg (m); mpz_t x, y; mpz_init (x); mpz_init (y); // compute B and B^2 mod m mpz_set_ui (x, 1); mpz_mul_2exp (x, x, ULONG_BITS); mpz_mod_ui (x, x, m); mod->B = mpz_get_ui (x); mpz_set_ui (x, 1); mpz_mul_2exp (x, x, 2*ULONG_BITS); mpz_mod_ui (x, x, m); mod->B2 = mpz_get_ui (x); // compute sh1 and inv1 mod->sh1 = ceil_lg (m) - 1; mpz_set_ui (x, 1); mpz_mul_2exp (x, x, mod->sh1 + 1); mpz_sub_ui (x, x, m); mpz_mul_2exp (x, x, ULONG_BITS); mpz_fdiv_q_ui (x, x, m); mpz_add_ui (x, x, 1); mod->inv1 = mpz_get_ui (x); // compute sh2, sh3, inv2, m_norm unsigned ell = floor_lg (m) + 1; mod->sh2 = ULONG_BITS - ell; mod->sh3 = ell - 1; mod->m_norm = m << mod->sh2; mpz_set_ui (x, 1); mpz_mul_2exp (x, x, ell); mpz_sub_ui (x, x, m); mpz_mul_2exp (x, x, ULONG_BITS); mpz_sub_ui (x, x, 1); mpz_fdiv_q_ui (x, x, m); mod->inv2 = mpz_get_ui (x); // compute inv3, if m is odd if (m & 1) { // m^(-1) = m mod 8 ulong minv = m; // lift 2-adically int i; for (i = 3; i < ULONG_BITS; i <<= 1) minv = 2 * minv - m * minv * minv; mod->inv3 = minv; } mpz_clear (y); mpz_clear (x); } void zn_mod_clear (zn_mod_t mod) { // nothing to do yet, but maybe one day there will be } ulong zn_mod_pow2 (int k, const zn_mod_t mod) { ZNP_ASSERT (mod->m & 1); ZNP_ASSERT (k > -ULONG_BITS && k < ULONG_BITS); if (k == 0) return 1; if (k > 0) return zn_mod_reduce (1UL << k, mod); return zn_mod_pow (zn_mod_divby2 (1, mod), -k, mod); } ulong zn_mod_pow (ulong x, long k, const zn_mod_t mod) { ZNP_ASSERT (k >= 0); // repeated squaring ulong prod = 1; ulong x_pow = x; for (; k; k >>= 1) { if (k & 1) prod = zn_mod_mul (prod, x_pow, mod); x_pow = zn_mod_mul (x_pow, x_pow, mod); } return prod; } ulong zn_mod_invert (ulong x, const zn_mod_t mod) { ZNP_ASSERT (x < mod->m); // for now just use GMP mpz_t a, m; mpz_init (a); mpz_set_ui (a, x); mpz_init (m); mpz_set_ui (m, mod->m); int success = mpz_invert (a, a, m); x = mpz_get_ui (a); mpz_clear (m); mpz_clear (a); return success ? x : 0; } // end of file **************************************************************** zn_poly-0.9.2/test/000077500000000000000000000000001360464557000142055ustar00rootroot00000000000000zn_poly-0.9.2/test/invert-test.c000066400000000000000000000051741360464557000166440ustar00rootroot00000000000000/* invert-test.c: test code for functions in invert.c Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "support.h" #include "zn_poly_internal.h" /* Tests zn_array_invert() for a given series length and modulus. 
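   Generates a random power series with constant term 1, inverts it with
   zn_array_invert(), then multiplies back with ref_zn_array_mul() and
   checks that the product is 1 mod x^n.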
*/ int testcase_zn_array_invert (size_t n, const zn_mod_t mod) { ulong* op = (ulong*) malloc (sizeof (ulong) * n); ulong* res = (ulong*) malloc (sizeof (ulong) * n); ulong* check = (ulong*) malloc (sizeof (ulong) * (2 * n - 1)); // make up random input poly size_t i; op[0] = 1; for (i = 1; i < n; i++) op[i] = random_ulong (mod->m); // compute inverse zn_array_invert (res, op, n, mod); // multiply by original series and check we get 1 ref_zn_array_mul (check, op, n, res, n, mod); int success = (check[0] == 1); for (i = 1; i < n; i++) success = success && (check[i] == 0); free (check); free (res); free (op); return success; } /* Tests zn_array_invert() on a range of problems. */ int test_zn_array_invert (int quick) { int success = 1; int b, trial; size_t n; zn_mod_t mod; // first try a dense range of "small" problems for (b = 2; b <= ULONG_BITS && success; b++) for (n = 1; n <= 60 && success; n++) for (trial = 0; trial < (quick ? 1 : 10) && success; trial++) { zn_mod_init (mod, random_modulus (b, 0)); success = success && testcase_zn_array_invert (n, mod); zn_mod_clear (mod); } // now try a few larger random problems for (b = 2; b <= ULONG_BITS && success; b += (quick ? random_ulong (3) + 1 : 1)) for (trial = 0; trial < (quick ? 1 : 5) && success; trial++) { zn_mod_init (mod, random_modulus (b, 0)); n = random_ulong (quick ? 2000 : 10000) + 1; success = success && testcase_zn_array_invert (n, mod); zn_mod_clear (mod); } return success; } // end of file **************************************************************** zn_poly-0.9.2/test/mpn_mulmid-test.c000066400000000000000000000137321360464557000174750ustar00rootroot00000000000000/* mpn_mulmid-test.c: test code for functions in mpn_mulmid.c Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #include "support.h" #include "zn_poly_internal.h" #include <stdint.h> /* Tests mpn_smp_basecase for given n1 >= n2 >= 1. */ int testcase_mpn_smp_basecase (size_t n1, size_t n2) { size_t n3 = n1 - n2 + 3; mp_limb_t* buf1 = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n1); mp_limb_t* buf2 = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n2); mp_limb_t* ref = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n3); mp_limb_t* res = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n3); // generate random inputs ZNP_mpn_random2 (buf1, n1); ZNP_mpn_random2 (buf2, n2); // compare target against reference implementation ZNP_mpn_smp_basecase (res, buf1, n1, buf2, n2); ref_mpn_smp (ref, buf1, n1, buf2, n2); int success = !mpn_cmp (ref, res, n3); free (res); free (ref); free (buf2); free (buf1); return success; } /* Tests mpn_smp_basecase for a range of n1, n2. */ int test_mpn_smp_basecase (int quick) { int success = 1; size_t n1, n2; ulong trial; for (n2 = 1; n2 <= 30 && success; n2++) for (n1 = n2; n1 <= 30 && success; n1++) for (trial = 0; trial < (quick ? 30 : 3000) && success; trial++) success = success && testcase_mpn_smp_basecase (n1, n2); return success; } /* Tests mpn_smp_kara for given n.
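The Karatsuba middle product takes a (2*n - 1)-limb input and an n-limb input and produces n + 2 limbs, which determines the buffer sizes allocated below.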
*/ #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS int testcase_mpn_smp_kara (size_t n) { mp_limb_t* buf1 = malloc (sizeof (mp_limb_t) * (2 * n - 1)); mp_limb_t* buf2 = malloc (sizeof (mp_limb_t) * n); mp_limb_t* ref = malloc (sizeof (mp_limb_t) * (n + 2)); mp_limb_t* res = malloc (sizeof (mp_limb_t) * (n + 2)); // generate random inputs ZNP_mpn_random2 (buf1, 2 * n - 1); ZNP_mpn_random2 (buf2, n); // compare target against reference implementation ZNP_mpn_smp_kara (res, buf1, buf2, n); ref_mpn_smp (ref, buf1, 2 * n - 1, buf2, n); int success = !zn_array_cmp ((ulong*) ref, (ulong*)res, n + 2); free (res); free (ref); free (buf2); free (buf1); return success; } #else #error Not nails-safe yet #endif /* Tests mpn_smp_kara for a range of n. */ int test_mpn_smp_kara (int quick) { int success = 1; size_t n; ulong trial; // first a dense range of small problems for (n = 2; n <= 30 && success; n++) for (trial = 0; trial < (quick ? 300 : 30000) && success; trial++) success = success && testcase_mpn_smp_kara (n); // now a few larger problems too for (trial = 0; trial < (quick ? 100 : 3000) && success; trial++) { if (ZNP_mpn_smp_kara_thresh == SIZE_MAX) n = random_ulong (100) + 2; else n = random_ulong (3 * ZNP_mpn_smp_kara_thresh) + 2; success = success && testcase_mpn_smp_kara (n); } return success; } #if GMP_NAIL_BITS == 0 && ULONG_BITS == GMP_NUMB_BITS int testcase_mpn_smp (size_t n1, size_t n2) { size_t n3 = n1 - n2 + 3; mp_limb_t* buf1 = malloc (sizeof (mp_limb_t) * n1); mp_limb_t* buf2 = malloc (sizeof (mp_limb_t) * n2); mp_limb_t* ref = malloc (sizeof (mp_limb_t) * n3); mp_limb_t* res = malloc (sizeof (mp_limb_t) * n3); // generate random inputs ZNP_mpn_random2 (buf1, n1); ZNP_mpn_random2 (buf2, n2); // temporarily lower the karatsuba threshold for more stringent testing unsigned long temp_thresh = ZNP_mpn_smp_kara_thresh; ZNP_mpn_smp_kara_thresh = 5; // compare target against reference implementation ZNP_mpn_smp (res, buf1, n1, buf2, n2); ref_mpn_smp (ref, buf1, n1, buf2, n2); int success = !zn_array_cmp ((ulong*) ref, (ulong*) res, n3); ZNP_mpn_smp_kara_thresh = temp_thresh; free (res); free (ref); free (buf2); free (buf1); return success; } #else #error Not nails-safe yet #endif /* Tests mpn_smp for a range of n1, n2. */ int test_mpn_smp (int quick) { int success = 1; size_t n1, n2; ulong trial; for (n2 = 1; n2 <= 30 && success; n2++) for (n1 = n2; n1 <= 30 && success; n1++) for (trial = 0; trial < (quick ? 30 : 3000) && success; trial++) success = success && testcase_mpn_smp (n1, n2); return success; } int testcase_mpn_mulmid (size_t n1, size_t n2) { size_t n3 = n1 - n2 + 3; mp_limb_t* buf1 = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n1); mp_limb_t* buf2 = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n2); mp_limb_t* ref = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n3); mp_limb_t* res = (mp_limb_t*) malloc (sizeof (mp_limb_t) * n3); // generate random inputs ZNP_mpn_random2 (buf1, n1); ZNP_mpn_random2 (buf2, n2); // compare target against reference implementation ZNP_mpn_mulmid (res, buf1, n1, buf2, n2); ref_mpn_mulmid (ref, buf1, n1, buf2, n2); int success = (n3 <= 4) || !mpn_cmp (ref + 2, res + 2, n3 - 4); free (res); free (ref); free (buf2); free (buf1); return success; } /* Tests mpn_mulmid for a range of n1, n2. */ int test_mpn_mulmid (int quick) { int success = 1; size_t n1, n2; ulong trial; for (n2 = 1; n2 <= 30 && success; n2++) for (n1 = n2; n1 <= 30 && success; n1++) for (trial = 0; trial < (quick ? 
30 : 3000) && success; trial++) success = success && testcase_mpn_mulmid (n1, n2); return success; } // end of file **************************************************************** zn_poly-0.9.2/test/mul_fft-test.c000066400000000000000000000220111360464557000167560ustar00rootroot00000000000000/* mul_fft-test.c: test code for functions in mul_fft.c and mul_fft_dft.c Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "support.h" #include "zn_poly_internal.h" /* Tests zn_array_mul_fft, for given lengths and modulus. If use_scale is set, zn_array_mul_fft() gets called with a random x (post-scaling factor), otherwise gets called with x == 1. If sqr == 1, tests squaring (n2 is ignored), otherwise ordinary multiplication. Returns 1 on success. */ int testcase_zn_array_mul_fft (size_t n1, size_t n2, int sqr, int use_scale, const zn_mod_t mod) { if (sqr) n2 = n1; ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n1); ulong* buf2 = sqr ? buf1 : (ulong*) malloc (sizeof (ulong) * n2); ulong* ref = (ulong*) malloc (sizeof (ulong) * (n1 + n2 - 1)); ulong* res = (ulong*) malloc (sizeof (ulong) * (n1 + n2 - 1)); // generate random polys size_t i; for (i = 0; i < n1; i++) buf1[i] = random_ulong (mod->m); if (!sqr) for (i = 0; i < n2; i++) buf2[i] = random_ulong (mod->m); ulong x = use_scale ? random_ulong (mod->m) : 1; // compare target implementation against reference implementation ref_zn_array_mul (ref, buf1, n1, buf2, n2, mod); ref_zn_array_scalar_mul (ref, ref, n1 + n2 - 1, x, mod); zn_array_mul_fft (res, buf1, n1, buf2, n2, x, mod); ulong y = zn_array_mul_fft_fudge (n1, n2, sqr, mod); ref_zn_array_scalar_mul (res, res, n1 + n2 - 1, y, mod); int success = !zn_array_cmp (ref, res, n1 + n2 - 1); free (res); free (ref); if (!sqr) free (buf2); free (buf1); return success; } /* tests zn_array_mul_fft() on a range of input cases */ int test_zn_array_mul_or_sqr_fft (int sqr, int quick) { int success = 1; int i, trial, use_scale; size_t n1, n2; zn_mod_t mod; // first try a dense range of "small" problems for (i = 0; i < num_test_bitsizes && success; i++) for (n2 = 1; n2 <= 50 && success; n2 += (quick ? 3 : 1)) for (n1 = n2; n1 <= 50 && (!sqr || n1 <= n2) && success; n1 += (quick ? 3 : 1)) for (use_scale = 0; use_scale <= 1 && success; use_scale++) for (trial = 0; trial < (quick ? 1 : 3) && success; trial++) { zn_mod_init (mod, random_modulus (test_bitsizes[i], 1)); success = success && testcase_zn_array_mul_fft (n1, n2, sqr, use_scale, mod); zn_mod_clear (mod); } // now try some random larger problems // and temporarily change the nussbaumer thresholds so we use that // code sometimes unsigned thresh; for (i = 0; i < num_test_bitsizes && success; i++) { unsigned b = test_bitsizes[i]; unsigned* c = sqr ? 
&(tuning_info[b].nuss_sqr_thresh) : &(tuning_info[b].nuss_mul_thresh); for (use_scale = 0; use_scale <= 1 && success; use_scale++) for (thresh = 2; thresh <= 8 && success; thresh += (quick ? 4 : 1)) { unsigned save_thresh = *c; *c = thresh; size_t t1 = random_ulong (quick ? 3000 : 10000) + 1; size_t t2 = sqr ? t1 : (random_ulong (quick ? 3000 : 10000) + 1); n1 = ZNP_MAX (t1, t2); n2 = ZNP_MIN (t1, t2); zn_mod_init (mod, random_modulus (b, 1)); success = success && testcase_zn_array_mul_fft (n1, n2, sqr, use_scale, mod); zn_mod_clear (mod); *c = save_thresh; } } return success; } int test_zn_array_mul_fft (int quick) { return test_zn_array_mul_or_sqr_fft (0, quick); } int test_zn_array_sqr_fft (int quick) { return test_zn_array_mul_or_sqr_fft (1, quick); } /* Tests zn_array_mul_fft_dft, for given lengths, lgT, modulus. Returns 1 on success. */ int testcase_zn_array_mul_fft_dft (size_t n1, size_t n2, unsigned lgT, const zn_mod_t mod) { ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n1); ulong* buf2 = (ulong*) malloc (sizeof (ulong) * n2); ulong* ref = (ulong*) malloc (sizeof (ulong) * (n1 + n2 - 1)); ulong* res = (ulong*) malloc (sizeof (ulong) * (n1 + n2 - 1)); // generate random polys size_t i; for (i = 0; i < n1; i++) buf1[i] = random_ulong (mod->m); for (i = 0; i < n2; i++) buf2[i] = random_ulong (mod->m); // compare target implementation against reference implementation ref_zn_array_mul (ref, buf1, n1, buf2, n2, mod); zn_array_mul_fft_dft (res, buf1, n1, buf2, n2, lgT, mod); int success = !zn_array_cmp (ref, res, n1 + n2 - 1); free (res); free (ref); free (buf2); free (buf1); return success; } /* Tests zn_array_mulmid_fft, for given n1, n2, modulus. If use_scale is set, zn_array_mulmid_fft() gets called with a random x (post-scaling factor), otherwise gets called with x == 1. Returns 1 on success. */ int testcase_zn_array_mulmid_fft (size_t n1, size_t n2, int use_scale, const zn_mod_t mod) { ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n1); ulong* buf2 = (ulong*) malloc (sizeof (ulong) * n2); ulong* ref = (ulong*) malloc (sizeof (ulong) * (n1 - n2 + 1)); ulong* res = (ulong*) malloc (sizeof (ulong) * (n1 - n2 + 1)); // generate random polys size_t i; for (i = 0; i < n1; i++) buf1[i] = random_ulong (mod->m); for (i = 0; i < n2; i++) buf2[i] = random_ulong (mod->m); ulong x = use_scale ? random_ulong (mod->m) : 1; // compare target implementation against reference implementation ref_zn_array_mulmid (ref, buf1, n1, buf2, n2, mod); ref_zn_array_scalar_mul (ref, ref, n1 - n2 + 1, x, mod); zn_array_mulmid_fft (res, buf1, n1, buf2, n2, x, mod); ulong y = zn_array_mulmid_fft_fudge (n1, n2, mod); ref_zn_array_scalar_mul (res, res, n1 - n2 + 1, y, mod); int success = !zn_array_cmp (ref, res, n1 - n2 + 1); free (res); free (ref); free (buf2); free (buf1); return success; } /* tests zn_array_mulmid_fft() on a range of input cases */ int test_zn_array_mulmid_fft (int quick) { int success = 1; int i, trial, use_scale; size_t n1, n2; zn_mod_t mod; // first try a dense range of "small" problems for (i = 0; i < num_test_bitsizes && success; i++) for (n2 = 1; n2 <= 50 && success; n2 += (quick ? 3 : 1)) for (n1 = n2; n1 <= 50 && success; n1 += (quick ? 3 : 1)) for (use_scale = 0; use_scale <= 1 && success; use_scale++) for (trial = 0; trial < (quick ? 
1 : 3) && success; trial++) { zn_mod_init (mod, random_modulus (test_bitsizes[i], 1)); success = success && testcase_zn_array_mulmid_fft (n1, n2, use_scale, mod); zn_mod_clear (mod); } // now try some random larger problems // and temporarily change the nussbaumer thresholds so we use that // code sometimes ulong thresh; for (i = 0; i < num_test_bitsizes && success; i++) for (use_scale = 0; use_scale <= 1 && success; use_scale++) for (thresh = 2; thresh <= 8; thresh += (quick ? 4 : 1)) { unsigned b = test_bitsizes[i]; ulong save_thresh = tuning_info[b].nuss_mul_thresh; tuning_info[b].nuss_mul_thresh = thresh; size_t t1 = random_ulong (quick ? 3000 : 10000) + 1; size_t t2 = random_ulong (quick ? 3000 : 10000) + 1; n1 = ZNP_MAX (t1, t2); n2 = ZNP_MIN (t1, t2); zn_mod_init (mod, random_modulus (b, 1)); success = success && testcase_zn_array_mulmid_fft (n1, n2, use_scale, mod); zn_mod_clear (mod); tuning_info[b].nuss_mul_thresh = save_thresh; } return success; } /* tests zn_array_mul_fft_dft() on a range of input cases */ int test_zn_array_mul_fft_dft (int quick) { int success = 1; int i, trial; unsigned lgT; size_t n1, n2; zn_mod_t mod; for (i = 0; i < num_test_bitsizes && success; i++) for (n2 = 1; n2 <= 30 && success; n2 += (quick ? random_ulong (2) + 1 : 1)) for (n1 = n2; n1 <= 30 && success; n1 += (quick ? random_ulong (2) + 1 : 1)) for (lgT = 0; lgT < 5 && success; lgT++) for (trial = 0; trial < (quick ? 1 : 3) && success; trial++) { zn_mod_init (mod, random_modulus (test_bitsizes[i], 1)); success = success && testcase_zn_array_mul_fft_dft (n1, n2, lgT, mod); zn_mod_clear (mod); } return success; } // end of file **************************************************************** zn_poly-0.9.2/test/mul_ks-test.c000066400000000000000000000257261360464557000166340ustar00rootroot00000000000000/* mul_ks-test.c: test code for functions in mul_ks.c Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "support.h" #include "zn_poly_internal.h" /* Tests zn_array_mul_KSk, for given lengths, reduction algorithm, modulus. 1 <= k <= 4 indicates which KS variant to call. sqr == 1 to test squaring (n2 is ignored). Returns 1 on success. */ int testcase_zn_array_mul_KS (int k, size_t n1, size_t n2, int sqr, int redc, const zn_mod_t mod) { // disallow REDC if modulus is even if (!(mod->m & 1)) redc = 0; if (sqr) n2 = n1; ulong* buf1 = (ulong*) malloc (sizeof(ulong) * n1); ulong* buf2 = sqr ? 
buf1 : (ulong*) malloc (sizeof(ulong) * n2); ulong* ref = (ulong*) malloc (sizeof(ulong) * (n1 + n2 - 1)); ulong* res = (ulong*) malloc (sizeof(ulong) * (n1 + n2 - 1)); // generate random polys size_t i; for (i = 0; i < n1; i++) buf1[i] = random_ulong (mod->m); if (!sqr) for (i = 0; i < n2; i++) buf2[i] = random_ulong (mod->m); // compare target implementation against reference implementation ref_zn_array_mul (ref, buf1, n1, buf2, n2, mod); switch (k) { case 1: zn_array_mul_KS1 (res, buf1, n1, buf2, n2, redc, mod); break; case 2: zn_array_mul_KS2 (res, buf1, n1, buf2, n2, redc, mod); break; case 3: zn_array_mul_KS3 (res, buf1, n1, buf2, n2, redc, mod); break; case 4: zn_array_mul_KS4 (res, buf1, n1, buf2, n2, redc, mod); break; default: printf ("oops!\n"); abort (); } if (redc) // correct for REDC reduction ref_zn_array_scalar_mul (res, res, n1 + n2 - 1, mod->m - mod->B, mod); int success = !zn_array_cmp (ref, res, n1 + n2 - 1); free (res); free (ref); if (!sqr) free (buf2); free (buf1); return success; } /* tests zn_array_mul_KSk() on a range of input cases, where 1 <= k <= 4 */ int test_zn_array_mul_KSk (unsigned k, int quick) { int success = 1; int b, trial, redc; size_t n1, n2, t1, t2; zn_mod_t mod; // first try a dense range of "small" problems for (b = 2; b <= ULONG_BITS && success; b++) for (n2 = 1; n2 <= 30 && success; n2 += (quick ? 5 : 1)) for (n1 = n2; n1 <= 30 && success; n1 += (quick ? 5 : 1)) for (redc = 0; redc < 2 && success; redc++) for (trial = 0; trial < (quick ? 1 : 10) && success; trial++) { zn_mod_init (mod, random_modulus (b, 0)); success = success && testcase_zn_array_mul_KS (k, n1, n2, 0, redc, mod); zn_mod_clear (mod); } // now try some random larger problems for (b = 2; b <= ULONG_BITS && success; b++) for (redc = 0; redc < 2 && success; redc++) for (trial = 0; trial < (quick ? 3 : 200) && success; trial++) { size_t t1 = random_ulong (quick ? 250 : 1000) + 1; size_t t2 = random_ulong (quick ? 250 : 1000) + 1; n1 = ZNP_MAX (t1, t2); n2 = ZNP_MIN (t1, t2); zn_mod_init (mod, random_modulus (b, 0)); success = success && testcase_zn_array_mul_KS (k, n1, n2, 0, redc, mod); zn_mod_clear (mod); } return success; } int test_zn_array_mul_KS1 (int quick) { return test_zn_array_mul_KSk (1, quick); } int test_zn_array_mul_KS2 (int quick) { return test_zn_array_mul_KSk (2, quick); } int test_zn_array_mul_KS3 (int quick) { return test_zn_array_mul_KSk (3, quick); } int test_zn_array_mul_KS4 (int quick) { return test_zn_array_mul_KSk (4, quick); } /* tests zn_array_mul_KSk() for squaring on a range of input cases, where 1 <= k <= 4 */ int test_zn_array_sqr_KSk (unsigned k, int quick) { int success = 1; int b, trial, redc; size_t n; zn_mod_t mod; // first try a dense range of "small" problems for (b = 2; b <= ULONG_BITS && success; b++) for (n = 1; n <= 30 && success; n += (quick ? 5 : 1)) for (redc = 0; redc < 2 && success; redc++) for (trial = 0; trial < (quick ? 1 : 10) && success; trial++) { zn_mod_init (mod, random_modulus (b, 0)); success = success && testcase_zn_array_mul_KS (k, n, n, 1, redc, mod); zn_mod_clear(mod); } // now try some random larger problems for (b = 2; b <= ULONG_BITS && success; b++) for (redc = 0; redc < 2 && success; redc++) for (trial = 0; trial < (quick ? 3 : 200) && success; trial++) { n = random_ulong (quick ? 
250 : 1000) + 1; zn_mod_init (mod, random_modulus (b, 0)); success = success && testcase_zn_array_mul_KS (k, n, n, 1, redc, mod); zn_mod_clear (mod); } return success; } int test_zn_array_sqr_KS1 (int quick) { return test_zn_array_sqr_KSk (1, quick); } int test_zn_array_sqr_KS2 (int quick) { return test_zn_array_sqr_KSk (2, quick); } int test_zn_array_sqr_KS3 (int quick) { return test_zn_array_sqr_KSk (3, quick); } int test_zn_array_sqr_KS4 (int quick) { return test_zn_array_sqr_KSk (4, quick); } /* Tests zn_array_recover_reduce() for given n, b, reduction algorithm and modulus. Doesn't test the s parameter. Note: running time is quadratic in n, so don't make it too big. */ int testcase_zn_array_recover_reduce (size_t n, unsigned b, int redc, const zn_mod_t mod) { // disallow REDC if modulus is even if (!(mod->m & 1)) redc = 0; ZNP_ASSERT (b >= 1 && 2 * b <= 3 * ULONG_BITS); mpz_t* a; size_t i; a = (mpz_t*) malloc (sizeof (mpz_t) * n); for (i = 0; i < n; i++) mpz_init (a[i]); // c = 2^b - 1 mpz_t c; mpz_init (c); mpz_set_ui (c, 1); mpz_mul_2exp (c, c, b); mpz_sub_ui (c, c, 1); mpz_t hi, lo; mpz_init (hi); mpz_init (lo); mpz_t temp; mpz_init (temp); // a "small" integer, no more than c mpz_t small; mpz_init (small); mpz_set_ui (small, (b >= 2) ? 3 : 1); ZNP_ASSERT (mpz_cmp (small, c) <= 0); mpz_t sum1, sum2; mpz_init (sum1); mpz_init (sum2); // make up a list of a[i]'s for (i = 0; i < n; i++) { // make up low digit switch (random_ulong (3)) { case 0: // some uniform random digit mpz_urandomb (lo, randstate, b); break; case 1: // a value close to zero mpz_urandomm (lo, randstate, small); break; case 2: // a value close to the maximum // (anything up to and including 2^b - 1) mpz_urandomm (lo, randstate, small); mpz_sub (lo, c, lo); break; } // make up high digit switch (random_ulong (3)) { case 0: // some uniform random digit mpz_urandomm (hi, randstate, c); break; case 1: // a value close to zero mpz_urandomm (hi, randstate, small); break; case 2: // a value close to the maximum // (anything up to but NOT including 2^b - 1) mpz_urandomm (hi, randstate, small); mpz_sub (hi, c, hi); mpz_sub_ui (hi, hi, 1); break; } // put a[i] = hi*B + lo mpz_mul_2exp (a[i], hi, b); mpz_add (a[i], a[i], lo); } // construct the sums in forward and reverse directions // i.e. sum1 = a[0] + a[1]*B + ... + a[n-1]*B^(n-1) // sum2 = a[n-1] + a[n-2]*B + ... + a[0]*B^(n-1). 
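// (here B denotes 2^b, matching the digit size passed to zn_array_recover_reduce below)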
for (i = 0; i < n; i++) { mpz_mul_2exp (sum1, sum1, b); mpz_add (sum1, sum1, a[n - 1 - i]); mpz_mul_2exp (sum2, sum2, b); mpz_add (sum2, sum2, a[i]); } // decompose both sums into sequence of (n+1) base-B digits unsigned w = CEIL_DIV (b, ULONG_BITS); ZNP_ASSERT (w <= 2); ulong* d1 = (ulong*) malloc (sizeof (ulong) * w * (n + 1)); ulong* d2 = (ulong*) malloc (sizeof (ulong) * w * (n + 1)); if (w == 1) { for (i = 0; i <= n; i++) { mpz_tdiv_r_2exp (temp, sum1, b); d1[i] = mpz_get_ui (temp); mpz_tdiv_q_2exp (sum1, sum1, b); mpz_tdiv_r_2exp (temp, sum2, b); d2[i] = mpz_get_ui (temp); mpz_tdiv_q_2exp (sum2, sum2, b); } } else { for (i = 0; i <= n; i++) { mpz_tdiv_r_2exp (temp, sum1, ULONG_BITS); d1[2 * i] = mpz_get_ui (temp); mpz_tdiv_q_2exp (sum1, sum1, ULONG_BITS); mpz_tdiv_r_2exp (temp, sum1, b - ULONG_BITS); d1[2 * i + 1] = mpz_get_ui (temp); mpz_tdiv_q_2exp (sum1, sum1, b - ULONG_BITS); mpz_tdiv_r_2exp (temp, sum2, ULONG_BITS); d2[2 * i] = mpz_get_ui (temp); mpz_tdiv_q_2exp (sum2, sum2, ULONG_BITS); mpz_tdiv_r_2exp (temp, sum2, b - ULONG_BITS); d2[2 * i + 1] = mpz_get_ui (temp); mpz_tdiv_q_2exp (sum2, sum2, b - ULONG_BITS); } } // shouldn't be any bits left ZNP_ASSERT (mpz_cmp_ui (sum1, 0) == 0); ZNP_ASSERT (mpz_cmp_ui (sum2, 0) == 0); // see if zn_array_recover_reduce() returns the original inputs (mod m) ulong* res = (ulong*) malloc (sizeof (ulong) * n); zn_array_recover_reduce (res, 1, d1, d2, n, b, redc, mod); int success = 1; for (i = 0; i < n; i++) { if (redc) // correct for REDC reduction mpz_mul_ui (temp, a[i], mod->m - mod->B); else mpz_set (temp, a[i]); mpz_mod_ui (temp, temp, mod->m); success = success && (mpz_get_ui (temp) == res[i]); } // clean up free (res); free (d2); free (d1); mpz_clear (temp); mpz_clear (sum2); mpz_clear (sum1); mpz_clear (lo); mpz_clear (hi); mpz_clear (small); mpz_clear (c); for (i = 0; i < n; i++) mpz_clear (a[i]); free (a); return success; } /* Tests zn_array_recover_reduce() on a range of small problems. */ int test_zn_array_recover_reduce (int quick) { int success = 1; int b, trial, redc; size_t n; zn_mod_t mod; for (b = 1; 2 * b <= 3 * ULONG_BITS && success; b++) for (n = 1; n <= 15 && success; n++) for (redc = 0; redc < 2 && success; redc++) for (trial = 0; trial < (quick ? 10 : 200) && success; trial++) { zn_mod_init (mod, random_modulus (random_ulong (ULONG_BITS - 1) + 2, 0)); success = success && testcase_zn_array_recover_reduce (n, b, redc, mod); zn_mod_clear (mod); } return success; } // end of file **************************************************************** zn_poly-0.9.2/test/mulmid_ks-test.c000066400000000000000000000101641360464557000173140ustar00rootroot00000000000000/* mulmid_ks-test.c: test code for functions in mulmid_ks.c Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #include "support.h" #include "zn_poly_internal.h" /* Tests zn_array_mulmid_KSk, for given lengths, reduction algorithm, modulus.
1 <= k <= 4 indicates which KS variant to call. Returns 1 on success. */ int testcase_zn_array_mulmid_KS (int k, size_t n1, size_t n2, int redc, const zn_mod_t mod) { // disallow REDC if modulus is even if (!(mod->m & 1)) redc = 0; ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n1); ulong* buf2 = (ulong*) malloc (sizeof (ulong) * n2); ulong* ref = (ulong*) malloc (sizeof (ulong) * (n1 - n2 + 1)); ulong* res = (ulong*) malloc (sizeof (ulong) * (n1 - n2 + 1)); // generate random polys size_t i; for (i = 0; i < n1; i++) buf1[i] = random_ulong (mod->m); for (i = 0; i < n2; i++) buf2[i] = random_ulong (mod->m); // compare target implementation against reference implementation ref_zn_array_mulmid (ref, buf1, n1, buf2, n2, mod); switch (k) { case 1: zn_array_mulmid_KS1 (res, buf1, n1, buf2, n2, redc, mod); break; case 2: zn_array_mulmid_KS2 (res, buf1, n1, buf2, n2, redc, mod); break; case 3: zn_array_mulmid_KS3 (res, buf1, n1, buf2, n2, redc, mod); break; case 4: zn_array_mulmid_KS4 (res, buf1, n1, buf2, n2, redc, mod); break; default: printf ("oops!\n"); abort (); } if (redc) // correct for REDC reduction ref_zn_array_scalar_mul (res, res, n1 - n2 + 1, mod->m - mod->B, mod); int success = !zn_array_cmp (ref, res, n1 - n2 + 1); free (res); free (ref); free (buf2); free (buf1); return success; } /* tests zn_array_mulmid_KSk() on a range of input cases, where 1 <= k <= 4 */ int test_zn_array_mulmid_KSk (unsigned k, int quick) { int success = 1; int b, trial, redc; size_t n1, n2; zn_mod_t mod; // first try a dense range of "small" problems for (b = 2; b <= ULONG_BITS && success; b++) for (n2 = 1; n2 <= 30 && success; n2 += (quick ? 5 : 1)) for (n1 = n2; n1 <= 30 && success; n1 += (quick ? 5 : 1)) for (redc = 0; redc < 2 && success; redc++) for (trial = 0; trial < (quick ? 1 : 10) && success; trial++) { zn_mod_init (mod, random_modulus (b, 0)); success = success && testcase_zn_array_mulmid_KS (k, n1, n2, redc, mod); zn_mod_clear(mod); } // now try some random larger problems for (b = 2; b <= ULONG_BITS && success; b++) for (redc = 0; redc < 2 && success; redc++) for (trial = 0; trial < (quick ? 3 : 200) && success; trial++) { size_t t1 = random_ulong (quick ? 250 : 1000) + 1; size_t t2 = random_ulong (quick ? 250 : 1000) + 1; n1 = ZNP_MAX (t1, t2); n2 = ZNP_MIN (t1, t2); zn_mod_init (mod, random_modulus (b, 0)); success = success && testcase_zn_array_mulmid_KS (k, n1, n2, redc, mod); zn_mod_clear(mod); } return success; } int test_zn_array_mulmid_KS1 (int quick) { return test_zn_array_mulmid_KSk (1, quick); } int test_zn_array_mulmid_KS2 (int quick) { return test_zn_array_mulmid_KSk (2, quick); } int test_zn_array_mulmid_KS3 (int quick) { return test_zn_array_mulmid_KSk (3, quick); } int test_zn_array_mulmid_KS4 (int quick) { return test_zn_array_mulmid_KSk (4, quick); } // end of file **************************************************************** zn_poly-0.9.2/test/nuss-test.c000066400000000000000000000053651360464557000163270ustar00rootroot00000000000000/* nuss-test.c: test code for functions in nussbaumer.c Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "support.h" #include "zn_poly_internal.h" /* Tests nuss_mul, for given lgL and modulus. sqr == 1 means to test squaring. Returns 1 on success. */ int testcase_nuss_mul (unsigned lgL, int sqr, const zn_mod_t mod) { ulong n = 1UL << lgL; ulong* buf1 = (ulong*) malloc (sizeof (ulong) * n); ulong* buf2 = sqr ? buf1 : (ulong*) malloc (sizeof (ulong) * n); ulong* ref = (ulong*) malloc (sizeof (ulong) * n); ulong* res = (ulong*) malloc (sizeof (ulong) * n); // generate random polys ulong i; for (i = 0; i < n; i++) buf1[i] = random_ulong (mod->m); if (!sqr) for (i = 0; i < n; i++) buf2[i] = random_ulong (mod->m); // allocate scratch space for nuss_mul pmfvec_t vec1, vec2; pmfvec_init_nuss (vec1, lgL, mod); pmfvec_init_nuss (vec2, lgL, mod); // compare target implementation against reference implementation ref_zn_array_negamul (ref, buf1, buf2, n, mod); nuss_mul (res, buf1, buf2, vec1, vec2); ulong x = nuss_mul_fudge (lgL, sqr, mod); ref_zn_array_scalar_mul (res, res, n, x, mod); int success = !zn_array_cmp (ref, res, n); pmfvec_clear (vec2); pmfvec_clear (vec1); free (res); free (ref); if (!sqr) free (buf2); free (buf1); return success; } /* tests nuss_mul() on a range of input cases (multiplication and squaring) */ int test_nuss_mul (int quick) { int success = 1; int i, trial; unsigned lgL; zn_mod_t mod; for (i = 0; i < num_test_bitsizes; i++) for (lgL = 2; lgL <= (quick ? 11 : 13) && success; lgL++) for (trial = 0; trial < (quick ? 1 : 5) && success; trial++) { zn_mod_init (mod, random_modulus (test_bitsizes[i], 1)); success = success && testcase_nuss_mul (lgL, 0, mod); success = success && testcase_nuss_mul (lgL, 1, mod); zn_mod_clear (mod); } return success; } // end of file **************************************************************** zn_poly-0.9.2/test/pack-test.c000066400000000000000000000154731360464557000162560ustar00rootroot00000000000000/* pack-test.c: test code for functions in pack.c Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "support.h" #include "zn_poly_internal.h" /* Helper function for ref_zn_array_pack(). Sets x = 2^k * (op[0] + op[1]*2^b + ... + op[n-1]*2^((n-1)*b)). Running time is soft-linear in output length. 
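For example, n = 2, b = 3, k = 1 and op = {5, 6} give x = 2 * (5 + 6*8) = 106.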
*/ void ref_zn_array_pack_helper (mpz_t x, const ulong* op, size_t n, unsigned b, unsigned k) { ZNP_ASSERT (n >= 1); if (n == 1) { // base case mpz_set_ui (x, op[0]); mpz_mul_2exp (x, x, k); } else { // recursively split into top and bottom halves mpz_t y; mpz_init (y); ref_zn_array_pack_helper (x, op, n / 2, b, k); ref_zn_array_pack_helper (y, op + n / 2, n - n / 2, b, 0); mpz_mul_2exp (y, y, (n / 2) * b + k); mpz_add (x, x, y); mpz_clear (y); } } /* Reference implementation of zn_array_pack(). (doesn't take into account the s or r parameters) */ void ref_zn_array_pack (mp_limb_t* res, const ulong* op, size_t n, unsigned b, unsigned k) { mpz_t x; mpz_init (x); ref_zn_array_pack_helper (x, op, n, b, k); mpz_to_mpn (res, CEIL_DIV (n * b + k, GMP_NUMB_BITS), x); mpz_clear (x); } /* Helper function for ref_zn_array_unpack(). Inverse operation of ref_zn_array_pack_helper(); each output coefficient occupies ceil(b / ULONG_BITS) ulongs. Running time is soft-linear in output length. */ void ref_zn_array_unpack_helper (ulong* res, const mpz_t op, size_t n, unsigned b, unsigned k) { ZNP_ASSERT (n >= 1); ZNP_ASSERT (mpz_sizeinbase (op, 2) <= n * b + k); unsigned w = CEIL_DIV (b, ULONG_BITS); mpz_t y; mpz_init (y); if (n == 1) { // base case unsigned i; mpz_tdiv_q_2exp (y, op, k); for (i = 0; i < w; i++) { res[i] = mpz_get_ui (y); mpz_tdiv_q_2exp (y, y, ULONG_BITS); } } else { // recursively split into top and bottom halves mpz_tdiv_q_2exp (y, op, (n / 2) * b + k); ref_zn_array_unpack_helper (res + w * (n / 2), y, n - n / 2, b, 0); mpz_tdiv_r_2exp (y, op, (n / 2) * b + k); ref_zn_array_unpack_helper (res, y, n / 2, b, k); } mpz_clear (y); } /* Reference implementation of zn_array_unpack(). */ void ref_zn_array_unpack (ulong* res, const mp_limb_t* op, size_t n, unsigned b, unsigned k) { mpz_t x; mpz_init (x); mpn_to_mpz (x, op, CEIL_DIV (n * b + k, GMP_NUMB_BITS)); ref_zn_array_unpack_helper (res, x, n, b, k); mpz_clear (x); } /* tests zn_array_pack() once for given n, b, k */ int testcase_zn_array_pack (size_t n, unsigned b, unsigned k) { ZNP_ASSERT (b >= 1); ZNP_ASSERT (n >= 1); int success = 1; ulong* in = (ulong*) malloc (sizeof (ulong) * n); size_t size = CEIL_DIV (n * b + k, GMP_NUMB_BITS); mp_limb_t* res = (mp_limb_t*) malloc (sizeof (mp_limb_t) * (size + 2)); mp_limb_t* ref = (mp_limb_t*) malloc (sizeof (mp_limb_t) * (size + 2)); // sentries to check buffer overflow res[0] = res[size + 1] = ref[0] = ref[size + 1] = 0x1234; // generate random data: at most b bits per input coefficient, possibly less unsigned rand_bits = (b >= ULONG_BITS) ? ULONG_BITS : b; rand_bits = random_ulong (rand_bits) + 1; ulong max = (rand_bits == ULONG_BITS) ? ((ulong)(-1)) : ((1UL << rand_bits) - 1); size_t i; for (i = 0; i < n; i++) in[i] = random_ulong (max); // run target and reference implementation zn_array_pack (res + 1, in, n, 1, b, k, 0); ref_zn_array_pack (ref + 1, in, n, b, k); // check sentries success = success && (res[0] == 0x1234); success = success && (ref[0] == 0x1234); success = success && (res[size + 1] == 0x1234); success = success && (ref[size + 1] == 0x1234); // check correct result success = success && (mpn_cmp (res + 1, ref + 1, size) == 0); free (ref); free (res); free (in); return success; } /* tests zn_array_pack() on a range of input cases */ int test_zn_array_pack (int quick) { int success = 1; unsigned b, k; size_t n; for (b = 1; b < 3 * ULONG_BITS && success; b++) for (n = 1; n < (quick ? 100 : 200) && success; n += (quick ? (n < 5 ? 
1 : 13) : 1)) for (k = 0; k < 160; k += 20) success = success && testcase_zn_array_pack (n, b, k); return success; } /* tests zn_array_unpack() once for given n, b, k */ int testcase_zn_array_unpack (size_t n, unsigned b, unsigned k) { size_t buf_size = CEIL_DIV (n * b + k, GMP_NUMB_BITS); size_t size = n * CEIL_DIV (b, ULONG_BITS); mp_limb_t* buf = (mp_limb_t*) malloc (sizeof (mp_limb_t) * buf_size); ulong* res = (ulong*) malloc (sizeof (ulong) * (size + 2)); ulong* ref = (ulong*) malloc (sizeof (ulong) * (size + 2)); // sentries to check buffer overflow res[0] = res[size + 1] = ref[0] = ref[size + 1] = 0x1234; // generate random data mpz_t x; mpz_init (x); mpz_urandomb (x, randstate, n * b); mpz_mul_2exp (x, x, k); mpz_to_mpn (buf, buf_size, x); mpz_clear (x); // run target and reference implementation zn_array_unpack (res + 1, buf, n, b, k); ref_zn_array_unpack (ref + 1, buf, n, b, k); int success = 1; // check sentries success = success && (res[0] == 0x1234); success = success && (ref[0] == 0x1234); success = success && (res[size + 1] == 0x1234); success = success && (ref[size + 1] == 0x1234); // check correct result success = success && (zn_array_cmp (res + 1, ref + 1, size) == 0); free (ref); free (res); free (buf); return success; } /* tests zn_array_unpack() on a range of input cases */ int test_zn_array_unpack (int quick) { int success = 1; unsigned b, k; size_t n; for (b = 1; b < 3 * ULONG_BITS && success; b++) for (n = 1; n < (quick ? 100 : 200) && success; n += (quick ? (n < 5 ? 1 : 13) : 1)) for (k = 0; k < 160; k += 19) success = success && testcase_zn_array_unpack (n, b, k); return success; } // end of file **************************************************************** zn_poly-0.9.2/test/pmfvec_fft-test.c000066400000000000000000000450221360464557000174500ustar00rootroot00000000000000/* pmfvec_fft-test.c: test code for functions in pmfvec_fft.c Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . */ #include "support.h" #include "zn_poly_internal.h" /* If lgT == 0, this tests pmfvec_fft_dc. If lgT > 0, it tests pmfvec_fft_huge. 
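In both cases the truncated transform of a zero-padded input is compared against pmfvec_fft_basecase applied to a copy of the full vector; only the first n outputs must agree, and the entries beyond z are overwritten with random data first to verify that the truncated code ignores them.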
*/ int testcase_pmfvec_fft_dc_or_huge (unsigned lgK, unsigned lgM, unsigned lgT, ulong n, ulong z, ulong t, const zn_mod_t mod) { pmfvec_t A, B; ulong M = 1UL << lgM; ulong K = 1UL << lgK; ptrdiff_t skip = M + 1; ulong i; pmfvec_init (A, lgK, skip, lgM, mod); pmfvec_init (B, lgK, skip, lgM, mod); // create random a_i's, with zero padding for (i = z; i < K; i++) pmf_zero (A->data + i * skip, M); for (i = z; i < K; i++) A->data[i * skip] = random_ulong (2 * M); for (i = 0; i < z; i++) pmf_rand (A->data + i * skip, M, mod); // run FFT using simple iterative algorithm pmfvec_set (B, A); pmfvec_fft_basecase (B, t); // make sure truncated FFT has to deal with random crap for (i = z; i < K; i++) pmf_rand (A->data + i * skip, M, mod); // try truncated FFT if (lgT > 0) pmfvec_fft_huge (A, lgT, n, z, t); else pmfvec_fft_dc (A, n, z, t); // compare results int success = 1; for (i = 0; i < n; i++) success = success && !pmf_cmp (A->data + i * skip, B->data + i * skip, M, mod); pmfvec_clear (B); pmfvec_clear (A); return success; } /* Tests pmfvec_fft_dc (if huge == 0) or pmfvec_fft_huge (if huge == 1) */ int test_pmfvec_fft_dc_or_huge (int huge, int quick) { int success = 1; int i; unsigned lgK, lgM, lgT; ulong z, n, t; zn_mod_t mod; for (i = 0; i < num_test_bitsizes && success; i++) for (lgK = 0; lgK < 5 && success; lgK++) for (lgT = (huge ? 1 : 0); lgT < (huge ? lgK : 1) && success; lgT++) for (lgM = lgK ? (lgK - 1) : 0; lgM < lgK + (quick ? 1 : 3) && success; lgM++) { ulong K = 1UL << lgK; ulong M = 1UL << lgM; for (t = 0; t < ZNP_MIN (2 * M / K, quick ? 2 : 1000) && success; t++) for (n = 1; n <= K && success; n++) for (z = 1; z <= K && success; z++) { zn_mod_init (mod, random_modulus (test_bitsizes[i], 1)); success = success && testcase_pmfvec_fft_dc_or_huge (lgK, lgM, lgT, n, z, t, mod); zn_mod_clear (mod); } } return success; } int test_pmfvec_fft_dc (int quick) { return test_pmfvec_fft_dc_or_huge (0, quick); } int test_pmfvec_fft_huge (int quick) { return test_pmfvec_fft_dc_or_huge (1, quick); } /* If lgT == 0, this tests pmfvec_ifft_dc. If lgT > 0, it tests pmfvec_ifft_huge. 
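The test runs a full forward FFT first, then feeds the truncated IFFT the appropriate mixture of transformed and untransformed entries; entries 0..n-1 of the result must equal the original coefficients scaled by K = 2^lgK, and when fwd is set, entry n must equal the corresponding forward transform coefficient.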
*/ int testcase_pmfvec_ifft_dc_or_huge (unsigned lgK, unsigned lgM, unsigned lgT, ulong n, int fwd, ulong z, ulong t, const zn_mod_t mod) { pmfvec_t A, B, C; ulong M = 1UL << lgM; ulong K = 1UL << lgK; ulong x = zn_mod_reduce (K, mod); ptrdiff_t skip = M + 1; ulong i; pmfvec_init (A, lgK, skip, lgM, mod); pmfvec_init (B, lgK, skip, lgM, mod); pmfvec_init (C, lgK, skip, lgM, mod); // create random a_i's, with zero padding for (i = z; i < K; i++) pmf_zero (A->data + i * skip, M); for (i = z; i < K; i++) A->data[i * skip] = random_ulong (2 * M); for (i = 0; i < z; i++) pmf_rand (A->data + i * skip, M, mod); // run FFT pmfvec_set (B, A); pmfvec_fft (B, K, K, t); pmfvec_set (C, B); // fill in missing data, plus junk where the implied zeroes should be for (i = n; i < z; i++) { pmf_set (C->data + i * skip, A->data + i * skip, M); pmf_scalar_mul (C->data + i * skip, M, x, mod); } for (i = z; i < K; i++) pmf_rand (C->data + i * skip, M, mod); // try IFFT if (lgT > 0) pmfvec_ifft_huge (C, lgT, n, fwd, z, t); else pmfvec_ifft_dc (C, n, fwd, z, t); // compare results int success = 1; for (i = 0; i < n; i++) pmf_scalar_mul (A->data + i * skip, M, x, mod); for (i = 0; i < n; i++) success = success && !pmf_cmp (C->data + i * skip, A->data + i * skip, M, mod); if (fwd) success = success && !pmf_cmp (C->data + i * skip, B->data + i * skip, M, mod); pmfvec_clear (C); pmfvec_clear (B); pmfvec_clear (A); return success; } /* Tests pmfvec_ifft_dc (if huge == 0) or pmfvec_ifft_huge (if huge == 1) */ int test_pmfvec_ifft_dc_or_huge (int huge, int quick) { int success = 1; int i; unsigned lgK, lgM, lgT; ulong z, n, t; int fwd; zn_mod_t mod; for (i = 0; i < num_test_bitsizes && success; i++) for (lgK = 0; lgK < 5 && success; lgK++) for (lgT = (huge ? 1 : 0); lgT < (huge ? lgK : 1) && success; lgT++) for (lgM = lgK ? (lgK - 1) : 0; lgM < lgK + (quick ? 1 : 3) && success; lgM++) { ulong K = 1UL << lgK; ulong M = 1UL << lgM; for (t = 0; t < ZNP_MIN (2 * M / K, quick ? 2 : 1000) && success; t++) for (z = 1; z <= K && success; z++) for (fwd = 0; fwd < 2 && success; fwd++) for (n = 1 - fwd; n <= K - fwd && n <= z && success; n++) { zn_mod_init (mod, random_modulus (test_bitsizes[i], 1)); success = success && testcase_pmfvec_ifft_dc_or_huge (lgK, lgM, lgT, n, fwd, z, t, mod); zn_mod_clear (mod); } } return success; } int test_pmfvec_ifft_dc (int quick) { return test_pmfvec_ifft_dc_or_huge (0, quick); } int test_pmfvec_ifft_huge (int quick) { return test_pmfvec_ifft_dc_or_huge (1, quick); } int testcase_pmfvec_tpfft_dc (unsigned lgK, unsigned lgM, ulong n, ulong z, ulong t, const zn_mod_t mod) { int success = 1; ulong M = 1UL << lgM; ulong K = 1UL << lgK; ulong i, j; ptrdiff_t skip = M + 1; // =================================== // first check linearity, i.e. 
that ax + by gets mapped to the right thing { pmfvec_t X, Y, A, B, TX, TY; pmfvec_init (X, lgK, skip, lgM, mod); pmfvec_init (Y, lgK, skip, lgM, mod); pmfvec_init (A, lgK, skip, lgM, mod); pmfvec_init (B, lgK, skip, lgM, mod); pmfvec_init (TX, lgK, skip, lgM, mod); pmfvec_init (TY, lgK, skip, lgM, mod); // generate random X, Y, A, B for (i = 0; i < K; i++) { pmf_rand (X->data + i * skip, M, mod); pmf_rand (Y->data + i * skip, M, mod); } pmf_rand (A->data, M, mod); pmf_rand (B->data, M, mod); for (i = 1; i < K; i++) { pmf_set (A->data + i * A->skip, A->data, M); pmf_set (B->data + i * B->skip, B->data, M); } // transform X and Y (after throwing in random ignorable crap) pmfvec_set (TX, X); pmfvec_set (TY, Y); for (i = n; i < K; i++) { pmf_rand (TX->data + i * skip, M, mod); pmf_rand (TY->data + i * skip, M, mod); } pmfvec_tpfft_dc (TX, n, z, t); pmfvec_tpfft_dc (TY, n, z, t); // form linear combination of TX and TY pmfvec_mul (TX, TX, A, z, 0); pmfvec_mul (TY, TY, B, z, 0); for (i = 0; i < z; i++) pmf_add (TX->data + TX->skip * i, TY->data + TY->skip * i, M, mod); // form linear combination of X and Y pmfvec_mul (X, X, A, n, 0); pmfvec_mul (Y, Y, B, n, 0); for (i = 0; i < n; i++) pmf_add (X->data + X->skip * i, Y->data + Y->skip * i, M, mod); // transform linear combination of X and Y pmfvec_tpfft_dc (X, n, z, t); // compare results for (i = 0; i < z; i++) success = success && !pmf_cmp (X->data + X->skip * i, TX->data + TX->skip * i, M, mod); pmfvec_clear (X); pmfvec_clear (Y); pmfvec_clear (TX); pmfvec_clear (TY); pmfvec_clear (A); pmfvec_clear (B); } // =================================== // now check that the matrix of the transposed FFT is really the transpose // of the matrix of the ordinary FFT { pmfvec_t* X = (pmfvec_t*) malloc (z * sizeof(pmfvec_t)); for (i = 0; i < z; i++) pmfvec_init (X[i], lgK, skip, lgM, mod); pmfvec_t* Y = (pmfvec_t*) malloc (n * sizeof(pmfvec_t)); for (i = 0; i < n; i++) pmfvec_init (Y[i], lgK, skip, lgM, mod); // compute images of basis vectors under FFT for (i = 0; i < z; i++) for (j = 0; j < z; j++) { pmf_zero (X[i]->data + j * skip, M); X[i]->data[j * skip + 1] = (i == j); } for (i = 0; i < z; i++) pmfvec_fft (X[i], n, z, t); // compute images of basis vectors under transposed FFT for (i = 0; i < n; i++) for (j = 0; j < n; j++) { pmf_zero (Y[i]->data + j * skip, M); Y[i]->data[j * skip + 1] = (i == j); } for (i = 0; i < n; i++) pmfvec_tpfft (Y[i], n, z, t); // check that they are transposes of each other for (i = 0; i < z; i++) for (j = 0; j < n; j++) success = success && !pmf_cmp (X[i]->data + j * skip, Y[j]->data + i * skip, M, mod); for (i = 0; i < z; i++) pmfvec_clear (X[i]); for (i = 0; i < n; i++) pmfvec_clear (Y[i]); free (Y); free (X); } return success; } int testcase_pmfvec_tpfft_huge (unsigned lgK, unsigned lgM, unsigned lgT, ulong n, ulong z, ulong t, const zn_mod_t mod) { pmfvec_t A, B; ulong M = 1UL << lgM; ulong K = 1UL << lgK; ulong i; ptrdiff_t skip = M + 1; pmfvec_init (A, lgK, skip, lgM, mod); pmfvec_init (B, lgK, skip, lgM, mod); // create random input for (i = 0; i < K; i++) pmf_rand (A->data + i * skip, M, mod); // make a copy pmfvec_set (B, A); // put random crap in B to check that it's ignored for (i = n; i < K; i++) pmf_rand (B->data + i * skip, M, mod); // run transposed FFTs using huge and dc algorithms pmfvec_tpfft_dc (A, n, z, t); pmfvec_tpfft_huge (B, lgT, n, z, t); // compare results int success = 1; for (i = 0; i < z; i++) success = success && !pmf_cmp (B->data + i * skip, A->data + i * skip, M, mod); pmfvec_clear (B); 
pmfvec_clear (A); return success; } /* Tests pmfvec_tpfft_dc (if huge == 0) or pmfvec_tpfft_huge (if huge == 1) */ int test_pmfvec_tpfft_dc_or_huge (int huge, int quick) { int success = 1; int i; unsigned lgK, lgM, lgT; ulong z, n, t; zn_mod_t mod; for (i = 0; i < num_test_bitsizes && success; i++) for (lgK = 0; lgK < 5 && success; lgK++) for (lgT = (huge ? 1 : 0); lgT < (huge ? lgK : 1) && success; lgT++) for (lgM = lgK ? (lgK - 1) : 0; lgM < lgK + (quick ? 1 : 3) && success; lgM++) { ulong K = 1UL << lgK; ulong M = 1UL << lgM; for (t = 0; t < ZNP_MIN (2 * M / K, quick ? 2 : 1000) && success; t++) for (n = 1; n <= K && success; n++) for (z = 1; z <= K && success; z++) { zn_mod_init (mod, random_modulus (test_bitsizes[i], 1)); success = success && (huge ? testcase_pmfvec_tpfft_huge (lgK, lgM, lgT, n, z, t, mod) : testcase_pmfvec_tpfft_dc (lgK, lgM, n, z, t, mod)); zn_mod_clear (mod); } } return success; } int test_pmfvec_tpfft_dc (int quick) { return test_pmfvec_tpfft_dc_or_huge (0, quick); } int test_pmfvec_tpfft_huge (int quick) { return test_pmfvec_tpfft_dc_or_huge (1, quick); } int testcase_pmfvec_tpifft_dc (unsigned lgK, unsigned lgM, ulong n, int fwd, ulong z, ulong t, const zn_mod_t mod) { int success = 1; ulong M = 1UL << lgM; ulong K = 1UL << lgK; ulong i, j; ptrdiff_t skip = M + 1; // =================================== // first check linearity, i.e. that ax + by gets mapped to the right thing { pmfvec_t X, Y, A, B, TX, TY; pmfvec_init (X, lgK, skip, lgM, mod); pmfvec_init (Y, lgK, skip, lgM, mod); pmfvec_init (A, lgK, skip, lgM, mod); pmfvec_init (B, lgK, skip, lgM, mod); pmfvec_init (TX, lgK, skip, lgM, mod); pmfvec_init (TY, lgK, skip, lgM, mod); // generate random X, Y, A, B for (i = 0; i < K; i++) { pmf_rand (X->data + i * skip, M, mod); pmf_rand (Y->data + i * skip, M, mod); } pmf_rand (A->data, M, mod); pmf_rand (B->data, M, mod); for (i = 1; i < K; i++) { pmf_set (A->data + i * A->skip, A->data, M); pmf_set (B->data + i * B->skip, B->data, M); } // transform X and Y (after throwing in random ignorable crap) pmfvec_set (TX, X); pmfvec_set (TY, Y); for (i = n + fwd; i < K; i++) { pmf_rand (TX->data + i * skip, M, mod); pmf_rand (TY->data + i * skip, M, mod); } pmfvec_tpifft_dc (TX, n, fwd, z, t); pmfvec_tpifft_dc (TY, n, fwd, z, t); // form linear combination of TX and TY pmfvec_mul (TX, TX, A, z, 0); pmfvec_mul (TY, TY, B, z, 0); for (i = 0; i < z; i++) pmf_add (TX->data + TX->skip * i, TY->data + TY->skip * i, M, mod); // form linear combination of X and Y pmfvec_mul (X, X, A, n + fwd, 0); pmfvec_mul (Y, Y, B, n + fwd, 0); for (i = 0; i < n + fwd; i++) pmf_add (X->data + X->skip * i, Y->data + Y->skip * i, M, mod); // transform linear combination of X and Y pmfvec_tpifft_dc (X, n, fwd, z, t); // compare results for (i = 0; i < z; i++) success = success && !pmf_cmp (X->data + X->skip * i, TX->data + TX->skip * i, M, mod); pmfvec_clear (X); pmfvec_clear (Y); pmfvec_clear (TX); pmfvec_clear (TY); pmfvec_clear (A); pmfvec_clear (B); } // =================================== // now check that the matrix of the transposed IFFT is really the transpose // of the matrix of the ordinary IFFT { pmfvec_t* X = (pmfvec_t*) malloc (z * sizeof (pmfvec_t)); for (i = 0; i < z; i++) pmfvec_init (X[i], lgK, skip, lgM, mod); pmfvec_t* Y = (pmfvec_t*) malloc ((n + fwd) * sizeof (pmfvec_t)); for (i = 0; i < n + fwd; i++) pmfvec_init (Y[i], lgK, skip, lgM, mod); // compute images of basis vectors under FFT for (i = 0; i < z; i++) for (j = 0; j < z; j++) { pmf_zero (X[i]->data + j * skip, M); 
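// (slot 0 of each pmf is its bias word, so index j * skip + 1 is coefficient 0 of entry j)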
X[i]->data[j * skip + 1] = (i == j); } for (i = 0; i < z; i++) pmfvec_ifft (X[i], n, fwd, z, t); // compute images of basis vectors under transposed FFT for (i = 0; i < n + fwd; i++) for (j = 0; j < n + fwd; j++) { pmf_zero (Y[i]->data + j * skip, M); Y[i]->data[j * skip + 1] = (i == j); } for (i = 0; i < n + fwd; i++) pmfvec_tpifft (Y[i], n, fwd, z, t); // check that they are transposes of each other for (i = 0; i < z; i++) for (j = 0; j < n + fwd; j++) success = success && !pmf_cmp (X[i]->data + j * skip, Y[j]->data + i * skip, M, mod); for (i = 0; i < z; i++) pmfvec_clear (X[i]); for (i = 0; i < n + fwd; i++) pmfvec_clear (Y[i]); free (Y); free (X); } return success; } int testcase_pmfvec_tpifft_huge (unsigned lgK, unsigned lgM, unsigned lgT, ulong n, int fwd, ulong z, ulong t, const zn_mod_t mod) { pmfvec_t A, B; ulong M = 1UL << lgM; ulong K = 1UL << lgK; ulong i; ptrdiff_t skip = M + 1; pmfvec_init (A, lgK, skip, lgM, mod); pmfvec_init (B, lgK, skip, lgM, mod); // create random input for (i = 0; i < K; i++) pmf_rand (A->data + i * skip, M, mod); // make a copy pmfvec_set (B, A); // put random crap in B to check that it's ignored for (i = n + fwd; i < K; i++) pmf_rand (B->data + i * skip, M, mod); // run transposed IFFTs using huge and dc algorithms pmfvec_tpifft_dc (A, n, fwd, z, t); pmfvec_tpifft_huge (B, lgT, n, fwd, z, t); // compare results int success = 1; for (i = 0; i < z; i++) success = success && !pmf_cmp (B->data + i * skip, A->data + i * skip, M, mod); pmfvec_clear (B); pmfvec_clear (A); return success; } /* Tests pmfvec_tpifft_dc (if huge == 0) or pmfvec_tpifft_huge (if huge == 1) */ int test_pmfvec_tpifft_dc_or_huge (int huge, int quick) { int success = 1; int i; unsigned lgK, lgM, lgT; ulong z, n, t; int fwd; zn_mod_t mod; for (i = 0; i < num_test_bitsizes && success; i++) for (lgK = 0; lgK < 5 && success; lgK++) for (lgT = (huge ? 1 : 0); lgT < (huge ? lgK : 1) && success; lgT++) for (lgM = lgK ? (lgK - 1) : 0; lgM < lgK + (quick ? 1 : 3) && success; lgM++) { ulong K = 1UL << lgK; ulong M = 1UL << lgM; for (t = 0; t < ZNP_MIN (2 * M / K, quick ? 2 : 1000) && success; t++) for (z = 1; z <= K && success; z++) for (fwd = 0; fwd < 2 && success; fwd++) for (n = 1 - fwd; n <= K - fwd && n <= z && success; n++) { zn_mod_init (mod, random_modulus (test_bitsizes[i], 1)); success = success && (huge ? testcase_pmfvec_tpifft_huge (lgK, lgM, lgT, n, fwd, z, t, mod) : testcase_pmfvec_tpifft_dc (lgK, lgM, n, fwd, z, t, mod)); zn_mod_clear (mod); } } return success; } int test_pmfvec_tpifft_dc (int quick) { return test_pmfvec_tpifft_dc_or_huge (0, quick); } int test_pmfvec_tpifft_huge (int quick) { return test_pmfvec_tpifft_dc_or_huge (1, quick); } // end of file **************************************************************** zn_poly-0.9.2/test/ref_mul.c000066400000000000000000000155251360464557000160120ustar00rootroot00000000000000/* ref_mul.c: reference implementations for polynomial multiplication, middle product, scalar multiplication, integer middle product Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #include "support.h" #include "zn_poly_internal.h" #include <string.h> /* Sets x = op[0] + op[1]*B + ... + op[n-1]*B^(n-1), where B = 2^b. Running time is soft-linear in output length. */ void pack (mpz_t x, const ulong* op, size_t n, unsigned b) { ZNP_ASSERT (n >= 1); if (n == 1) { // base case mpz_set_ui (x, op[0]); } else { // recursively split into top and bottom halves mpz_t y; mpz_init (y); pack (x, op, n / 2, b); pack (y, op + n / 2, n - n / 2, b); mpz_mul_2exp (y, y, (n / 2) * b); mpz_add (x, x, y); mpz_clear (y); } } /* Inverse operation of pack(), with output coefficients reduced mod m. Running time is soft-linear in output length. */ void unpack (ulong* res, const mpz_t op, size_t n, unsigned b, ulong m) { ZNP_ASSERT (n >= 1); ZNP_ASSERT (mpz_sizeinbase (op, 2) <= n * b); mpz_t y; mpz_init (y); if (n == 1) { // base case mpz_set (y, op); mpz_fdiv_r_ui (y, y, m); *res = mpz_get_ui (y); } else { // recursively split into top and bottom halves mpz_tdiv_q_2exp (y, op, (n / 2) * b); unpack (res + n / 2, y, n - n / 2, b, m); mpz_tdiv_r_2exp (y, op, (n / 2) * b); unpack (res, y, n / 2, b, m); } mpz_clear (y); } /* Reference implementation of zn_array_mul(). Very simple Kronecker substitution, uses GMP for multiplication. */ void ref_zn_array_mul (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); mpz_t x, y; mpz_init (x); mpz_init (y); unsigned b = 2 * mod->bits + ceil_lg (n2); unsigned words = CEIL_DIV (b, ULONG_BITS); pack (x, op1, n1, b); pack (y, op2, n2, b); mpz_mul (x, x, y); unpack (res, x, n1 + n2 - 1, b, mod->m); mpz_clear (y); mpz_clear (x); } /* Reference implementation of zn_array_mulmid(). Just calls ref_zn_array_mul() and extracts relevant part of output. */ void ref_zn_array_mulmid (ulong* res, const ulong* op1, size_t n1, const ulong* op2, size_t n2, const zn_mod_t mod) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); ulong* temp = (ulong*) malloc ((n1 + n2 - 1) * sizeof (ulong)); ref_zn_array_mul (temp, op1, n1, op2, n2, mod); ulong i; for (i = 0; i < n1 - n2 + 1; i++) res[i] = temp[i + n2 - 1]; free (temp); } /* Reference implementation of negacyclic multiplication. Multiplies op1[0, n) by op2[0, n) negacyclically, puts result into res[0, n). */ void ref_zn_array_negamul (ulong* res, const ulong* op1, const ulong* op2, size_t n, const zn_mod_t mod) { ulong* temp = (ulong*) malloc (sizeof (ulong) * 2 * n); ref_zn_array_mul (temp, op1, n, op2, n, mod); temp[2 * n - 1] = 0; mpz_t x; mpz_init (x); size_t i; for (i = 0; i < n; i++) { mpz_set_ui (x, temp[i]); mpz_sub_ui (x, x, temp[i + n]); mpz_mod_ui (x, x, mod->m); res[i] = mpz_get_ui (x); } mpz_clear (x); free (temp); } /* Reference implementation of scalar multiplication. Multiplies op[0, n) by x, puts result in res[0, n). */ void ref_zn_array_scalar_mul (ulong* res, const ulong* op, size_t n, ulong x, const zn_mod_t mod) { mpz_t y; mpz_init (y); size_t i; for (i = 0; i < n; i++) { mpz_set_ui (y, op[i]); mpz_mul_ui (y, y, x); mpz_mod_ui (y, y, mod->m); res[i] = mpz_get_ui (y); } mpz_clear (y); } /* Reference implementation of mpn_smp. Computes SMP(op1[0, n1), op2[0, n2)), stores result at res[0, n1 - n2 + 3).
*/ void ref_mpn_smp (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); mp_limb_t* prod = (mp_limb_t*) malloc (sizeof (mp_limb_t) * (n1 + n2)); // first compute the ordinary product mpn_mul (prod, op1, n1, op2, n2); // now we want to remove the cross-terms that could possibly interfere // with the result we want, i.e. in the following diagram, we want only // contributions from O's, but mpn_mul has given us all of O, A and X, // and we will remove the A's. // OOOOAAXX // AOOOOAAX // AAOOOOAA // XAAOOOOA // XXAAOOOO int which; // 0 == bottom-left corner, 1 == top-right corner size_t diag; // 0 == closest to diagonal, 1 == next diagonal size_t i, x, y, off; mp_limb_t lo, hi; for (which = 0; which <= 1; which++) for (diag = 0; diag < ZNP_MIN (n2 - 1, 2); diag++) for (i = 0; i < n2 - 1 - diag; i++) { x = n2 - 2 - i - diag; y = i; if (which) { x = n1 - 1 - x; y = n2 - 1 - y; } off = x + y; hi = mpn_mul_1 (&lo, op1 + x, 1, op2[y]); mpn_sub_1 (prod + off, prod + off, n1 + n2 - off, lo); mpn_sub_1 (prod + off + 1, prod + off + 1, n1 + n2 - off - 1, hi); } // copy the result to the output array memcpy (res, prod + n2 - 1, sizeof (mp_limb_t) * (n1 - n2 + 2)); res[n1 - n2 + 2] = (n2 > 1) ? prod[n1 + 1] : 0; free (prod); } /* Reference implementation of mpn_mulmid. Let P = product op1 * op2. Computes P[n2 + 1, n1), stores result at res[2, n1 - n2 + 1). */ void ref_mpn_mulmid (mp_limb_t* res, const mp_limb_t* op1, size_t n1, const mp_limb_t* op2, size_t n2) { ZNP_ASSERT (n2 >= 1); ZNP_ASSERT (n1 >= n2); mp_limb_t* prod = (mp_limb_t*) malloc (sizeof (mp_limb_t) * (n1 + n2)); // compute the ordinary product mpn_mul (prod, op1, n1, op2, n2); // copy relevant segment to output if (n1 > n2) memcpy (res + 2, prod + n2 + 1, sizeof (mp_limb_t) * (n1 - n2 - 1)); free (prod); } // end of file **************************************************************** zn_poly-0.9.2/test/support.c000066400000000000000000000050101360464557000160610ustar00rootroot00000000000000/* support.c: various support routines for test, profiling and tuning code Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
*/ #include "support.h" unsigned test_bitsizes[] = {2, 3, 8, ULONG_BITS/2 - 1, ULONG_BITS/2, ULONG_BITS/2 + 1, ULONG_BITS - 2, ULONG_BITS - 1, ULONG_BITS}; unsigned num_test_bitsizes = sizeof (test_bitsizes) / sizeof (unsigned); gmp_randstate_t randstate; void mpz_to_mpn (mp_limb_t* res, size_t n, const mpz_t op) { ZNP_ASSERT (mpz_size (op) <= n); size_t count_p; mpz_export (res, &count_p, -1, sizeof (mp_limb_t), 0, GMP_NAIL_BITS, op); // zero-pad remaining buffer for (; count_p < n; count_p++) res[count_p] = 0; } void mpn_to_mpz (mpz_t res, const mp_limb_t* op, size_t n) { ZNP_ASSERT (n >= 1); mpz_import (res, n, -1, sizeof (mp_limb_t), 0, GMP_NAIL_BITS, op); } ulong random_ulong (ulong max) { return gmp_urandomm_ui (randstate, max); } ulong random_ulong_bits (unsigned b) { return gmp_urandomb_ui (randstate, b); } ulong random_modulus (unsigned b, int require_odd) { ZNP_ASSERT(b >= 2 && b <= ULONG_BITS); if (require_odd) { if (b == 2) return 3; return (1UL << (b - 1)) + 2 * random_ulong_bits (b - 2) + 1; } else return (1UL << (b - 1)) + random_ulong_bits (b - 1); } void zn_array_print (const ulong* x, size_t n) { size_t i; printf ("["); for (i = 0; i < n; i++) { if (i) printf (", "); printf ("%lu", x[i]); } printf ("]"); } void ZNP_mpn_random2 (mp_limb_t* res, size_t n) { size_t i; mpn_random2 (res, n); if (random_ulong (2)) for (i = 0; i < n; i++) res[i] ^= GMP_NUMB_MASK; } // end of file **************************************************************** zn_poly-0.9.2/test/test.c000066400000000000000000000141351360464557000153340ustar00rootroot00000000000000/* test.c: command line program for running test routines Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
*/

#include <string.h>
#include "support.h"
#include "zn_poly_internal.h"


typedef struct
{
   // function name to print
   char* name;

   // pointer to actual test function; should return 1 if test passes
   // parameter quick is 1 if only a quick test is wanted
   int (*func)(int quick);
}
test_target_t;


extern int test_zn_array_mul_KS1 (int quick);
extern int test_zn_array_mul_KS2 (int quick);
extern int test_zn_array_mul_KS3 (int quick);
extern int test_zn_array_mul_KS4 (int quick);
extern int test_zn_array_sqr_KS1 (int quick);
extern int test_zn_array_sqr_KS2 (int quick);
extern int test_zn_array_sqr_KS3 (int quick);
extern int test_zn_array_sqr_KS4 (int quick);
extern int test_zn_array_mulmid_KS1 (int quick);
extern int test_zn_array_mulmid_KS2 (int quick);
extern int test_zn_array_mulmid_KS3 (int quick);
extern int test_zn_array_mulmid_KS4 (int quick);
extern int test_zn_array_recover_reduce (int quick);
extern int test_zn_array_pack (int quick);
extern int test_zn_array_unpack (int quick);
extern int test_zn_array_mul_fft (int quick);
extern int test_zn_array_sqr_fft (int quick);
extern int test_zn_array_mul_fft_dft (int quick);
extern int test_zn_array_mulmid_fft (int quick);
extern int test_nuss_mul (int quick);
extern int test_pmfvec_fft_dc (int quick);
extern int test_pmfvec_fft_huge (int quick);
extern int test_pmfvec_ifft_dc (int quick);
extern int test_pmfvec_ifft_huge (int quick);
extern int test_pmfvec_tpfft_dc (int quick);
extern int test_pmfvec_tpfft_huge (int quick);
extern int test_pmfvec_tpifft_dc (int quick);
extern int test_pmfvec_tpifft_huge (int quick);
extern int test_zn_array_invert (int quick);
extern int test_mpn_smp_basecase (int quick);
extern int test_mpn_smp_kara (int quick);
extern int test_mpn_smp (int quick);
extern int test_mpn_mulmid (int quick);


test_target_t targets[] = {
   {"mpn_smp_basecase", test_mpn_smp_basecase},
   {"mpn_smp_kara", test_mpn_smp_kara},
   {"mpn_smp", test_mpn_smp},
   {"mpn_mulmid", test_mpn_mulmid},
   {"zn_array_recover_reduce", test_zn_array_recover_reduce},
   {"zn_array_pack", test_zn_array_pack},
   {"zn_array_unpack", test_zn_array_unpack},
   {"zn_array_mul_KS1", test_zn_array_mul_KS1},
   {"zn_array_mul_KS2", test_zn_array_mul_KS2},
   {"zn_array_mul_KS3", test_zn_array_mul_KS3},
   {"zn_array_mul_KS4", test_zn_array_mul_KS4},
   {"zn_array_sqr_KS1", test_zn_array_sqr_KS1},
   {"zn_array_sqr_KS2", test_zn_array_sqr_KS2},
   {"zn_array_sqr_KS3", test_zn_array_sqr_KS3},
   {"zn_array_sqr_KS4", test_zn_array_sqr_KS4},
   {"zn_array_mulmid_KS1", test_zn_array_mulmid_KS1},
   {"zn_array_mulmid_KS2", test_zn_array_mulmid_KS2},
   {"zn_array_mulmid_KS3", test_zn_array_mulmid_KS3},
   {"zn_array_mulmid_KS4", test_zn_array_mulmid_KS4},
   {"nuss_mul", test_nuss_mul},
   {"pmfvec_fft_dc", test_pmfvec_fft_dc},
   {"pmfvec_fft_huge", test_pmfvec_fft_huge},
   {"pmfvec_ifft_dc", test_pmfvec_ifft_dc},
   {"pmfvec_ifft_huge", test_pmfvec_ifft_huge},
   {"pmfvec_tpfft_dc", test_pmfvec_tpfft_dc},
   {"pmfvec_tpfft_huge", test_pmfvec_tpfft_huge},
   {"pmfvec_tpifft_dc", test_pmfvec_tpifft_dc},
   {"pmfvec_tpifft_huge", test_pmfvec_tpifft_huge},
   {"zn_array_mul_fft", test_zn_array_mul_fft},
   {"zn_array_sqr_fft", test_zn_array_sqr_fft},
   {"zn_array_mulmid_fft", test_zn_array_mulmid_fft},
   {"zn_array_mul_fft_dft", test_zn_array_mul_fft_dft},
   {"zn_array_invert", test_zn_array_invert},
};

const unsigned num_targets = sizeof (targets) / sizeof (targets[0]);


int run_test (test_target_t* target, int quick)
{
   printf ("%s()... ", target->name);
   fflush (stdout);
   int success = target->func (quick);
   printf ("%s\n", success ? "ok" : "FAIL!");
   return success;
}


void usage ()
{
   printf ("usage: test [ -quick ] ...\n\n");
   printf ("Available targets:\n\n");
   printf (" all (runs all tests)\n");
   int i;
   for (i = 0; i < num_targets; i++)
      printf (" %s\n", targets[i].name);
}


int main (int argc, char* argv[])
{
   gmp_randinit_default (randstate);

   int all_success = 1, any_targets = 0, quick = 0, success, i, j;

   for (j = 1; j < argc; j++)
   {
      if (!strcmp (argv[j], "-quick"))
         quick = 1;
      else if (!strcmp (argv[j], "all"))
      {
         any_targets = 1;
         for (i = 0; i < num_targets; i++)
            all_success = all_success && run_test (&targets[i], quick);
      }
      else
      {
         int found = -1;
         for (i = 0; i < num_targets; i++)
            if (!strcmp (argv[j], targets[i].name))
               found = i;
         if (found == -1)
         {
            printf ("unknown target string \"%s\"\n", argv[j]);
            return 0;
         }
         any_targets = 1;
         all_success = all_success && run_test (&targets[found], quick);
      }
   }

   if (!any_targets)
   {
      usage ();
      return 0;
   }

   printf ("\n");
   if (all_success)
      printf ("All tests passed.\n");
   else
      printf ("At least one test FAILED!\n");

   gmp_randclear (randstate);

   return !all_success;
}

// end of file ****************************************************************
zn_poly-0.9.2/tune/000077500000000000000000000000001360464557000142015ustar00rootroot00000000000000zn_poly-0.9.2/tune/mpn_mulmid-tune.c000066400000000000000000000131601360464557000174600ustar00rootroot00000000000000/*
   mpn_mulmid-tune.c: tuning program for integer middle product algorithms

   Copyright (C) 2007, 2008, David Harvey

   This file is part of the zn_poly library (version 0.9).

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 2 of the License, or
   (at your option) version 3 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <math.h>
#include "support.h"
#include "zn_poly_internal.h"
#include "profiler.h"

/*
   Finds approximate threshold between basecase and karatsuba integer
   middle product algorithms. Stores this in mpn_smp_kara_thresh, and
   writes some logging information to flog.
*/ void tune_mpn_smp_kara (FILE* flog, int verbose) { fprintf (flog, "mpn smp kara: "); fflush (flog); // how long we are willing to wait for each profile run const double speed = 0.0001; // reset the threshold so karatsuba never accidentally recurses ZNP_mpn_smp_kara_thresh = SIZE_MAX; size_t thresh; const int max_intervals = 200; size_t points[max_intervals + 1]; double score[max_intervals + 1]; double result[2]; profile_info_t info; // find an upper bound, where karatsuba appears to be safely ahead of // basecase size_t upper; int found = 0; for (upper = 2; upper <= 1000 && !found; upper = 2 * upper) { info->n1 = 2 * upper - 1; info->n2 = info->n = upper; result[0] = profile (NULL, NULL, profile_mpn_smp_basecase, info, speed); result[1] = profile (NULL, NULL, profile_mpn_smp_kara, info, speed); if (result[1] < 0.9 * result[0]) found = 1; } if (!found) { // couldn't find a reasonable upper bound thresh = SIZE_MAX; goto done; } // subdivide [2, upper] into intervals and sample at each endpoint double lower = 2.0; double ratio = (double) upper / lower; unsigned i; for (i = 0; i <= max_intervals; i++) { points[i] = ceil (lower * pow (ratio, (double) i / max_intervals)); info->n1 = 2 * points[i] - 1; info->n2 = info->n = points[i]; result[0] = profile (NULL, NULL, profile_mpn_smp_basecase, info, speed); result[1] = profile (NULL, NULL, profile_mpn_smp_kara, info, speed); score[i] = result[1] / result[0]; } // estimate threshold unsigned count = 0; for (i = 0; i <= max_intervals; i++) if (score[i] > 1.0) count++; thresh = (size_t) ceil (lower * pow (ratio, (double) count / (max_intervals + 1))); done: if (verbose) { if (thresh == SIZE_MAX) fprintf (flog, "infinity"); else fprintf (flog, "%lu", thresh); } else fprintf (flog, "done"); fflush (flog); ZNP_mpn_smp_kara_thresh = thresh; fprintf (flog, "\n"); } /* Finds approximate threshold between karatsuba and fallback integer middle product algorithms. Store this in mpn_mulmid_fallback_thresh, and writes some logging information to flog. 
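
   (Explanatory note added here, not in the original comment: this routine
   and tune_mpn_smp_kara above share one estimation pattern. First an
   upper bound is doubled until the second algorithm wins by a safe margin
   (result[1] < 0.9 * result[0]); then both algorithms are timed at
   max_intervals + 1 points spaced on a geometric grid between the lower
   and upper bounds, each point scored as t_second / t_first; finally, if
   `count` of those scores exceed 1.0, the crossover is estimated as

      thresh = ceil (lower * pow (upper / lower,
                                  (double) count / (max_intervals + 1)))

   i.e. the point lying count/(max_intervals + 1) of the way, in the
   geometric sense, from lower to upper. The tune_mul_KS, tune_mulmid_KS,
   tune_mul and tune_mulmid routines below follow the same scheme.)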
*/
void tune_mpn_mulmid_fallback (FILE* flog, int verbose)
{
   fprintf (flog, "mpn mulmid fallback: ");
   fflush (flog);

   // how long we are willing to wait for each profile run
   const double speed = 0.0001;

   size_t thresh;

   const int max_intervals = 30;
   size_t points[max_intervals + 1];
   double score[max_intervals + 1];
   double result[2];

   profile_info_t info;

   // find an upper bound, where fallback appears to be safely ahead of
   // karatsuba
   size_t upper = ZNP_mpn_smp_kara_thresh;
   int found = 0;
   for (; upper <= 40000 && !found; upper = 2 * upper)
   {
      info->n1 = 2 * upper - 1;
      info->n2 = info->n = upper;
      result[0] = profile (NULL, NULL, profile_mpn_smp_kara, info, speed);
      result[1] = profile (NULL, NULL, profile_mpn_mulmid_fallback, info,
                           speed);
      if (result[1] < 0.9 * result[0])
         found = 1;
   }

   if (!found)
   {
      // couldn't find a reasonable upper bound
      thresh = SIZE_MAX;
      goto done;
   }

   // subdivide [kara_thresh, upper] into intervals and sample at
   // each endpoint
   double lower = ZNP_mpn_smp_kara_thresh;
   double ratio = (double) upper / lower;
   unsigned i;
   for (i = 0; i <= max_intervals; i++)
   {
      points[i] = ceil (lower * pow (ratio, (double) i / max_intervals));
      info->n1 = 2 * points[i] - 1;
      info->n2 = info->n = points[i];
      result[0] = profile (NULL, NULL, profile_mpn_smp_kara, info, speed);
      result[1] = profile (NULL, NULL, profile_mpn_mulmid_fallback, info,
                           speed);
      score[i] = result[1] / result[0];
   }

   // estimate threshold
   unsigned count = 0;
   for (i = 0; i <= max_intervals; i++)
      if (score[i] > 1.0)
         count++;
   thresh = (size_t)
        ceil (lower * pow (ratio, (double) count / (max_intervals + 1)));

   done:

   if (verbose)
   {
      if (thresh == SIZE_MAX)
         fprintf (flog, "infinity");
      else
         fprintf (flog, "%lu", thresh);
   }
   else
      fprintf (flog, "done");
   fflush (flog);

   ZNP_mpn_mulmid_fallback_thresh = thresh;

   fprintf (flog, "\n");
}

// end of file ****************************************************************
zn_poly-0.9.2/tune/mul-tune.c000066400000000000000000000112041360464557000161110ustar00rootroot00000000000000/*
   mul-tune.c: tuning program for multiplication algorithms

   Copyright (C) 2007, 2008, David Harvey

   This file is part of the zn_poly library (version 0.9).

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 2 of the License, or
   (at your option) version 3 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <math.h>
#include "support.h"
#include "zn_poly_internal.h"
#include "profiler.h"

/*
   For each modulus size, finds approximate threshold between KS4 and FFT
   multiplication algorithms. (Note this needs to be done *after* the
   KS1/KS2/KS4 and nussbaumer thresholds have been determined, since they
   are used as subroutines.) Stores these in the global threshold table,
   and writes some logging information to flog.
*/
void tune_mul (FILE* flog, int sqr, int verbose)
{
   unsigned b;

   fprintf (flog, " KS/FFT %s: ", sqr ?
"sqr" : "mul"); fflush (flog); // how long we are willing to wait for each profile run const double speed = 0.0001; // run tuning process for each modulus size for (b = 2; b <= ULONG_BITS; b++) { // thresh for KS4->FFT size_t thresh; const int max_intervals = 20; size_t points[max_intervals + 1]; double score[max_intervals + 1]; double result[2]; profile_info_t info[2]; info[0]->sqr = info[1]->sqr = sqr; info[0]->m = info[1]->m = (1UL << (b - 1)) + 1; info[0]->algo = ALGO_MUL_KS4; info[1]->algo = ALGO_MUL_FFT; // find an upper bound, where FFT algorithm appears to be safely // ahead of KS4 algorithm size_t upper; int found = 0; for (upper = 45; upper <= 65536 && !found; upper = 2 * upper) { info[0]->n1 = info[1]->n1 = upper; info[0]->n2 = info[1]->n2 = upper; result[0] = profile (NULL, NULL, profile_mul, info[0], speed); result[1] = profile (NULL, NULL, profile_mul, info[1], speed); if (result[1] < 0.95 * result[0]) found = 1; } if (!found) { // couldn't find a reasonable upper bound thresh = SIZE_MAX; goto done; } // find a lower bound, where KS4 algorithm appears to be safely // ahead of FFT algorithm size_t lower; found = 0; for (lower = upper/2; lower >= 2 && !found; lower = lower / 2) { info[0]->n1 = info[1]->n1 = lower; info[0]->n2 = info[1]->n2 = lower; result[0] = profile (NULL, NULL, profile_mul, info[0], speed); result[1] = profile (NULL, NULL, profile_mul, info[1], speed); if (result[1] > 1.05 * result[0]) found = 1; } if (!found) { // couldn't find a reasonable lower bound thresh = 0; goto done; } // subdivide [lower, upper] into intervals and sample at each endpoint double ratio = (double) upper / (double) lower; unsigned i; for (i = 0; i <= max_intervals; i++) { points[i] = ceil (lower * pow (ratio, (double) i / max_intervals)); info[0]->n1 = info[1]->n1 = points[i]; info[0]->n2 = info[1]->n2 = points[i]; result[0] = profile (NULL, NULL, profile_mul, info[0], speed); result[1] = profile (NULL, NULL, profile_mul, info[1], speed); score[i] = result[1] / result[0]; } // estimate threshold unsigned count = 0; for (i = 0; i <= max_intervals; i++) if (score[i] > 1.0) count++; thresh = (size_t) ceil (lower * pow (ratio, (double) count / (max_intervals + 1))); done: if (verbose) { fprintf (flog, "\nbits = %u, cross to FFT at ", b); if (thresh == SIZE_MAX) fprintf (flog, "infinity"); else fprintf (flog, "%lu", thresh); } else fprintf (flog, "."); fflush (flog); if (sqr) tuning_info[b].sqr_fft_thresh = thresh; else tuning_info[b].mul_fft_thresh = thresh; } fprintf (flog, "\n"); } // end of file **************************************************************** zn_poly-0.9.2/tune/mul_ks-tune.c000066400000000000000000000125311360464557000166120ustar00rootroot00000000000000/* mul_ks-tune.c: tuning program for mul_ks.c module Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
*/

#include <math.h>
#include <string.h>
#include "support.h"
#include "zn_poly_internal.h"
#include "profiler.h"

/*
   For each modulus bitsize, finds approximate threshold between KS1, KS2,
   and KS4 multiplication (or squaring, if that flag is set) algorithms.
   Stores these in the global tuning table, and writes some logging
   information to _flog_.
*/
void tune_mul_KS (FILE* flog, int sqr, int verbose)
{
   unsigned b;

   fprintf (flog, " KS1/2/4 %s: ", sqr ? "sqr" : "mul");
   fflush (flog);

   // how long we are willing to wait for each profile run
   const double speed = 0.0001;

   // run tuning process for each modulus size
   for (b = 2; b <= ULONG_BITS; b++)
   {
      // thresholds for KS1->KS2 and KS2->KS4
      unsigned long thresh[2];

      // which = 0 means comparing KS1 vs KS2
      // which = 1 means comparing KS2 vs KS4
      unsigned which;
      for (which = 0; which < 2; which++)
      {
         double result[2];
         profile_info_t info[2];
         info[0]->sqr = info[1]->sqr = sqr;
         info[0]->m = info[1]->m = (1UL << (b - 1)) + 1;
         info[0]->algo = which ? ALGO_MUL_KS2_REDC : ALGO_MUL_KS1_REDC;
         info[1]->algo = which ? ALGO_MUL_KS4_REDC : ALGO_MUL_KS2_REDC;

         // find an upper bound, where 2nd algorithm appears to be safely
         // ahead of 1st algorithm
         size_t upper;
         int found = 0;
         for (upper = 45; upper <= 16384 && !found; upper = 2 * upper)
         {
            info[0]->n1 = info[1]->n1 = upper;
            info[0]->n2 = info[1]->n2 = upper;
            result[0] = profile (NULL, NULL, profile_mul, info[0], speed);
            result[1] = profile (NULL, NULL, profile_mul, info[1], speed);
            if (result[1] < 0.95 * result[0])
               found = 1;
         }

         if (!found)
         {
            // couldn't find a reasonable upper bound
            thresh[which] = SIZE_MAX;
            continue;
         }

         // find a lower bound, where 1st algorithm appears to be safely
         // ahead of 2nd algorithm
         size_t lower;
         found = 0;
         for (lower = upper / 2; lower >= 2 && !found; lower = lower / 2)
         {
            info[0]->n1 = info[1]->n1 = lower;
            info[0]->n2 = info[1]->n2 = lower;
            result[0] = profile (NULL, NULL, profile_mul, info[0], speed);
            result[1] = profile (NULL, NULL, profile_mul, info[1], speed);
            if (result[1] > 1.05 * result[0])
               found = 1;
         }

         if (!found)
         {
            // couldn't find a reasonable lower bound
            thresh[which] = 0;
            continue;
         }

         // subdivide [lower, upper] into intervals and sample at each endpoint
         double ratio = (double) upper / (double) lower;
         const int max_intervals = 30;
         size_t points[max_intervals + 1];
         double score[max_intervals + 1];
         unsigned i;
         for (i = 0; i <= max_intervals; i++)
         {
            points[i] = ceil (lower * pow (ratio, (double) i / max_intervals));
            info[0]->n1 = info[1]->n1 = points[i];
            info[0]->n2 = info[1]->n2 = points[i];
            result[0] = profile (NULL, NULL, profile_mul, info[0], speed);
            result[1] = profile (NULL, NULL, profile_mul, info[1], speed);
            score[i] = result[1] / result[0];
         }

         // estimate threshold
         unsigned count = 0;
         for (i = 0; i <= max_intervals; i++)
            if (score[i] > 1.0)
               count++;
         thresh[which] = (size_t)
              ceil (lower * pow (ratio, (double) count / (max_intervals + 1)));
      }

      if (verbose)
      {
         fprintf (flog, "\nbits = %u, cross to KS2 at ", b);
         if (thresh[0] == SIZE_MAX)
            fprintf (flog, "infinity");
         else
            fprintf (flog, "%lu", thresh[0]);
         fprintf (flog, ", cross to KS4 at ");
         if (thresh[1] == SIZE_MAX)
            fprintf (flog, "infinity");
         else
            fprintf (flog, "%lu", thresh[1]);
      }
      else
         fprintf (flog, ".");
      fflush (flog);

      if (sqr)
      {
         tuning_info[b].sqr_KS2_thresh = thresh[0];
         tuning_info[b].sqr_KS4_thresh = thresh[1];
      }
      else
      {
         tuning_info[b].mul_KS2_thresh = thresh[0];
         tuning_info[b].mul_KS4_thresh = thresh[1];
      }
   }

   fprintf (flog, "\n");
}

// end of file ****************************************************************
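
/*
   [Added illustration -- this block is not part of the zn_poly sources.]

   All of the tune_* routines in this directory reuse the same
   threshold-estimation technique seen above: time two algorithms on a
   geometric grid of problem sizes and place the crossover according to how
   many grid points favour the first algorithm. The following self-contained
   sketch demonstrates the technique on synthetic cost functions
   (fake_time_A and fake_time_B are made-up stand-ins for the profile()
   calls, and estimate_threshold is a hypothetical name).
*/

#include <math.h>
#include <stdio.h>
#include <stddef.h>

// synthetic timings: algorithm A costs n^2, algorithm B costs
// 20 * n * log2(n), so B should overtake A somewhere around n = 140
static double fake_time_A (size_t n) { return (double) n * n; }
static double fake_time_B (size_t n)
   { return 20.0 * n * log ((double) n) / log (2.0); }

static size_t estimate_threshold (size_t lower, size_t upper, int intervals)
{
   double ratio = (double) upper / (double) lower;
   int i, count = 0;

   // sample on a geometric grid; score > 1 means B is still slower than A
   for (i = 0; i <= intervals; i++)
   {
      size_t n = (size_t) ceil (lower * pow (ratio, (double) i / intervals));
      double score = fake_time_B (n) / fake_time_A (n);
      if (score > 1.0)
         count++;
   }

   // if `count` of the sample points favour A, place the crossover
   // count / (intervals + 1) of the way, geometrically, from lower to upper
   return (size_t) ceil (lower * pow (ratio, (double) count / (intervals + 1)));
}

int main (void)
{
   printf ("estimated crossover: %lu\n",
           (unsigned long) estimate_threshold (2, 1024, 30));
   return 0;
}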
zn_poly-0.9.2/tune/mulmid-tune.c000066400000000000000000000110571360464557000166110ustar00rootroot00000000000000/*
   mulmid-tune.c: tuning program for middle product algorithms

   Copyright (C) 2007, 2008, David Harvey

   This file is part of the zn_poly library (version 0.9).

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 2 of the License, or
   (at your option) version 3 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <math.h>
#include "support.h"
#include "zn_poly_internal.h"
#include "profiler.h"

/*
   For each modulus size, finds approximate threshold between KS4 and FFT
   middle product algorithms. (Note this needs to be done *after* the
   KS1/KS2/KS4 multiplication and nussbaumer thresholds have been
   determined, since they are used as subroutines.) Stores these in the
   global threshold table, and writes some logging information to flog.
*/
void tune_mulmid (FILE* flog, int verbose)
{
   unsigned b;

   fprintf (flog, " KS/FFT mulmid: ");
   fflush (flog);

   // how long we are willing to wait for each profile run
   const double speed = 0.0001;

   // run tuning process for each modulus size
   for (b = 2; b <= ULONG_BITS; b++)
   {
      // thresh for KS4->FFT
      size_t thresh;

      const int max_intervals = 20;
      size_t points[max_intervals + 1];
      double score[max_intervals + 1];
      double result[2];

      profile_info_t info[2];
      info[0]->m = info[1]->m = (1UL << (b - 1)) + 1;
      info[0]->algo = ALGO_MULMID_KS4;
      info[1]->algo = ALGO_MULMID_FFT;

      // find an upper bound, where FFT algorithm appears to be safely
      // ahead of KS4 algorithm
      size_t upper;
      int found = 0;
      for (upper = 45; upper <= 65536 && !found; upper = 2 * upper)
      {
         info[0]->n1 = info[1]->n1 = 2 * upper;
         info[0]->n2 = info[1]->n2 = upper;
         result[0] = profile (NULL, NULL, profile_mulmid, info[0], speed);
         result[1] = profile (NULL, NULL, profile_mulmid, info[1], speed);
         if (result[1] < 0.95 * result[0])
            found = 1;
      }

      if (!found)
      {
         // couldn't find a reasonable upper bound
         thresh = SIZE_MAX;
         goto done;
      }

      // find a lower bound, where KS4 algorithm appears to be safely
      // ahead of FFT algorithm
      size_t lower;
      found = 0;
      for (lower = upper/2; lower >= 2 && !found; lower = lower / 2)
      {
         info[0]->n1 = info[1]->n1 = 2 * lower;
         info[0]->n2 = info[1]->n2 = lower;
         result[0] = profile (NULL, NULL, profile_mulmid, info[0], speed);
         result[1] = profile (NULL, NULL, profile_mulmid, info[1], speed);
         if (result[1] > 1.05 * result[0])
            found = 1;
      }

      if (!found)
      {
         // couldn't find a reasonable lower bound
         thresh = 0;
         goto done;
      }

      // subdivide [lower, upper] into intervals and sample at each endpoint
      double ratio = (double) upper / (double) lower;
      unsigned i;
      for (i = 0; i <= max_intervals; i++)
      {
         points[i] = ceil (lower * pow (ratio, (double) i / max_intervals));
         info[0]->n1 = info[1]->n1 = 2 * points[i];
         info[0]->n2 = info[1]->n2 = points[i];
         result[0] = profile (NULL, NULL, profile_mulmid, info[0], speed);
         result[1] = profile (NULL, NULL, profile_mulmid, info[1], speed);
         score[i] = result[1] / result[0];
      }

      // estimate threshold
      unsigned count = 0;
      for (i = 0; i <= max_intervals; i++)
         if (score[i] > 1.0)
            count++;
      thresh = (size_t) ceil (lower * pow (ratio, (double) count /
(max_intervals + 1)));

      done:

      if (verbose)
      {
         fprintf (flog, "\nbits = %u, cross to FFT at ", b);
         if (thresh == SIZE_MAX)
            fprintf (flog, "infinity");
         else
            fprintf (flog, "%lu", thresh);
      }
      else
         fprintf (flog, ".");
      fflush (flog);

      tuning_info[b].mulmid_fft_thresh = thresh;
   }

   fprintf (flog, "\n");
}

// end of file ****************************************************************
zn_poly-0.9.2/tune/mulmid_ks-tune.c000066400000000000000000000121761360464557000173070ustar00rootroot00000000000000/*
   mulmid_ks-tune.c: tuning program for mulmid_ks.c module

   Copyright (C) 2007, 2008, David Harvey

   This file is part of the zn_poly library (version 0.9).

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 2 of the License, or
   (at your option) version 3 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <math.h>
#include <string.h>
#include "support.h"
#include "zn_poly_internal.h"
#include "profiler.h"

/*
   For each modulus bitsize, finds approximate threshold between KS1, KS2,
   and KS4 middle product algorithms. Stores these in the global tuning
   table, and writes some logging information to _flog_.
*/
void tune_mulmid_KS (FILE* flog, int verbose)
{
   unsigned b;

   fprintf (flog, "KS1/2/4 mulmid: ");
   fflush (flog);

   // how long we are willing to wait for each profile run
   const double speed = 0.0001;

   // run tuning process for each modulus size
   for (b = 2; b <= ULONG_BITS; b++)
   {
      // thresholds for KS1->KS2 and KS2->KS4
      unsigned long thresh[2];

      // which = 0 means comparing KS1 vs KS2
      // which = 1 means comparing KS2 vs KS4
      unsigned which;
      for (which = 0; which < 2; which++)
      {
         double result[2];
         profile_info_t info[2];
         info[0]->m = info[1]->m = (1UL << (b - 1)) + 1;
         info[0]->algo = which ? ALGO_MULMID_KS2_REDC : ALGO_MULMID_KS1_REDC;
         info[1]->algo = which ?
ALGO_MULMID_KS4_REDC : ALGO_MULMID_KS2_REDC; // find an upper bound, where 2nd algorithm appears to be safely // ahead of 1st algorithm size_t upper; int found = 0; for (upper = 45; upper <= 16384 && !found; upper = 2 * upper) { info[0]->n1 = info[1]->n1 = 2 * upper; info[0]->n2 = info[1]->n2 = upper; result[0] = profile (NULL, NULL, profile_mulmid, info[0], speed); result[1] = profile (NULL, NULL, profile_mulmid, info[1], speed); if (result[1] < 0.95 * result[0]) found = 1; } if (!found) { // couldn't find a reasonable upper bound thresh[which] = SIZE_MAX; continue; } // find a lower bound, where 1st algorithm appears to be safely // ahead of 2nd algorithm size_t lower; found = 0; for (lower = upper / 2; lower >= 2 && !found; lower = lower / 2) { info[0]->n1 = info[1]->n1 = 2 * lower; info[0]->n2 = info[1]->n2 = lower; result[0] = profile (NULL, NULL, profile_mulmid, info[0], speed); result[1] = profile (NULL, NULL, profile_mulmid, info[1], speed); if (result[1] > 1.05 * result[0]) found = 1; } if (!found) { // couldn't find a reasonable lower bound thresh[which] = 0; continue; } // subdivide [lower, upper] into intervals and sample at each endpoint double ratio = (double) upper / (double) lower; const int max_intervals = 30; size_t points[max_intervals + 1]; double score[max_intervals + 1]; unsigned i; for (i = 0; i <= max_intervals; i++) { points[i] = ceil (lower * pow (ratio, (double) i / max_intervals)); info[0]->n1 = info[1]->n1 = 2 * points[i]; info[0]->n2 = info[1]->n2 = points[i]; result[0] = profile (NULL, NULL, profile_mulmid, info[0], speed); result[1] = profile (NULL, NULL, profile_mulmid, info[1], speed); score[i] = result[1] / result[0]; } // estimate threshold unsigned count = 0; for (i = 0; i <= max_intervals; i++) if (score[i] > 1.0) count++; thresh[which] = (size_t) ceil (lower * pow (ratio, (double) count / (max_intervals + 1))); } if (verbose) { fprintf (flog, "\nbits = %u, cross to KS2 at ", b); if (thresh[0] == SIZE_MAX) fprintf (flog, "infinity"); else fprintf (flog, "%lu", thresh[0]); fprintf (flog, ", cross to KS4 at "); if (thresh[1] == SIZE_MAX) fprintf (flog, "infinity"); else fprintf (flog, "%lu", thresh[1]); } else fprintf (flog, "."); fflush (flog); tuning_info[b].mulmid_KS2_thresh = thresh[0]; tuning_info[b].mulmid_KS4_thresh = thresh[1]; } fprintf (flog, "\n"); } // end of file **************************************************************** zn_poly-0.9.2/tune/nuss-tune.c000066400000000000000000000062461360464557000163160ustar00rootroot00000000000000/* nuss-tune.c: tuning program for negacyclic nussbaumer multiplication Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
*/

#include <math.h>
#include "support.h"
#include "zn_poly_internal.h"
#include "profiler.h"

// the maximum threshold we allow for switching from KS to nussbaumer
#define MAX_NUSS_THRESH 14

/*
   For each modulus size, finds threshold between KS and nussbaumer
   multiplication (or squaring, if the squaring flag is set). Stores these
   in the global threshold table, and writes some logging information to
   flog.
*/
void tune_nuss (FILE* flog, int sqr, int verbose)
{
   unsigned b;

   fprintf (flog, " nuss %s: ", sqr ? "sqr" : "mul");
   fflush (flog);

   // how long we are willing to wait for each profile run
   const double speed = 0.001;

   // run tuning process for each modulus size
   for (b = 2; b <= ULONG_BITS; b++)
   {
      unsigned thresh;

      profile_info_t info[2];
      info[0]->sqr = info[1]->sqr = sqr;
      info[0]->m = info[1]->m = (1UL << (b - 1)) + 1;
      info[0]->algo = ALGO_NEGAMUL_FALLBACK;
      info[1]->algo = ALGO_NEGAMUL_NUSS;

      for (thresh = 3; thresh < MAX_NUSS_THRESH; thresh++)
      {
         double result[2];

         info[0]->lgL = info[1]->lgL = thresh;

         // need nussbaumer to win three times
         unsigned trial;
         for (trial = 0; trial < 3; trial++)
         {
            result[0] = profile (NULL, NULL, profile_negamul, info[0], speed);
            result[1] = profile (NULL, NULL, profile_negamul, info[1], speed);
            if (result[0] < result[1])
               break;
         }

         if (trial == 3)
            break;      // found threshold
      }

      // If it looks like KS is always going to win, just always use KS
      // (for instance this might happen if GMP's huge integer multiplication
      // improves stupendously)
      if (thresh == MAX_NUSS_THRESH)
         thresh = 1000;

      if (sqr)
         tuning_info[b].nuss_sqr_thresh = thresh;
      else
         tuning_info[b].nuss_mul_thresh = thresh;

      if (verbose)
      {
         fprintf (flog, "\nbits = %u, cross to Nussbaumer at length ", b);
         if (thresh == 1000)
            fprintf (flog, "infinity");
         else
            fprintf (flog, "%lu", 1UL << thresh);
      }
      else
         fprintf (flog, ".");
      fflush (flog);
   }

   fprintf (flog, "\n");
}

// end of file ****************************************************************
zn_poly-0.9.2/tune/tune.c000066400000000000000000000132071360464557000153230ustar00rootroot00000000000000/*
   tune.c: main() routine for tuning program

   Copyright (C) 2007, 2008, David Harvey

   This file is part of the zn_poly library (version 0.9).

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 2 of the License, or
   (at your option) version 3 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <string.h>
#include "support.h"
#include "zn_poly_internal.h"
#include "profiler.h"

char* header =
"/*\n"
" NOTE: do not edit this file!
It is auto-generated by the \"tune\" program.\n"
" (Run \"make tune\" and then \"./tune > tuning.c\" to regenerate it.)\n"
"*/\n"
"\n"
"/*\n"
" tuning.c: global tuning values\n"
"\n"
" Copyright (C) 2007, 2008, David Harvey\n"
"\n"
" This file is part of the zn_poly library (version 0.9).\n"
"\n"
" This program is free software: you can redistribute it and/or modify\n"
" it under the terms of the GNU General Public License as published by\n"
" the Free Software Foundation, either version 2 of the License, or\n"
" (at your option) version 3 of the License.\n"
"\n"
" This program is distributed in the hope that it will be useful,\n"
" but WITHOUT ANY WARRANTY; without even the implied warranty of\n"
" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n"
" GNU General Public License for more details.\n"
"\n"
" You should have received a copy of the GNU General Public License\n"
" along with this program. If not, see <http://www.gnu.org/licenses/>.\n"
"\n"
"*/\n"
"\n"
"#include \"zn_poly_internal.h\"\n"
"\n";

char* footer =
"};\n"
"\n"
"// end of file ****************************************************************\n";

int main (int argc, char* argv[])
{
   fprintf (stderr, "zn_poly tuning program\n");
   fprintf (stderr, "(use -v flag for verbose output)\n\n");

   gmp_randinit_default (randstate);

#if ZNP_HAVE_CYCLE_COUNTER

   calibrate_cycle_scale_factor ();

   int verbose = 0;
   if (argc == 2 && !strcmp (argv[1], "-v"))
      verbose = 1;

   // call various routines to generate tuning data
   tune_mpn_smp_kara (stderr, verbose);
   tune_mpn_mulmid_fallback (stderr, verbose);
   tune_mul_KS (stderr, 0, verbose);
   tune_mul_KS (stderr, 1, verbose);
   tune_mulmid_KS (stderr, verbose);
   tune_nuss (stderr, 0, verbose);
   tune_nuss (stderr, 1, verbose);
   tune_mul (stderr, 0, verbose);
   tune_mul (stderr, 1, verbose);
   tune_mulmid (stderr, verbose);

#else

   fprintf (stderr, "\n"
            "Warning: no cycle counting code available on this system,\n"
            "using default tuning values.\n\n");

#endif

   unsigned bits;
   size_t x;

   // generate tuning.c file

   printf ("%s", header);

   x = ZNP_mpn_smp_kara_thresh;
   printf ("size_t ZNP_mpn_smp_kara_thresh = ");
   printf (x == SIZE_MAX ? "SIZE_MAX;\n" : "%lu;\n", x);

   x = ZNP_mpn_mulmid_fallback_thresh;
   printf ("size_t ZNP_mpn_mulmid_fallback_thresh = ");
   printf (x == SIZE_MAX ? "SIZE_MAX;\n" : "%lu;\n", x);

   printf ("\n");
   printf ("tuning_info_t tuning_info[] = \n");
   printf ("{\n");
   printf (" { // bits = 0\n");
   printf (" },\n");
   printf (" { // bits = 1\n");
   printf (" },\n");

   for (bits = 2; bits <= ULONG_BITS; bits++)
   {
      printf (" { // bits = %u\n", bits);

      x = tuning_info[bits].mul_KS2_thresh;
      printf (x == SIZE_MAX ? " SIZE_MAX," : " %5lu,", x);
      printf (" // KS1 -> KS2 multiplication threshold\n");

      x = tuning_info[bits].mul_KS4_thresh;
      printf (x == SIZE_MAX ? " SIZE_MAX," : " %5lu,", x);
      printf (" // KS2 -> KS4 multiplication threshold\n");

      x = tuning_info[bits].mul_fft_thresh;
      printf (x == SIZE_MAX ? " SIZE_MAX," : " %5lu,", x);
      printf (" // KS4 -> FFT multiplication threshold\n");

      x = tuning_info[bits].sqr_KS2_thresh;
      printf (x == SIZE_MAX ? " SIZE_MAX," : " %5lu,", x);
      printf (" // KS1 -> KS2 squaring threshold\n");

      x = tuning_info[bits].sqr_KS4_thresh;
      printf (x == SIZE_MAX ? " SIZE_MAX," : " %5lu,", x);
      printf (" // KS2 -> KS4 squaring threshold\n");

      x = tuning_info[bits].sqr_fft_thresh;
      printf (x == SIZE_MAX ? " SIZE_MAX," : " %5lu,", x);
      printf (" // KS4 -> FFT squaring threshold\n");

      x = tuning_info[bits].mulmid_KS2_thresh;
      printf (x == SIZE_MAX ?
" SIZE_MAX," : " %5lu,", x); printf (" // KS1 -> KS2 middle product threshold\n"); x = tuning_info[bits].mulmid_KS4_thresh; printf (x == SIZE_MAX ? " SIZE_MAX," : " %5lu,", x); printf (" // KS2 -> KS4 middle product threshold\n"); x = tuning_info[bits].mulmid_fft_thresh; printf (x == SIZE_MAX ? " SIZE_MAX," : " %5lu,", x); printf (" // KS4 -> FFT middle product threshold\n"); printf (" %5lu, // nussbaumer multiplication threshold\n", tuning_info[bits].nuss_mul_thresh); printf (" %5lu // nussbaumer squaring threshold\n", tuning_info[bits].nuss_sqr_thresh); printf (" },\n"); } printf ("%s", footer); gmp_randclear (randstate); return 0; } // end of file **************************************************************** zn_poly-0.9.2/tune/tuning.c000066400000000000000000001144551360464557000156630ustar00rootroot00000000000000/* NOTE: This copy of tuning.c was bootstrapped from earlier versions of the program and just contains some initial values from David Harvey's own machine at the time the file was generated. It can be regenerated (as explained in the next comment) by building the "tune" program, and then generating tunings and replacing this file with the generated one. However, in practice this should not be necessary (the generated tunings are used, finally, for the zn_poly library itself). */ /* NOTE: do not edit this file! It is auto-generated by the "tune" program. (Run "make tune" and then "./tune > tuning.c" to regenerate it.) */ /* tuning.c: global tuning values Copyright (C) 2007, 2008, David Harvey This file is part of the zn_poly library (version 0.9). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) version 3 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
*/ #include "zn_poly_internal.h" size_t ZNP_mpn_smp_kara_thresh = 42; size_t ZNP_mpn_mulmid_fallback_thresh = 7863; tuning_info_t tuning_info[] = { { // bits = 0 }, { // bits = 1 }, { // bits = 2 132, // KS1 -> KS2 multiplication threshold 1127, // KS2 -> KS4 multiplication threshold SIZE_MAX, // KS4 -> FFT multiplication threshold 141, // KS1 -> KS2 squaring threshold 1053, // KS2 -> KS4 squaring threshold SIZE_MAX, // KS4 -> FFT squaring threshold 132, // KS1 -> KS2 middle product threshold 1053, // KS2 -> KS4 middle product threshold 21569, // KS4 -> FFT middle product threshold 13, // nussbaumer multiplication threshold 12 // nussbaumer squaring threshold }, { // bits = 3 132, // KS1 -> KS2 multiplication threshold 2755, // KS2 -> KS4 multiplication threshold SIZE_MAX, // KS4 -> FFT multiplication threshold 132, // KS1 -> KS2 squaring threshold 4119, // KS2 -> KS4 squaring threshold 24613, // KS4 -> FFT squaring threshold 132, // KS1 -> KS2 middle product threshold 963, // KS2 -> KS4 middle product threshold 17119, // KS4 -> FFT middle product threshold 12, // nussbaumer multiplication threshold 12 // nussbaumer squaring threshold }, { // bits = 4 80, // KS1 -> KS2 multiplication threshold 1053, // KS2 -> KS4 multiplication threshold SIZE_MAX, // KS4 -> FFT multiplication threshold 94, // KS1 -> KS2 squaring threshold 1053, // KS2 -> KS4 squaring threshold 28087, // KS4 -> FFT squaring threshold 78, // KS1 -> KS2 middle product threshold 1053, // KS2 -> KS4 middle product threshold 12720, // KS4 -> FFT middle product threshold 12, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { // bits = 5 86, // KS1 -> KS2 multiplication threshold 1077, // KS2 -> KS4 multiplication threshold 32051, // KS4 -> FFT multiplication threshold 94, // KS1 -> KS2 squaring threshold 1205, // KS2 -> KS4 squaring threshold 18901, // KS4 -> FFT squaring threshold 78, // KS1 -> KS2 middle product threshold 482, // KS2 -> KS4 middle product threshold 14044, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { // bits = 6 78, // KS1 -> KS2 multiplication threshold 985, // KS2 -> KS4 multiplication threshold SIZE_MAX, // KS4 -> FFT multiplication threshold 108, // KS1 -> KS2 squaring threshold 985, // KS2 -> KS4 squaring threshold 20868, // KS4 -> FFT squaring threshold 70, // KS1 -> KS2 middle product threshold 674, // KS2 -> KS4 middle product threshold 10785, // KS4 -> FFT middle product threshold 12, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { // bits = 7 70, // KS1 -> KS2 multiplication threshold 412, // KS2 -> KS4 multiplication threshold 20868, // KS4 -> FFT multiplication threshold 86, // KS1 -> KS2 squaring threshold 493, // KS2 -> KS4 squaring threshold 17119, // KS4 -> FFT squaring threshold 66, // KS1 -> KS2 middle product threshold 282, // KS2 -> KS4 middle product threshold 10785, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { // bits = 8 56, // KS1 -> KS2 multiplication threshold 385, // KS2 -> KS4 multiplication threshold 20868, // KS4 -> FFT multiplication threshold 67, // KS1 -> KS2 squaring threshold 482, // KS2 -> KS4 squaring threshold 15505, // KS4 -> FFT squaring threshold 61, // KS1 -> KS2 middle product threshold 288, // KS2 -> KS4 middle product threshold 7753, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { 
// bits = 9 57, // KS1 -> KS2 multiplication threshold 264, // KS2 -> KS4 multiplication threshold 17119, // KS4 -> FFT multiplication threshold 70, // KS1 -> KS2 squaring threshold 264, // KS2 -> KS4 squaring threshold 14044, // KS4 -> FFT squaring threshold 57, // KS1 -> KS2 middle product threshold 247, // KS2 -> KS4 middle product threshold 8560, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { // bits = 10 62, // KS1 -> KS2 multiplication threshold 173, // KS2 -> KS4 multiplication threshold 15505, // KS4 -> FFT multiplication threshold 75, // KS1 -> KS2 squaring threshold 241, // KS2 -> KS4 squaring threshold 12307, // KS4 -> FFT squaring threshold 50, // KS1 -> KS2 middle product threshold 173, // KS2 -> KS4 middle product threshold 6154, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { // bits = 11 47, // KS1 -> KS2 multiplication threshold 158, // KS2 -> KS4 multiplication threshold 15505, // KS4 -> FFT multiplication threshold 61, // KS1 -> KS2 squaring threshold 216, // KS2 -> KS4 squaring threshold 14044, // KS4 -> FFT squaring threshold 50, // KS1 -> KS2 middle product threshold 161, // KS2 -> KS4 middle product threshold 7022, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { // bits = 12 43, // KS1 -> KS2 multiplication threshold 144, // KS2 -> KS4 multiplication threshold 12307, // KS4 -> FFT multiplication threshold 62, // KS1 -> KS2 squaring threshold 189, // KS2 -> KS4 squaring threshold 8560, // KS4 -> FFT squaring threshold 43, // KS1 -> KS2 middle product threshold 151, // KS2 -> KS4 middle product threshold 5393, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 11 // nussbaumer squaring threshold }, { // bits = 13 43, // KS1 -> KS2 multiplication threshold 144, // KS2 -> KS4 multiplication threshold 14044, // KS4 -> FFT multiplication threshold 51, // KS1 -> KS2 squaring threshold 189, // KS2 -> KS4 squaring threshold 14044, // KS4 -> FFT squaring threshold 43, // KS1 -> KS2 middle product threshold 141, // KS2 -> KS4 middle product threshold 6154, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 14 48, // KS1 -> KS2 multiplication threshold 101, // KS2 -> KS4 multiplication threshold 10785, // KS4 -> FFT multiplication threshold 50, // KS1 -> KS2 squaring threshold 158, // KS2 -> KS4 squaring threshold 7753, // KS4 -> FFT squaring threshold 39, // KS1 -> KS2 middle product threshold 101, // KS2 -> KS4 middle product threshold 4726, // KS4 -> FFT middle product threshold 10, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 15 38, // KS1 -> KS2 multiplication threshold 173, // KS2 -> KS4 multiplication threshold 12720, // KS4 -> FFT multiplication threshold 43, // KS1 -> KS2 squaring threshold 189, // KS2 -> KS4 squaring threshold 9451, // KS4 -> FFT squaring threshold 38, // KS1 -> KS2 middle product threshold 132, // KS2 -> KS4 middle product threshold 4280, // KS4 -> FFT middle product threshold 11, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 16 35, // KS1 -> KS2 multiplication threshold 124, // KS2 -> KS4 multiplication threshold 8560, // KS4 -> FFT multiplication threshold 47, // KS1 -> KS2 squaring threshold 158, // KS2 -> KS4 squaring threshold 
7022, // KS4 -> FFT squaring threshold 35, // KS1 -> KS2 middle product threshold 116, // KS2 -> KS4 middle product threshold 4726, // KS4 -> FFT middle product threshold 10, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 17 33, // KS1 -> KS2 multiplication threshold 102, // KS2 -> KS4 multiplication threshold 9451, // KS4 -> FFT multiplication threshold 43, // KS1 -> KS2 squaring threshold 132, // KS2 -> KS4 squaring threshold 8560, // KS4 -> FFT squaring threshold 33, // KS1 -> KS2 middle product threshold 116, // KS2 -> KS4 middle product threshold 4280, // KS4 -> FFT middle product threshold 10, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 18 31, // KS1 -> KS2 multiplication threshold 78, // KS2 -> KS4 multiplication threshold 7022, // KS4 -> FFT multiplication threshold 43, // KS1 -> KS2 squaring threshold 112, // KS2 -> KS4 squaring threshold 5393, // KS4 -> FFT squaring threshold 31, // KS1 -> KS2 middle product threshold 94, // KS2 -> KS4 middle product threshold 3877, // KS4 -> FFT middle product threshold 10, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 19 33, // KS1 -> KS2 multiplication threshold 78, // KS2 -> KS4 multiplication threshold 8560, // KS4 -> FFT multiplication threshold 43, // KS1 -> KS2 squaring threshold 108, // KS2 -> KS4 squaring threshold 7022, // KS4 -> FFT squaring threshold 33, // KS1 -> KS2 middle product threshold 86, // KS2 -> KS4 middle product threshold 3511, // KS4 -> FFT middle product threshold 10, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 20 29, // KS1 -> KS2 multiplication threshold 70, // KS2 -> KS4 multiplication threshold 5393, // KS4 -> FFT multiplication threshold 31, // KS1 -> KS2 squaring threshold 94, // KS2 -> KS4 squaring threshold 4726, // KS4 -> FFT squaring threshold 29, // KS1 -> KS2 middle product threshold 78, // KS2 -> KS4 middle product threshold 3511, // KS4 -> FFT middle product threshold 9, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 21 27, // KS1 -> KS2 multiplication threshold 72, // KS2 -> KS4 multiplication threshold 7022, // KS4 -> FFT multiplication threshold 35, // KS1 -> KS2 squaring threshold 95, // KS2 -> KS4 squaring threshold 6154, // KS4 -> FFT squaring threshold 27, // KS1 -> KS2 middle product threshold 86, // KS2 -> KS4 middle product threshold 3180, // KS4 -> FFT middle product threshold 10, // nussbaumer multiplication threshold 10 // nussbaumer squaring threshold }, { // bits = 22 31, // KS1 -> KS2 multiplication threshold 66, // KS2 -> KS4 multiplication threshold 5217, // KS4 -> FFT multiplication threshold 35, // KS1 -> KS2 squaring threshold 94, // KS2 -> KS4 squaring threshold 4726, // KS4 -> FFT squaring threshold 27, // KS1 -> KS2 middle product threshold 75, // KS2 -> KS4 middle product threshold 3180, // KS4 -> FFT middle product threshold 9, // nussbaumer multiplication threshold 9 // nussbaumer squaring threshold }, { // bits = 23 29, // KS1 -> KS2 multiplication threshold 80, // KS2 -> KS4 multiplication threshold 7022, // KS4 -> FFT multiplication threshold 35, // KS1 -> KS2 squaring threshold 94, // KS2 -> KS4 squaring threshold 5760, // KS4 -> FFT squaring threshold 25, // KS1 -> KS2 middle product threshold 80, // KS2 -> KS4 middle product threshold 3077, // KS4 -> FFT middle product threshold 9, // nussbaumer multiplication threshold 9 // nussbaumer squaring threshold 
   },
   {  // bits = 24
      29,      // KS1 -> KS2 multiplication threshold
      66,      // KS2 -> KS4 multiplication threshold
      5760,    // KS4 -> FFT multiplication threshold
      35,      // KS1 -> KS2 squaring threshold
      86,      // KS2 -> KS4 squaring threshold
      4726,    // KS4 -> FFT squaring threshold
      29,      // KS1 -> KS2 middle product threshold
      75,      // KS2 -> KS4 middle product threshold
      3077,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 25
      25,      // KS1 -> KS2 multiplication threshold
      66,      // KS2 -> KS4 multiplication threshold
      6154,    // KS4 -> FFT multiplication threshold
      33,      // KS1 -> KS2 squaring threshold
      78,      // KS2 -> KS4 squaring threshold
      4280,    // KS4 -> FFT squaring threshold
      24,      // KS1 -> KS2 middle product threshold
      66,      // KS2 -> KS4 middle product threshold
      2697,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 26
      29,      // KS1 -> KS2 multiplication threshold
      57,      // KS2 -> KS4 multiplication threshold
      4280,    // KS4 -> FFT multiplication threshold
      33,      // KS1 -> KS2 squaring threshold
      75,      // KS2 -> KS4 squaring threshold
      3877,    // KS4 -> FFT squaring threshold
      27,      // KS1 -> KS2 middle product threshold
      66,      // KS2 -> KS4 middle product threshold
      2697,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 27
      25,      // KS1 -> KS2 multiplication threshold
      54,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      31,      // KS1 -> KS2 squaring threshold
      75,      // KS2 -> KS4 squaring threshold
      3077,    // KS4 -> FFT squaring threshold
      21,      // KS1 -> KS2 middle product threshold
      61,      // KS2 -> KS4 middle product threshold
      2140,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 28
      27,      // KS1 -> KS2 multiplication threshold
      43,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      31,      // KS1 -> KS2 squaring threshold
      70,      // KS2 -> KS4 squaring threshold
      3077,    // KS4 -> FFT squaring threshold
      25,      // KS1 -> KS2 middle product threshold
      57,      // KS2 -> KS4 middle product threshold
      2140,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 29
      25,      // KS1 -> KS2 multiplication threshold
      51,      // KS2 -> KS4 multiplication threshold
      4280,    // KS4 -> FFT multiplication threshold
      29,      // KS1 -> KS2 squaring threshold
      70,      // KS2 -> KS4 squaring threshold
      3877,    // KS4 -> FFT squaring threshold
      21,      // KS1 -> KS2 middle product threshold
      61,      // KS2 -> KS4 middle product threshold
      2140,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 30
      21,      // KS1 -> KS2 multiplication threshold
      47,      // KS2 -> KS4 multiplication threshold
      3877,    // KS4 -> FFT multiplication threshold
      30,      // KS1 -> KS2 squaring threshold
      66,      // KS2 -> KS4 squaring threshold
      3511,    // KS4 -> FFT squaring threshold
      21,      // KS1 -> KS2 middle product threshold
      61,      // KS2 -> KS4 middle product threshold
      2363,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 31
      24,      // KS1 -> KS2 multiplication threshold
      39,      // KS2 -> KS4 multiplication threshold
      4280,    // KS4 -> FFT multiplication threshold
      29,      // KS1 -> KS2 squaring threshold
      61,      // KS2 -> KS4 squaring threshold
      3511,    // KS4 -> FFT squaring threshold
      23,      // KS1 -> KS2 middle product threshold
      51,      // KS2 -> KS4 middle product threshold
      2140,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 32
      24,      // KS1 -> KS2 multiplication threshold
      39,      // KS2 -> KS4 multiplication threshold
      3877,    // KS4 -> FFT multiplication threshold
      27,      // KS1 -> KS2 squaring threshold
      66,      // KS2 -> KS4 squaring threshold
      3511,    // KS4 -> FFT squaring threshold
      21,      // KS1 -> KS2 middle product threshold
      51,      // KS2 -> KS4 middle product threshold
      2140,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 33
      21,      // KS1 -> KS2 multiplication threshold
      33,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      27,      // KS1 -> KS2 squaring threshold
      57,      // KS2 -> KS4 squaring threshold
      3180,    // KS4 -> FFT squaring threshold
      19,      // KS1 -> KS2 middle product threshold
      50,      // KS2 -> KS4 middle product threshold
      1939,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 34
      21,      // KS1 -> KS2 multiplication threshold
      31,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      27,      // KS1 -> KS2 squaring threshold
      47,      // KS2 -> KS4 squaring threshold
      3180,    // KS4 -> FFT squaring threshold
      21,      // KS1 -> KS2 middle product threshold
      43,      // KS2 -> KS4 middle product threshold
      1939,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 35
      21,      // KS1 -> KS2 multiplication threshold
      31,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      27,      // KS1 -> KS2 squaring threshold
      39,      // KS2 -> KS4 squaring threshold
      3077,    // KS4 -> FFT squaring threshold
      19,      // KS1 -> KS2 middle product threshold
      40,      // KS2 -> KS4 middle product threshold
      1539,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 36
      21,      // KS1 -> KS2 multiplication threshold
      27,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      25,      // KS1 -> KS2 squaring threshold
      41,      // KS2 -> KS4 squaring threshold
      3077,    // KS4 -> FFT squaring threshold
      21,      // KS1 -> KS2 middle product threshold
      38,      // KS2 -> KS4 middle product threshold
      1756,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 37
      17,      // KS1 -> KS2 multiplication threshold
      30,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      24,      // KS1 -> KS2 squaring threshold
      43,      // KS2 -> KS4 squaring threshold
      3077,    // KS4 -> FFT squaring threshold
      17,      // KS1 -> KS2 middle product threshold
      40,      // KS2 -> KS4 middle product threshold
      1539,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 38
      19,      // KS1 -> KS2 multiplication threshold
      25,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      27,      // KS1 -> KS2 squaring threshold
      24,      // KS2 -> KS4 squaring threshold
      3077,    // KS4 -> FFT squaring threshold
      19,      // KS1 -> KS2 middle product threshold
      35,      // KS2 -> KS4 middle product threshold
      1590,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 39
      19,      // KS1 -> KS2 multiplication threshold
      27,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      23,      // KS1 -> KS2 squaring threshold
      28,      // KS2 -> KS4 squaring threshold
      2697,    // KS4 -> FFT squaring threshold
      17,      // KS1 -> KS2 middle product threshold
      35,      // KS2 -> KS4 middle product threshold
      1349,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 40
      19,      // KS1 -> KS2 multiplication threshold
      27,      // KS2 -> KS4 multiplication threshold
      3511,    // KS4 -> FFT multiplication threshold
      23,      // KS1 -> KS2 squaring threshold
      37,      // KS2 -> KS4 squaring threshold
      3077,    // KS4 -> FFT squaring threshold
      19,      // KS1 -> KS2 middle product threshold
      39,      // KS2 -> KS4 middle product threshold
      1539,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 41
      17,      // KS1 -> KS2 multiplication threshold
      25,      // KS2 -> KS4 multiplication threshold
      3077,    // KS4 -> FFT multiplication threshold
      21,      // KS1 -> KS2 squaring threshold
      36,      // KS2 -> KS4 squaring threshold
      2363,    // KS4 -> FFT squaring threshold
      19,      // KS1 -> KS2 middle product threshold
      33,      // KS2 -> KS4 middle product threshold
      1349,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 42
      17,      // KS1 -> KS2 multiplication threshold
      24,      // KS2 -> KS4 multiplication threshold
      3180,    // KS4 -> FFT multiplication threshold
      21,      // KS1 -> KS2 squaring threshold
      27,      // KS2 -> KS4 squaring threshold
      2363,    // KS4 -> FFT squaring threshold
      17,      // KS1 -> KS2 middle product threshold
      33,      // KS2 -> KS4 middle product threshold
      1440,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 43
      16,      // KS1 -> KS2 multiplication threshold
      19,      // KS2 -> KS4 multiplication threshold
      2697,    // KS4 -> FFT multiplication threshold
      21,      // KS1 -> KS2 squaring threshold
      21,      // KS2 -> KS4 squaring threshold
      2363,    // KS4 -> FFT squaring threshold
      14,      // KS1 -> KS2 middle product threshold
      31,      // KS2 -> KS4 middle product threshold
      1305,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 44
      17,      // KS1 -> KS2 multiplication threshold
      24,      // KS2 -> KS4 multiplication threshold
      3077,    // KS4 -> FFT multiplication threshold
      19,      // KS1 -> KS2 squaring threshold
      36,      // KS2 -> KS4 squaring threshold
      2363,    // KS4 -> FFT squaring threshold
      17,      // KS1 -> KS2 middle product threshold
      29,      // KS2 -> KS4 middle product threshold
      1349,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
   {  // bits = 45
      16,      // KS1 -> KS2 multiplication threshold
      21,      // KS2 -> KS4 multiplication threshold
      2697,    // KS4 -> FFT multiplication threshold
      19,      // KS1 -> KS2 squaring threshold
      27,      // KS2 -> KS4 squaring threshold
      2071,    // KS4 -> FFT squaring threshold
      14,      // KS1 -> KS2 middle product threshold
      29,      // KS2 -> KS4 middle product threshold
      1182,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 46
      17,      // KS1 -> KS2 multiplication threshold
      16,      // KS2 -> KS4 multiplication threshold
      3077,    // KS4 -> FFT multiplication threshold
      21,      // KS1 -> KS2 squaring threshold
      27,      // KS2 -> KS4 squaring threshold
      2363,    // KS4 -> FFT squaring threshold
      16,      // KS1 -> KS2 middle product threshold
      31,      // KS2 -> KS4 middle product threshold
      1305,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 47
      16,      // KS1 -> KS2 multiplication threshold
      19,      // KS2 -> KS4 multiplication threshold
      2697,    // KS4 -> FFT multiplication threshold
      21,      // KS1 -> KS2 squaring threshold
      25,      // KS2 -> KS4 squaring threshold
      2071,    // KS4 -> FFT squaring threshold
      16,      // KS1 -> KS2 middle product threshold
      25,      // KS2 -> KS4 middle product threshold
      1305,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 48
      17,      // KS1 -> KS2 multiplication threshold
      14,      // KS2 -> KS4 multiplication threshold
      2697,    // KS4 -> FFT multiplication threshold
      19,      // KS1 -> KS2 squaring threshold
      23,      // KS2 -> KS4 squaring threshold
      2071,    // KS4 -> FFT squaring threshold
      16,      // KS1 -> KS2 middle product threshold
      29,      // KS2 -> KS4 middle product threshold
      1349,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 49
      14,      // KS1 -> KS2 multiplication threshold
      15,      // KS2 -> KS4 multiplication threshold
      2697,    // KS4 -> FFT multiplication threshold
      19,      // KS1 -> KS2 squaring threshold
      21,      // KS2 -> KS4 squaring threshold
      2071,    // KS4 -> FFT squaring threshold
      16,      // KS1 -> KS2 middle product threshold
      27,      // KS2 -> KS4 middle product threshold
      1182,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 50
      14,      // KS1 -> KS2 multiplication threshold
      16,      // KS2 -> KS4 multiplication threshold
      2697,    // KS4 -> FFT multiplication threshold
      19,      // KS1 -> KS2 squaring threshold
      33,      // KS2 -> KS4 squaring threshold
      2071,    // KS4 -> FFT squaring threshold
      14,      // KS1 -> KS2 middle product threshold
      27,      // KS2 -> KS4 middle product threshold
      1305,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 51
      14,      // KS1 -> KS2 multiplication threshold
      17,      // KS2 -> KS4 multiplication threshold
      2363,    // KS4 -> FFT multiplication threshold
      19,      // KS1 -> KS2 squaring threshold
      27,      // KS2 -> KS4 squaring threshold
      1756,    // KS4 -> FFT squaring threshold
      14,      // KS1 -> KS2 middle product threshold
      25,      // KS2 -> KS4 middle product threshold
      1182,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 52
      14,      // KS1 -> KS2 multiplication threshold
      17,      // KS2 -> KS4 multiplication threshold
      2363,    // KS4 -> FFT multiplication threshold
      19,      // KS1 -> KS2 squaring threshold
      24,      // KS2 -> KS4 squaring threshold
      1756,    // KS4 -> FFT squaring threshold
      13,      // KS1 -> KS2 middle product threshold
      27,      // KS2 -> KS4 middle product threshold
      1070,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 53
      13,      // KS1 -> KS2 multiplication threshold
      14,      // KS2 -> KS4 multiplication threshold
      1876,    // KS4 -> FFT multiplication threshold
      16,      // KS1 -> KS2 squaring threshold
      19,      // KS2 -> KS4 squaring threshold
      1590,    // KS4 -> FFT squaring threshold
      13,      // KS1 -> KS2 middle product threshold
      25,      // KS2 -> KS4 middle product threshold
      970,     // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 54
      14,      // KS1 -> KS2 multiplication threshold
      19,      // KS2 -> KS4 multiplication threshold
      2363,    // KS4 -> FFT multiplication threshold
      17,      // KS1 -> KS2 squaring threshold
      19,      // KS2 -> KS4 squaring threshold
      1876,    // KS4 -> FFT squaring threshold
      14,      // KS1 -> KS2 middle product threshold
      23,      // KS2 -> KS4 middle product threshold
      1182,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 55
      13,      // KS1 -> KS2 multiplication threshold
      12,      // KS2 -> KS4 multiplication threshold
      2071,    // KS4 -> FFT multiplication threshold
      19,      // KS1 -> KS2 squaring threshold
      13,      // KS2 -> KS4 squaring threshold
      1756,    // KS4 -> FFT squaring threshold
      14,      // KS1 -> KS2 middle product threshold
      23,      // KS2 -> KS4 middle product threshold
      1182,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 56
      13,      // KS1 -> KS2 multiplication threshold
      10,      // KS2 -> KS4 multiplication threshold
      2363,    // KS4 -> FFT multiplication threshold
      17,      // KS1 -> KS2 squaring threshold
      13,      // KS2 -> KS4 squaring threshold
      2004,    // KS4 -> FFT squaring threshold
      13,      // KS1 -> KS2 middle product threshold
      21,      // KS2 -> KS4 middle product threshold
      1070,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 57
      13,      // KS1 -> KS2 multiplication threshold
      15,      // KS2 -> KS4 multiplication threshold
      1876,    // KS4 -> FFT multiplication threshold
      16,      // KS1 -> KS2 squaring threshold
      19,      // KS2 -> KS4 squaring threshold
      1756,    // KS4 -> FFT squaring threshold
      13,      // KS1 -> KS2 middle product threshold
      23,      // KS2 -> KS4 middle product threshold
      970,     // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 58
      13,      // KS1 -> KS2 multiplication threshold
      10,      // KS2 -> KS4 multiplication threshold
      1876,    // KS4 -> FFT multiplication threshold
      16,      // KS1 -> KS2 squaring threshold
      19,      // KS2 -> KS4 squaring threshold
      1756,    // KS4 -> FFT squaring threshold
      13,      // KS1 -> KS2 middle product threshold
      23,      // KS2 -> KS4 middle product threshold
      970,     // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 59
      13,      // KS1 -> KS2 multiplication threshold
      13,      // KS2 -> KS4 multiplication threshold
      1539,    // KS4 -> FFT multiplication threshold
      17,      // KS1 -> KS2 squaring threshold
      17,      // KS2 -> KS4 squaring threshold
      1305,    // KS4 -> FFT squaring threshold
      13,      // KS1 -> KS2 middle product threshold
      21,      // KS2 -> KS4 middle product threshold
      970,     // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 60
      13,      // KS1 -> KS2 multiplication threshold
      13,      // KS2 -> KS4 multiplication threshold
      1539,    // KS4 -> FFT multiplication threshold
      16,      // KS1 -> KS2 squaring threshold
      17,      // KS2 -> KS4 squaring threshold
      1440,    // KS4 -> FFT squaring threshold
      12,      // KS1 -> KS2 middle product threshold
      21,      // KS2 -> KS4 middle product threshold
      878,     // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 61
      13,      // KS1 -> KS2 multiplication threshold
      12,      // KS2 -> KS4 multiplication threshold
      1756,    // KS4 -> FFT multiplication threshold
      14,      // KS1 -> KS2 squaring threshold
      16,      // KS2 -> KS4 squaring threshold
      1590,    // KS4 -> FFT squaring threshold
      12,      // KS1 -> KS2 middle product threshold
      21,      // KS2 -> KS4 middle product threshold
      878,     // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 62
      13,      // KS1 -> KS2 multiplication threshold
      19,      // KS2 -> KS4 multiplication threshold
      2363,    // KS4 -> FFT multiplication threshold
      16,      // KS1 -> KS2 squaring threshold
      31,      // KS2 -> KS4 squaring threshold
      2363,    // KS4 -> FFT squaring threshold
      13,      // KS1 -> KS2 middle product threshold
      24,      // KS2 -> KS4 middle product threshold
      1182,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 63
      13,      // KS1 -> KS2 multiplication threshold
      19,      // KS2 -> KS4 multiplication threshold
      2363,    // KS4 -> FFT multiplication threshold
      16,      // KS1 -> KS2 squaring threshold
      35,      // KS2 -> KS4 squaring threshold
      2140,    // KS4 -> FFT squaring threshold
      13,      // KS1 -> KS2 middle product threshold
      25,      // KS2 -> KS4 middle product threshold
      1070,    // KS4 -> FFT middle product threshold
      8,       // nussbaumer multiplication threshold
      8        // nussbaumer squaring threshold
   },
   {  // bits = 64
      13,      // KS1 -> KS2 multiplication threshold
      23,      // KS2 -> KS4 multiplication threshold
      4280,    // KS4 -> FFT multiplication threshold
      16,      // KS1 -> KS2 squaring threshold
      33,      // KS2 -> KS4 squaring threshold
      3877,    // KS4 -> FFT squaring threshold
      12,      // KS1 -> KS2 middle product threshold
      23,      // KS2 -> KS4 middle product threshold
      1756,    // KS4 -> FFT middle product threshold
      9,       // nussbaumer multiplication threshold
      9        // nussbaumer squaring threshold
   },
};

// end of file ****************************************************************
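
/*
   Illustrative addendum (not part of the generated table): a minimal sketch
   of how thresholds of this shape can drive algorithm selection, assuming a
   struct whose fields mirror the eleven columns of each entry above.  All
   names here (example_tuning_t, example_choose_mul, mul_algo_t and its
   constants) are hypothetical, invented for this sketch; zn_poly's real
   tuning structures and dispatch logic are defined in its own headers and
   source files.

      #include <stddef.h>

      typedef enum { ALGO_KS1, ALGO_KS2, ALGO_KS4, ALGO_FFT } mul_algo_t;

      typedef struct
      {
         size_t mul_KS2_thresh;      // KS1 -> KS2 multiplication threshold
         size_t mul_KS4_thresh;      // KS2 -> KS4 multiplication threshold
         size_t mul_fft_thresh;      // KS4 -> FFT multiplication threshold
         size_t sqr_KS2_thresh;      // KS1 -> KS2 squaring threshold
         size_t sqr_KS4_thresh;      // KS2 -> KS4 squaring threshold
         size_t sqr_fft_thresh;      // KS4 -> FFT squaring threshold
         size_t mulmid_KS2_thresh;   // KS1 -> KS2 middle product threshold
         size_t mulmid_KS4_thresh;   // KS2 -> KS4 middle product threshold
         size_t mulmid_fft_thresh;   // KS4 -> FFT middle product threshold
         unsigned nuss_mul_thresh;   // nussbaumer multiplication threshold
         unsigned nuss_sqr_thresh;   // nussbaumer squaring threshold
      }
      example_tuning_t;

      // Scan the multiplication thresholds in increasing order; the first
      // one that the operand length falls below selects the algorithm.
      static mul_algo_t
      example_choose_mul (const example_tuning_t* t, size_t len)
      {
         if (len < t->mul_KS2_thresh)   return ALGO_KS1;
         if (len < t->mul_KS4_thresh)   return ALGO_KS2;
         if (len < t->mul_fft_thresh)   return ALGO_KS4;
         return ALGO_FFT;
      }

   For example, with the bits = 64 entry above, a multiplication of length
   20 would use KS2 (since 13 <= 20 < 23), while one of length 5000 would
   use the FFT (since 5000 >= 4280).
*/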